
DIY Image Compression Algorithms (warning: the computer science in this thread is real, image heavy)


Nuke


so a few days ago i was playing around with one of those esp8266 internet of things modules. its a nifty little gadget that lets you connect anything with a uart to wifi. it got me thinking: what can i do with this fun little thing? i played with moving strings between computers and my arduino. rummaging through my dev boards i found several camera and lcd modules i can play with, and so decided to see what kind of images, possibly even video, i can send over wifi. it has all kinds of applications: remote rendering to portable devices, wireless cameras for drones and robots.

there is one small catch: the maximum data rate is supposed to be around 2 megabit, so i need compression. i cant just use something off the shelf or out of a library found on github; i need something that can compress fast and decode even faster, and i will probably be running it on an arm arduino. so im going to need to cook up some algorithms. time for some science!

in fact i had spent the last 3 days working on the required tools to do some research, and around 1:30 am i smashed the last bug and was ready to start writing some algorithms to test. i drew a lot of inspiration from the way block compression algorithms for game textures work: you turn an image into a bunch of small cells, and use indexed color to reduce the bits per pixel. dxt1 uses 2-bit color indices, so each cell has only 4 colors. however it only needs to store 2 colors with the cell, the brightest and darkest presumably, and interpolates 2 more intermediate colors. each stored color is 16 bits, so 32 bits go to the two colors and the other 32 go to the pixel indices, and a 4x4 cell only takes 4 bits per pixel.

my first attempt was on a grayscale image. i know i can get better than 4 bpp with gray. im using a 5x5 block; the reason is i needed an extra 6 bits. unlike dxt1, which interpolates between two stored colors, my format only stores a single grayscale value. i also store a 6 bit delta value, so i can use simple addition, multiplication and bit shifts to do the arithmetic. this is a data map of one of the blocks:

--in binary the image would be in the format: (1 char = 1 bit)

-- PpPpPpPp PpPpPpPp - pixels 0-7

-- PpPpPpPp PpPpPpPp - pixels 8-15

-- PpPpPpPp PpPpPpPp - pixels 16-23

-- PpDddddd Oooooooo - pixel 24, delta, offset

-- total size: 64 bits, bits per pixel: 2.56

--compression algorithm

- 1 determine the extents of the pixel values in the block, these are valueMax and valueMin

- 2 valueRange will be calculated as valueMax-valueMin

- 3 the delta will be calculated as follows: valueRange/4 (the division can be performed with a right shift by 2 for speed if necessary, otherwise a fixed point divide will be used for better rounding)

- 4 the offset will be valueMin, it may be adjusted to compensate for any overshoot or undershoot of offset+delta*3 in relation to offset+valueRange

- 5 the color table will be built such that color indices 0-3 will equal offset+index*delta

- 6 each uncompressed pixel will be given the index of the color table value it is closest to

- 7 the data is packed into a 64-bit block as shown above

- 8 to decompress, the color table will be generated as in step 5, then each pixel will be assigned the value that corresponds with its index. decompression should be very fast (see the c sketch after this list)
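for reference, heres a minimal c sketch of roughly what steps 1-8 look like. the names and the exact bit packing are made up (the data map above interleaves things differently), and it uses a divide by 3 for the delta rather than the divide by 4 in step 3, since index 0 is just the offset and the delta only gets applied 3 more times:

#include <stdint.h>

typedef struct { uint8_t pixels[25]; } GrayBlock5x5;      // raw 8-bit pixels, 5x5

static uint64_t compress_block(const GrayBlock5x5 *b)
{
    uint8_t vmin = 255, vmax = 0;                          // step 1: extents
    for (int i = 0; i < 25; i++) {
        if (b->pixels[i] < vmin) vmin = b->pixels[i];
        if (b->pixels[i] > vmax) vmax = b->pixels[i];
    }
    uint8_t delta = (uint8_t)((vmax - vmin) / 3);          // steps 2-3
    if (delta > 63) delta = 63;                            // must fit the 6-bit field
    uint8_t table[4];
    for (int i = 0; i < 4; i++) {                          // step 5: build table
        int v = vmin + i * delta;
        table[i] = (v > 255) ? 255 : (uint8_t)v;           // guard against overshoot
    }
    uint64_t packed = 0;
    for (int i = 0; i < 25; i++) {                         // step 6: nearest entry
        int best = 0, bestErr = 256;
        for (int j = 0; j < 4; j++) {
            int err = b->pixels[i] - table[j];
            if (err < 0) err = -err;
            if (err < bestErr) { bestErr = err; best = j; }
        }
        packed |= (uint64_t)best << (2 * i);               // step 7: pack indices
    }
    packed |= (uint64_t)delta << 50;                       // 6-bit delta
    packed |= (uint64_t)vmin << 56;                        // 8-bit offset
    return packed;
}

static void decompress_block(uint64_t packed, GrayBlock5x5 *out)
{
    uint8_t delta  = (uint8_t)((packed >> 50) & 0x3f);     // step 8: rebuild table,
    uint8_t offset = (uint8_t)((packed >> 56) & 0xff);     // then look up each pixel
    uint8_t table[4];
    for (int i = 0; i < 4; i++) {
        int v = offset + i * delta;
        table[i] = (v > 255) ? 255 : (uint8_t)v;
    }
    for (int i = 0; i < 25; i++)
        out->pixels[i] = table[(packed >> (2 * i)) & 3];
}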

if you have a pocket calculator, you can divide the number of bits by the number of pixels to get the bpp: 2.56 bpp! a frame @ 640x480 will only be 96k, so 10 fps should come in at just under a megabyte per second. though there is a tiny problem in that an arduino only has 96k of ram total, and that stream would take about 8 megabit to move over the esp8266. so i will probably have to decrease the resolution to make it usable; 320*240 is more usable at 24k a frame. so what are we looking at in image quality:

[image: compression algorithm 1]

its not quite optimal, there is a lot of underuse of the color table; only about 1 in 10 blocks has more than one color used. so it needs some work. but in my eyes its a success, im on the right track. now i require sleep. tomorrow i will attempt a color format.


i finally figured out why i was getting horrendous blocking. i noticed that it was very rare for more than one color in each block to get utilized. the cause was pretty stupid actually. since my test code is written in lua there is no easy way to store pixel data. you could fill a table with numbers but that is very bad on memory usage. so i am storing my pixel data as strings where each pixel is stored in 3 chars (im using 24 bit, mainly because i havent added 8 bit support to my code yet; after i load the image i have a function that scans through and averages the 3 channels, then i only bother reading the blue channel). since i need to convert the char to a number before i can operate on it, i have a call to string.byte() to convert it over. somewhere later on in the program i inadvertently did the conversion a second time. since lua automatically does number to string conversion as needed, it didnt toss an error about the value not being a string. yay lua!

i also noticed a couple other bugs in my algorithm. first of all the division by four in step 3 was wrong: because index zero in the color table is just the offset, the delta only gets applied 3 more times, so the full range wasnt being utilized. i also threw in some floating point rounding in there to smooth things out. of course this breaks one of the assumptions i had made earlier whilst designing the algorithm: now if a block has one pixel at 0 and another at 255, the 6-bit delta cant cover the whole range and its impossible to represent both values without a huge loss in brightness. since 3*63=189, i could lose up to 66 brightness levels in a block, which will make it stand out like a sore thumb.

there are several ways to solve this problem. i could use a global scalar constant, so that a delta value of 1 actually represents a value of 1.35 (which would let me use the full range), though im concerned that might result in some quality loss or excessive overshoot in low range blocks. but generally shotgun approaches like that are bad, especially for such a rarely occurring problem. i could make big changes to the format, like 6x6 blocks and 2 8-bit values, but that would change the data rate and i really want to make 14 bits of color info work. i can selectively detect when there is not enough delta range and center the color table between the minimum and maximum values. this doesnt actually solve the problem, it just applies it more fairly to both low and high value pixels and might improve visual quality, but the decompressor wont care and will just decode it like every other block. i might also consider the high values of delta "magic" and generate the color table slightly differently, say offset+index*delta+(22*index) when the delta is 63 (for deltas of 60-63 i can use constants 6,11,17,22, so that intermediate ranges between 189-255 can be better represented).
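a rough sketch of that last "magic delta" idea (my own illustration, untested): table generation stays a multiply and an add, with an extra per-index stretch when the delta field is 60-63, plus a clamp for any overshoot:

static const uint8_t kStretch[4] = { 6, 11, 17, 22 };      // for deltas 60..63

static void build_table(uint8_t offset, uint8_t delta, uint8_t table[4])
{
    uint8_t extra = (delta >= 60) ? kStretch[delta - 60] : 0;
    for (int i = 0; i < 4; i++) {
        int v = offset + i * (delta + extra);               // offset+index*delta+(extra*index)
        table[i] = (v > 255) ? 255 : (uint8_t)v;            // clamp any overshoot
    }
}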

also it turns out there is another problem: with a high offset, generating the color table can produce values higher than 255. it rarely exceeds the range by very far, so a simple conditional clamp takes care of it. before i try to implement the changes above, i should point out that with the first bug fixed, images now look like this:

[image: test 2]

look at that improvement in quality. not bad for 2.56 bits per pixel. if you zoom in you can see a few compression artifacts, but i am quite impressed with this block format. because of the quality im going to work on a new format rather than try to perfect this one. i am going to tackle color! i ran some numbers and i think im going to do a 7x5 pixel block with 2 bit indices and 18 bits of color data, so each block will be 11 bytes. i can either do two 9 bit colors or one 12 bit color and a 6 bit delta, so the next tests will be a comparative analysis between the two. i should also point out that the data rate will be even smaller than the previous format @ 2.5143 bits per pixel. it seems a general rule of thumb is that larger block sizes let you trade quality for size, so it will be nice to see what this block will look like.
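for reference, heres one possible data map for the 12-bit color + 6-bit delta variant (the two 9-bit color version would keep the same index layout and just pack two 9-bit colors into the last 18 bits):

-- PpPpPpPp PpPpPpPp - pixels 0-7

-- PpPpPpPp PpPpPpPp - pixels 8-15

-- PpPpPpPp PpPpPpPp - pixels 16-23

-- PpPpPpPp PpPpPpPp - pixels 24-31

-- PpPpPpRr rrGgggBb bbDddddd - pixels 32-34, 12-bit color, 6-bit delta

-- total size: 88 bits (11 bytes), bits per pixel: 2.5143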

Edited by Nuke

i managed to get my color compression to work, but its not as good as i had hoped.

[image: test2, 9-bit interpolated]

but it is pretty good when you consider that uncompressed 9-bit looks like this

[image: uncompressed 9-bit]

the sun is going to come up, so i need sleep. i still want to do a 12 bit format with a 6 bit delta to compare. i also think i will try an interpolated 12 bit format tomorrow. im thinking a 6x6x2 block with 2 12-bit colors, which shouldnt be as hard, just change a few numbers on the 9-bit algorithm. that one will push the bpp up to 2.67 though, and i kind of want to go in the other direction. 8x8x2+12x2 will get me down to 2.375, but i dont know what that will do to my quality. i also want to try another grayscale algorithm; 8x8x2+8x2 will net me 2.25 bpp. theoretically with a 2 bit index color the lowest you can get is 2 bpp, so 2.25 sounds pretty good.

Edited by Nuke

heres my 12-bit with 6-bit delta version of the 7x5 format.

[image: test3]

one thing you notice right off the bat is that a lot more color variation comes out of this format. however this comes at the cost of more blockiness overall. i havent looked at the data for the image yet (mainly because it worked the first time and i didnt have to print various data to the output to search out bugs) so its possible that there is a little underutilization of the color table. i notice the blocky blocks seem to be in the boring parts of the image; if you look at anything with a lot of color variation or a wide range of brightnesses you see a lot more colors per block. but i think it looks better than the interpolated 9-bit format, without changing the bpp. really puts the science into computer science doesnt it?

also this morning i looked at what kind of data rates i can get out of the esp8266, and it seems i need to limit my transmission speed to 1.8 megabit. the module supports faster data rates, but the fastest baud rate it supports that the arduino due can utilize is only 230400, and i expect the overhead will slow things down considerably. but i can do 315*250 @ 10 fps using one of the 7x5 color formats.

next up i am going to do an interpolated 12 bit format, but im going to really ramp up the size of a block to 8x8 to see what happens.

*some minutes later*

[image: test four]

interpolated 12 bit 8x8 format as ordered. you can really start to see some awful artifacts now, but overall image quality is improved. those color tables are now being more fully utilized, and bpp is down to 2.375.

those green blocks in the top right kind of hint at bad color selection code. its choosing the bright green over the pale green for some reason, likely because the pixel average between the two colors is the same, with a little bit of the green channel getting spread to the other two channels. for example it cant tell between 3,14,5 and 7,8,7 because they both average out to 7.333. im not sure what to do in these situations where the wrong color has the closest average. i could favor color over brightness: check the differences between each color channel, and favor the color with the least difference in the dominant channel. but its too late to figure out how to do that. require("sleep").
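a minimal sketch of what that could look like: pick the table entry with the smallest per-channel squared error instead of comparing averages (names are mine, not the actual test code):

// pick the palette index whose summed per-channel squared error is smallest
static int best_index(const uint8_t px[3], const uint8_t table[4][3])
{
    int best = 0, bestErr = 0x7fffffff;
    for (int i = 0; i < 4; i++) {
        int err = 0;
        for (int c = 0; c < 3; c++) {
            int d = (int)px[c] - (int)table[i][c];
            err += d * d;
        }
        if (err < bestErr) { bestErr = err; best = i; }
    }
    return best;
}
// with this metric 3,14,5 and 7,8,7 no longer look identical,
// even though both average out to 7.333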

Edited by Nuke

before shifting gears to try and stream some images to my arduino, i wanted to do another gray format. this is using an 8x8x2 block with 2 bytes of color data. this is the 2.25 bpp format.

[image: test5]

you can see some blocking along the grid lines. having an unusually bright or dark pixel causes a much larger range of values in the color table, which leaves fewer color options for the less extreme pixels. this may be why the delta based formats dont have such severe artifacts: they effectively have an upper limit on the maximum range in a block. it may also be what was going on with the interpolated color format, compounded by the presence of color information.


i dont think lossless compression will work for the intended application. i want to squeeze 5-10 fps at about 320x240 through a 1.8 mbit connection. in addition to the bandwidth limitations i also have a total of 96k of ram on the mcu (an 84 mhz arm) running the algorithms. using the 8x8x2+8x2 format, a compressed frame would take up about 22k. uncompressed the same image would consume 75k (the 8x8x2+12x2 color format, when decompressed, wouldn't even fit). that memory must also be shared by network buffers, the compressor/decompressor and the camera/lcd module driver plus any other systems i want to throw on.

to receive video im going to handle a row of blocks at a time: decompress them, and write the data to the lcd before handling the next row (a rough sketch of that loop is below). the decompression algorithm is fast enough that the arduino due will have little trouble with it. an lcd is essentially write-only memory, so once you write a pixel it stays written until you write something else to it. so this is by far the easy job. its theoretically possible to read in the whole frame and decompress it while writing the uncompressed data directly to the lcd at the same time, though i suspect decoding a whole row at a time would make writing to the screen somewhat faster.
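roughly what that receive loop could look like. read_block(), decode_block() and lcd_write_rect() are placeholders, not a real serial/lcd api, and the sizes assume the 19-byte 8x8 color blocks at 320x240:

#include <stdint.h>

extern void read_block(uint8_t *dst, int len);                    // placeholder: pull bytes off serial
extern void decode_block(const uint8_t *packed, uint8_t *rgb);    // placeholder: expand one 8x8 block
extern void lcd_write_rect(int x, int y, int w, int h, const uint8_t *rgb);  // placeholder

void receive_frame(void)
{
    uint8_t packed[19];
    uint8_t rgb[8 * 8 * 2];                        // one decoded block, 16 bits per pixel
    for (int row = 0; row < 240 / 8; row++) {      // one row of blocks at a time
        for (int bx = 0; bx < 320 / 8; bx++) {
            read_block(packed, sizeof packed);
            decode_block(packed, rgb);
            lcd_write_rect(bx * 8, row * 8, 8, 8, rgb);
        }
    }
}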

to transmit video im going to have to read the uncompressed data from the camera, compress it, and store the compressed info in memory, then send the whole image down the network interface. the camera i have is a 640x480x16 cmos camera module with an spi+parallel interface. data comes out of it uncompressed; fortunately it has a half res mode so i dont have to do any scaling in software. since i cant store a whole frame (unless its grayscale) im going to have to scan out as much data as possible before running a compression pass. the compression algorithm is many times slower than the decompression algorithm, so im probably not going to get as good performance.

Edited by Nuke

You should think about writing that code in Assembler or C. It should result in a good performance improvement.

Also why didn't you use GIF, JPEG or PNG? I'm sure there are some image libraries which can run on that processor.

And another idea: You can try to implement interlacing to further cut down memory usage. It should also be easy to implement.

Edited by *Aqua*

From what I'm looking at here, online, the Arduino Due's programming language is based on C/C++. Still, I would agree that coding in the processor's base assembly would provide the best results ... especially in light of instruction clock cycle optimization. I also see resources for that online.


the algorithm test code was written in lua, but it was only meant to test the algorithms, to identify potential issues and evaluate image quality. it can take a second or two to compress an image. its literally written in the least efficient way possible, and entirely on purpose. sometime today im going to write up decompressors for both the 8x8 block formats in c, along with the serialization code that goes with them, and i may also have to port an lcd library. writing it in c should actually be easier, since i dont have to constantly qualify variables to make sure they are inside their range limits. i can just union the struct with a char array and have instant serialization (something like the sketch below).
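the union trick, sketched for the 8x8x2+8x2 gray block (18 bytes; the field names are mine):

#include <stdint.h>

// 64 pixels x 2-bit indices = 16 bytes, plus 8-bit offset and 8-bit delta.
// everything is uint8_t so there is no padding to worry about.
typedef union {
    struct {
        uint8_t offset;
        uint8_t delta;
        uint8_t indices[16];
    } f;
    uint8_t bytes[18];        // send this straight down the uart
} GrayBlock8x8;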

i didnt really want to use an out of the box format. an 8-bit indexed color png can get down to 44k, 32k for gif. jpeg can get pretty small but it is somewhat cpu intensive.

good idea with the interlacing. i will probably need to do that to talk to the camera so there is not a lot of delay between lines. i can stick an entire field into memory, compress it and send it down the line. i was kind of worried that there would be significant delay between scans.

e:

was having a little trouble getting my lcd to work with the due, new library and all (supposedly it supports dma, which is nice). but i got it scanning out a new color about 3 times a second. this might be as fast as it can go. its also kind of glitchy, probably due to the mass of long wires connecting it. i may need to make a new wiring harness for the thing, but its probably good enough to play with. im more interested in running a camera than an lcd, since i can use it for telepresence robotics. i got a raspberry pi with a tft display that i intend to use as the remote, leaving the due to handle the camera and motor/servo controllers.

Edited by Nuke

I breezed through the ARM Cortex-M3 CPU datasheet last night (whitepaper), and also found the assembler instruction set technical reference manual ... also a resource defining instruction cycle counts. I was quite surprised by what I found.

http://www.arm.com/files/pdf/IntroToCortex-M3.pdf

http://users.ece.utexas.edu/~valvano/EE345M/CortexM3InstructionSet.pdf

You could disassemble the compiled C object file and see what instructions the compiler is generating. Likely they'll suffice for your needs, but then again - maybe not.


(quoting Nuke's post above about the bandwidth and memory limits and the receive/transmit plans)

I'd have to go read up on it, but it sounds like you're reinventing the wheel... poorly.*

Image compression and video compression are VERY different, so much so that you can get a major boost from switching to video compression, as many formats record the changes between frames rather than being a collection of compressed images. They're also designed for line by line streaming / decompression (or that feature can be enabled, I forget the term though)... I don't have the full idea of what you intend to accomplish, and while doing jpeg-likes will net you quite a bit of compression, if it just passes right on through the micro-controller you don't really need every full frame per second, just what changed to get to the next frame.

*I don't really know how else to describe this. I think you have a channel between two arduinos and are trying to transmit data along it... but the data being transmitted exceeds your bandwidth. It isn't like you can't use external memory, and doing so would greatly benefit your ability to transmit compressed data across the channel.

*It isn't so much that I don't approve of figuring it out yourself, but that you're doing image compression when you should be doing some form of video compression and I think you missed the point of why you were compressing the stream.

Edited by Fel

@Fel

The invention of P- and B-frames only reduces bandwidth at the expense of more than doubled memory and tripled processing usage. He now has to store at least two frames + the difference image (which will be sent to the network) at a time and additionally has to compare all the data of both images to detect the differences. While the memory might suffice, I don't think the processor can calculate that fast enough to gain any image quality.

Maybe you remember the time of the first Pentium processors. Below a Pentium II 350 MHz you needed a dedicated MPEG card to decode an MPEG stream fast enough to watch it. I'm pretty certain that an 84 MHz ARM processor wouldn't be able to encode a video stream with the same or even a quarter of the MPEG quality in real time. My old P2 350 had an encode/decode time ratio of about 5:1 for VGA quality. That means it needed about 5 hours to encode a 1 hour stream.

While Nuke's algorithm isn't as advanced and complex as MPEG, his CPU won't be powerful enough for anything much more complex than what he has come up with so far.

Edit:

I looked at the instruction table for that ARM CPU. It doesn't provide SIMD instructions to speed up processing. SIMD was what made video on a computer possible.

Edited by *Aqua*

@Nuke: Block compression algorithms of the kind you are looking at are very inexpensive to decompress, which is why they are favored for graphics, but they are very expensive to compress properly. You either have to spend a lot of time computing correlations, or you end up with very poor quality. What you should be looking at is something like JPEG. It's also a block compression, but it's done in the frequency domain, allowing you to reduce bit depth on high frequency data. Typical JPEG just uses the standard discrete cosine transform, but there are fancier methods using wavelets. The transform tables can be pre-computed, making it a very fast operation. The output also tends to be very easy to compress with Huffman as your second stage, which JPEG makes use of. Again, a very fast operation if you pre-compute the trees. A JPEG-like algorithm will give you better performance, better quality, and less data to transfer.

The stages of JPEG compression are as follows.

1) Convert RGB to YCbCr.

2) Reduce the resolution of chroma components. (Typically, each 2x2 becomes a single chroma pixel.)

3) Perform DCT on each block. (Typically 8x8 blocks for each channel.)

4) Multiply each component by a pre-determined factor, keeping only the top few bits. Typically 8 bits for the lowest frequencies, dropping down to 3-4 for the highest frequencies.

5) Perform Huffman compression on the stream to get final output.

Steps 2, 4, and 5 are what give you the actual compression, and each one is technically optional, but they give you incredible results when combined. You can do 10x compression with little quality degradation.
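For reference, steps 1 and 2 in integer form (the coefficients are the standard JPEG/BT.601 ones rounded to 8-bit fixed point; this is a sketch, not a full encoder):

#include <stdint.h>

// step 1: RGB -> YCbCr, 8.8 fixed point
static void rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b,
                         uint8_t *y, uint8_t *cb, uint8_t *cr)
{
    *y  = (uint8_t)(( 77 * r + 150 * g +  29 * b) >> 8);
    *cb = (uint8_t)(128 + ((-43 * r -  85 * g + 128 * b) >> 8));
    *cr = (uint8_t)(128 + ((128 * r - 107 * g -  21 * b) >> 8));
}

// step 2: average each 2x2 group of chroma samples into one value
static uint8_t subsample_2x2(const uint8_t *plane, int stride, int x, int y)
{
    return (uint8_t)((plane[y * stride + x]       + plane[y * stride + x + 1] +
                      plane[(y + 1) * stride + x] + plane[(y + 1) * stride + x + 1] + 2) >> 2);
}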


(quoting *Aqua*'s post above about the memory/processing cost of P- and B-frames, MPEG on early Pentiums, and the lack of SIMD on this ARM)

MPEG is likely higher quality, but I do confess to not having looked deeply into the algorithm behind it. As I said, he CAN use external memory. SIMD being what made the processing possible is, however, bunk. Old dos games had video all over the place and never needed anything special, but they weren't "high quality." What you're talking about makes it easier to have higher quality video with high quality compression, to calculate more effects and compression, but it has absolutely nothing to do with pure difference frames and hence lower levels of compression.

Pure difference frames can be calculated as fast as the image compression routine. Adding in complex equations to describe the "flow" of pixel regions helps, but is the next stage, not the first stage.


(quoting K^2's post above recommending a JPEG-style frequency-domain approach)

i was reading up on that a couple days ago, and it gave me some ideas, but at first glance it might take more cpu capability than i have. ill have to do some more research into the format before i could do that.

it does give me a couple ideas however. so far ive only bothered with rgb color space. the camera module seems to support YUV, YCbCr and RGB color spaces in hardware so i wouldnt have to convert on the compression side. the much less cpu intensive decompress side would have the cpu time to do the color space transformations, and that would only be 4 integer matrix-vector multiplies per block (one per color table entry).

another thing jpeg does is throw out a lot of the color information, storing it at half or less the res of the full image. my 8x8x2+8x2 gray format has pretty good quality, so i could use that as the Y component in a YCbCr colorspace. on that i piggyback a 4x4x2+4x2 format to store the CbCr data (i could also borrow 2 bits from the gray channel and store the color as a 3 bit offset + 2 bit delta). were still looking at 2.875 bpp, which is still pretty good.
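assuming one shared 4x4 chroma block rides along with each 8x8 luma block, the bit budget works out like this:

-- Y    (8x8x2+8x2):  64*2 + 8 + 8 = 144 bits

-- CbCr (4x4x2+4x2):  16*2 + 4 + 4 =  40 bits

-- total: 184 bits / 64 pixels = 2.875 bpp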


  • 4 months later...

Hi Nuke.

Sorry for the necro, but the thread sounds really interesting. I'm working on a similar problem here, because of bandwidth problems (streaming R8G8B8 image data to an Arduino via serial/bluetooth and displaying it on a LED matrix with >500 LEDs at >20fps). I need to use simple algorithms, because I'm currently using an ATMega 328P (8 bit, 16 MHz), but might switch to the much more powerful Arduino Due if needed. So far I tried lossless (I don't really want to go lossy) compression using static huffman, but obviously the compression ratio for images is not stellar... My next approach is splitting the image into RGB color planes and doing a delta compression of color values using a variable number of bits (roughly like the sketch below). Atm I'm also able to hold the complete frame buffer in RAM, so I'll also try a simple delta-frame encoding of the RGB planes (only current vs. last frame), then do the color value delta encoding.
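(For what it's worth, a bare-bones sketch of the per-plane delta step; the variable-bit packing and the huffman/frame-delta parts are left out, and the names are mine:)

#include <stdint.h>

// turn one color plane into deltas between consecutive pixels;
// small deltas are then cheap to pack with fewer bits or to entropy-code
static void delta_encode_plane(const uint8_t *plane, int8_t *out, int count)
{
    uint8_t prev = 0;
    for (int i = 0; i < count; i++) {
        out[i] = (int8_t)(plane[i] - prev);   // wraps mod 256, fully reversible
        prev = plane[i];
    }
}

static void delta_decode_plane(const int8_t *in, uint8_t *plane, int count)
{
    uint8_t prev = 0;
    for (int i = 0; i < count; i++) {
        prev = (uint8_t)(prev + (uint8_t)in[i]);
        plane[i] = prev;
    }
}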

About your problem: Have you considered doing an extra entropy encoding step (adaptive/static huffman, LZO or LZSS with heatshrink) after your image encoding? That might improve your compression ratio a bit.

(quoting Nuke's post above about the green blocks and picking colors by pixel average)

Why not check which block encoding would generate the smallest error when being decoded?


  • 3 months later...

its nekro time!

so i made some progress on the electronics side of things. this was made possible by a couple of things. the first: i finally found an ili9341 library for the arduino due that works, so my lcd can now reliably output data. there was another thing but i forgot what it was.

so i started working on a decompression algorithm in c (format four) for the arduino. this turned out to be a very easy thing, and it was solid, so i didnt mess around with it till now. i determined that it takes less than 150 microseconds to decode a block and send it to the screen. thats 5 frames/sec territory. i thought the float math would slow it down but apparently this little arm processor has some nards. as a fun little thing i might compare it against fixed point math to see if i can make it a little faster.

next task was to build a state machine to handle serial commands and lcd operation. this hit a kind of a snag. at first i wrote this unpredictable mess: initially i wanted to minimize protocol overhead and decided i would cram data and command into a single byte and send them over serial. since i had about 6 3-bit commands, it meant that any errant piece of data could be mistaken for a command. that had to go. so i rewrote it with separate command and data bytes, and the number of commands also went up slightly because of it. but i realized that only one command has to be fast, the one that sends block data. configuring the state machine means sending 2 bytes a pop, but thats usually done once. setting the insertion point would take 4 bytes; maybe i can get that down to 3 with a 3-byte command, or use single byte commands to jump to pre-defined locations, like the starting corner or the beginning of a scan line. so to save some commands i made it so that the insertion point advances every time a block gets pushed through. (a rough sketch of the parser is below.)
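something along these lines; the command values, lengths and dispatch() are made up, this is just the shape of the state machine:

#include <stdint.h>

enum { CMD_CONFIG = 0x01, CMD_SET_POS = 0x02, CMD_BLOCK = 0x03 };   // placeholder values

extern void dispatch(uint8_t cmd, const uint8_t *data, int len);    // placeholder: act on a command

static void handle_byte(uint8_t b)
{
    static uint8_t cmd = 0, buf[19];
    static int need = 0, have = 0;

    if (need == 0) {                            // waiting for a command byte
        cmd = b;
        if      (cmd == CMD_CONFIG)  need = 1;  // one config byte
        else if (cmd == CMD_SET_POS) need = 3;  // packed insertion point
        else if (cmd == CMD_BLOCK)   need = 19; // one compressed block
        have = 0;
        return;
    }
    buf[have++] = b;                            // collecting data bytes
    if (have == need) {
        dispatch(cmd, buf, have);
        need = 0;
    }
}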

after some debugging it worked fine in the terminal. i used my lua test application to dump all the block data and required commands to a file. initially i tried outputting it directly from the lua script, but there was some kind of issue with the way lua passes strings containing binary data: some control characters just dont go through, and in some cases i ended up with a lot of extra data that i couldnt explain. there should be 1200 19-byte blocks and about 32 1-byte commands to push each scan line and start the process. i tried dumping it to a file and never wound up with the 22832 byte size the whole frame should have been. i looked at the lua documentation and realized i forgot to set the write mode to binary. derp. i was still not able to scan out over serial, but i loaded up realterm and dumped the file over serial, and everyones favorite little kitten (and now gargantuan 15-pound desk hog) lizzy popped up.

much swearing at my bit of spaghetti code lua app ensued after that, but i never quite got it to work. it is still somewhat useful for converting images to the compressed format for now. ultimately i intend to write a compressor in c, but i really dont feel like digging out that gargantuan visual studio ide and ms's bloated libs. its one of those rare cases i envy linux users and their makefiles and their lovely gcc compiler. raspis cost a few bucks now, or will as soon as production ramps up anyway. im also considering trying to run the display directly on an esp8266 (though i dont think they have the gpio to run my camera module). im sure there are other dev boards; anything with fast wireless and an arm processor will be great. who knows what the future holds, for now heres some goodies.

[image: IMG_0743]

[image: IMG_0744]

you notice that the image is mirrored on the screen; theres probably a setting in the tft library to fix that.


On 11/11/2015 at 1:50 AM, HorstBaerbel said:

(quoting HorstBaerbel's post above about the LED matrix project, the entropy coding suggestion, and measuring the block error)

i guess i didnt catch this one.

actually i figure more robust coding schemes can be used given the speed of some of these dev boards we have. on one hand i dont really like the idea of solving every problem with an #include statement. on the other, i can use tried and true encoding systems instead of something i cobbled together based on some knowledge i had about dxt formats and limited information theory. im somewhat curious what can be done with run length encoding; just zipping my compressed image file gets it to less than half its original size, and 7z gets it down a few hundred more bytes. so its certainly worth investigating (it has to beat 150 microseconds/block for compression/decompression though).

i dont really need lossless since the main usage scenario is putting cameras on rc vehicles, and i dont need the video to be completely error free (if it works to a km or two vs the 200 or so feet my analog camera gets, its an improvement, not to mention i wont have to readjust the receiver every 3 seconds). even if half the blocks are garbled i can still navigate.

good idea measuring the error, much more scientific than "hey that looks better, this other one looks like crap". i still dont have a decent 8-bit codec, so theres much more testing to do in that regard. and i need to find a library for my camera module first.

 

*edit*

i just did my fixed point math revision and got decompression and display down to < 60 microseconds! regardless of the beefiness of the fpu, int is still faster. unlike 8 bit mcus, which have to break the operation up into several, this 32 bit machine chews through it in one shot. im up to 13+ frames a second now.
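for the curious, this is the kind of thing the fixed point revision replaces the float math with: the (2*a+b)/3 interpolation done as a multiply and a shift (683/2048 is close enough to 1/3). my sketch of the idea, not the actual code:

// interpolate the two middle palette entries between endpoints a and b, per channel
static inline uint8_t lerp_third(uint8_t a, uint8_t b)
{
    return (uint8_t)(((2 * (int)a + (int)b) * 683) >> 11);   // x/3 ~= (x*683)>>11
}
// table: c0 = a, c1 = lerp_third(a, b), c2 = lerp_third(b, a), c3 = b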

Edited by Nuke
