How to squeeze a flat cat

Once in the icy winter season ... exactly a year ago, we had a nontrivial task. There is a screen on electronic ink, there is a 16 MHz processor (yes, in embedded electronics, especially ultra-low power, there are such) and there is no memory at all. Well, i.e. kilobytes 8 RAM and 256 Flash. Kilobytes, Karl. And in these sad kilobytes it is necessary to push several 800x600 images in four shades of gray. Multiplying quickly in mind 800 by 600 and 2 bits per pixel, we get 120 thousand bytes. A few does not fit. It is necessary to compress.

So the task appeared before us: “how to compress a flat cat”? Why a cat? Yes, because they tested on the seals, what else to check for black and white pictures. Not on dollar bills the same.

The first answer was: let's squeeze the cat RLE . The cat is flat ... That is, flat-colored - only four shades. There are a lot of empty places on the screen, i.e. duplicate pixels will be up to hell. Should shrink.

Should shrink - squeezed. With the only difficulty: we don't care how to compress, we compress on the PC, or even on the server. But we need to unclener sequentially, in a streaming way: the bytes were pulled out - the bytes were pulled out. We do not have a video buffer on 8 kilobytes of RAM, there is no place to store a decompressed cat.
')
Coped. The cat is compressed.

RLE compression

/****************************************************************************/ /* Common Utilities Library * (C) Componentality Oy, 2015 */ /****************************************************************************/ /* RLE compression implementation */ /****************************************************************************/ #include "rle.h" #include <memory.h> using namespace Componentality::Common; // Do 8-bit RLE encoding void Componentality::Common::RLE_Encode(unsigned char* source, size_t source_size, unsigned char* target, size_t& target_size) { target_size = 0; unsigned char previous_character = source[0]; unsigned char series_counter = 1; bool same = false; size_t i; for (i = 1; i < source_size; i++) { // If current byte is equal to previous if (source[i] == previous_character) { // If we process sequence of the same characters if (same) { if (series_counter < 127) series_counter++; else { target[target_size++] = 0x80 | series_counter; target[target_size++] = previous_character; series_counter = 1; } } else { if (series_counter > 1) { target[target_size++] = series_counter - 1; memcpy(target + target_size, source + i - series_counter, series_counter - 1); target_size += series_counter - 1; } series_counter = 2; same = true; } } else { if (same) { if (series_counter > 1) { target[target_size++] = 0x80 | series_counter; target[target_size++] = previous_character; series_counter = 1; } else series_counter += 1; same = false; } else { if (series_counter > 127) { target[target_size++] = series_counter - 1; memcpy(target + target_size, source + i - (series_counter - 1), series_counter - 1); target_size += series_counter - 1; series_counter = 1; } else series_counter++; } } previous_character = source[i]; } if (same) { target[target_size++] = 0x80 | series_counter; target[target_size++] = previous_character; } else { target[target_size++] = series_counter; memcpy(target + target_size, source + i - (series_counter), series_counter); target_size += series_counter; } } // Do buffered RLE decoding void Componentality::Common::RLE_Decode(unsigned char* source, size_t source_size, unsigned char* target, size_t& target_size) { target_size = 0; for (size_t i = 0; i < source_size;) { unsigned char size = source[i] & ~0x80; if (source[i] & 0x80) { memset(target + target_size, source[i + 1], size); i += 2; } else { memcpy(target + target_size, source + i + 1, size); i += size + 1; } target_size += size; } } // Check where two buffers are different size_t Componentality::Common::isDiff(unsigned char* left, unsigned char* right, size_t size) { for (size_t i = 0; i < size; i++) { if (left[i] != right[i]) return i; } return (size_t)-1; } // Incremental decoding initialization void Componentality::Common::RLE_InitDecoder(RLE_DECODE* handler, unsigned char* source) { handler->mDecodedPortion = 0; handler->mPtr = 0; handler->mOffset = 0; handler->mSource = source; } // Decode next byte incrementally unsigned char Componentality::Common::RLE_Fetch(RLE_DECODE* handler) { if (handler->mDecodedPortion > handler->mPtr) { handler->mPtr += 1; if (handler->mMode == 0x00) handler->mOffset += 1; return handler->mSource[handler->mOffset - 1]; } handler->mDecodedPortion = handler->mSource[handler->mOffset] & ~0x80; handler->mMode = handler->mSource[handler->mOffset] & 0x80; handler->mOffset += 2; handler->mPtr = 1; return handler->mSource[handler->mOffset - 1]; }

Good cat shrinks. On average, the hospital is 4 times. But we are greedy, we would be closer. We have little memory, very little, and we need it not only for the seals, we still need to build a peer-to-peer network, store routes and other nonsense.

Thought again. So they thought, they thought, they decided that LZ77 would also work, we just need to figure out how to decompress the data in a streaming manner without intermediate storage. Invented. It turned out like this:

Compression embedded-modification of LZ77 (scanback algorithm)

 /****************************************************************************/ /* Common Utilities Library * (C) Componentality Oy, 2014-2015 */ /****************************************************************************/ /* Scanback compression implementation */ /****************************************************************************/ /* Scanback - LZ77 for embedded systems */ /* Designed and developed by Konstantin Khait */ /****************************************************************************/ #include "scanback.h" extern "C" { #include "bitmem.h" } using namespace Componentality::Common; // Scan buffer (buf) back from position <index> - 1 for byte <wtf> from <minfind> to <maxfind> index static unsigned char _sb__findback(unsigned char* buf, unsigned long index, unsigned char wtf, unsigned char minfind, unsigned char maxfind) { unsigned char i; for (i = minfind; i < maxfind; i++) { if (buf[index - i] == wtf) return i; } return 0; } // Compare <buf1> and <buf2> for maximum length of <size> and return length of identical fragment static unsigned char _sb__match(unsigned char* buf1, unsigned char* buf2, unsigned char size) { unsigned char i; for (i = 0; i < size; i++) { if (buf1[i] != buf2[i]) break; } return i; } // Find maximum matching sequence in buffer <buf> to sequence starting from <index> // <winsize> - size of window to be scanned in bytes, <matchlen> - maximum length of matching area in bytes, <bufsize> - size of <buf> SB_PTR _sb_scanback(unsigned char* buf, unsigned long index, unsigned char winsize, unsigned char matchlen, unsigned long bufsize) { SB_PTR result = { 0, 0 }; unsigned char i; if (winsize > index) winsize = (unsigned char)index; if (matchlen > winsize) matchlen = winsize; for (i = 1; i < winsize; i++) { register unsigned char offset = _sb__findback(buf, index, buf[index], i, winsize); if (offset) { register unsigned char matchsize = (unsigned char)(matchlen < (bufsize - index) ? matchlen : bufsize - index); if (matchsize > offset) matchsize = offset; register unsigned char len = _sb__match(buf + index, buf + index - offset, matchsize); if (len > result.length) { result.offset = offset; result.length = len; } i = offset; } } return result; } // Do compression of buffer <buf> of size <size> to bitwise memory <mem>. Returns number of produced bits unsigned long Componentality::Common::SB_Compress(unsigned char* mem, unsigned char* buf, unsigned long size) { unsigned long bitcount = 0, i; SB_PTR cptr; for (i = 0; i < (1 << LENGTH_BITS); i++) mem[i] = buf[i]; bitcount += (1 << LENGTH_BITS) * 8; for (i = 1 << LENGTH_BITS; i < size;) { cptr = _sb_scanback(buf, i, 1 << WINDOW_BITS, 1 << LENGTH_BITS, size); if (!cptr.offset) { bitmem_put1(mem, bitcount++, 0); bitmem_put(mem, bitcount, buf[i], 8); bitcount += 8; i += 1; } else { bitmem_put1(mem, bitcount++, 1); bitmem_put(mem, bitcount, cptr.offset - 1, WINDOW_BITS); bitcount += WINDOW_BITS; bitmem_put(mem, bitcount, cptr.length - 1, LENGTH_BITS); bitcount += LENGTH_BITS; i += cptr.length; } } return bitcount; } // Initialize decoder context void Componentality::Common::SB_InitDecoder(SB_DECODER* decoder, unsigned char* mem) { decoder->bitindex = 0; decoder->mem = mem; decoder->total_decoded = 0; decoder->index = 0; decoder->brb = 0; } // Initialize decoder with ringbuffer void SB_InitDecoderRB(SB_DECODER* decoder, void* ringbuffer) { decoder->bitindex = 0; decoder->mem = 0; decoder->total_decoded = 0; decoder->index = 0; decoder->brb = ringbuffer; } // Unpack next byte from the packed stream unsigned char Componentality::Common::SB_Fetch(SB_DECODER* decoder) { register unsigned char result; if (decoder->index < (1 << LENGTH_BITS)) { if (!decoder->brb) result = decoder->decoded_buf[decoder->index % (1 << WINDOW_BITS)] = decoder->mem[decoder->index]; else result = decoder->decoded_buf[decoder->index % (1 << WINDOW_BITS)] = (unsigned char)bitmem_read_ringbuf((BIT_RINGBUF*)decoder->brb, 8); decoder->index += 1; decoder->bitindex += 8; decoder->total_decoded += 1; return result; } if (decoder->index >= decoder->total_decoded) { bit isref = bitmem_get1(decoder->mem, decoder->bitindex++); if (!isref) { if (!decoder->brb) decoder->decoded_buf[decoder->total_decoded % (1 << WINDOW_BITS)] = (unsigned char)bitmem_get(decoder->mem, decoder->bitindex, 8); else decoder->decoded_buf[decoder->total_decoded % (1 << WINDOW_BITS)] = (unsigned char)bitmem_read_ringbuf((BIT_RINGBUF*)decoder->brb, 8);; decoder->bitindex += 8; decoder->total_decoded += 1; } else { register SB_PTR ptr; register unsigned char i; if (!decoder->brb) ptr.offset = (unsigned char)bitmem_get(decoder->mem, decoder->bitindex, WINDOW_BITS) + 1; else ptr.offset = (unsigned char) bitmem_read_ringbuf((BIT_RINGBUF*)decoder->brb, WINDOW_BITS) + 1; decoder->bitindex += WINDOW_BITS; if (!decoder->brb) ptr.length = (unsigned char)bitmem_get(decoder->mem, decoder->bitindex, LENGTH_BITS) + 1; else ptr.length = (unsigned char) bitmem_read_ringbuf((BIT_RINGBUF*)decoder->brb, LENGTH_BITS) + 1; decoder->bitindex += LENGTH_BITS; for (i = 0; i < ptr.length; i++) { register unsigned long srcptr = decoder->total_decoded - ptr.offset; decoder->decoded_buf[decoder->total_decoded % (1 << WINDOW_BITS)] = decoder->decoded_buf[srcptr % (1 << WINDOW_BITS)]; decoder->total_decoded += 1; } } } result = decoder->decoded_buf[decoder->index % (1 << WINDOW_BITS)]; decoder->index += 1; return result; }

We were particularly pleased with the footprint for decompression, approximately equal to 150 bytes (with the "window" of the algorithm at 127 bytes). Initially, in the Lempel-Ziv algorithms we were greatly confused by the need to allocate memory for the dictionary. RLE dictionary is completely unnecessary ... But 150 bytes will not frighten us.

Another thing scared us - despite the fact that it is known from theory that LZ77 is a generalization of RLE, replacing the second with the first gave an improvement in the result on the verge of statistical error: sometimes better, sometimes worse, but generally the same degree of compression that you don’t set .

We began to think about the entropy methods: Huffman, arithmetic coding, wrote a couple of prototypes ... did not come up. All decompressors need tables that are quite decent, but by our standards - downright indecent size.

And then ... And then we launched the scanback compression over RLE. And, oh, a miracle, the compression ratio from 3-4 jumped to 7-10, depending on the “fluffiness of the cat,” that is, the degree of flat color of the picture and the number of gradient areas. You can live. And, most importantly, RLE + SB is uncompressed in an excellent way with a stream decompressor in one pass.

Like this:

Stream decompressor RLE + Scanback

 /****************************************************************************/ /* Common Utilities Library * (C) Componentality Oy, 2015 */ /****************************************************************************/ /* Combined RLE + Scanback implementation (compression is to be done */ /* sequentially, decompression is optimized */ /****************************************************************************/ #include "rlesbc.h" using namespace Componentality::Common; // Decode next byte incrementally for stream compressed by both RLE and Scanback unsigned char Componentality::Common::RLESB_Fetch(RLE_DECODE* handler, SB_DECODER* sb_handler, unsigned char* repeating_value) { if (handler->mDecodedPortion > handler->mPtr) { handler->mPtr += 1; if (handler->mMode == 0x00) *repeating_value = SB_Fetch(sb_handler); return *repeating_value; } *repeating_value = SB_Fetch(sb_handler); handler->mDecodedPortion = *repeating_value & ~0x80; handler->mMode = *repeating_value & 0x80; *repeating_value = SB_Fetch(sb_handler); handler->mPtr = 1; return *repeating_value; }

For the past year, our seals have lived beautifully almost “in complete unconsciousness”, in spite of every ZLIB. Which, of course, press much more densely, but resources consume incomparably more. Meanwhile, we found that our RLE + SB is great for many other tasks, for example, raster fonts are perfectly compressed, even with transparency and anti-aliasing, as well as any network text traffic. Naturally, the degree of compression is a ridiculous 2.5-6, but almost no resources are consumed, especially for decompression, which, as a rule, is performed more often, and to the speed-memory is much more critical.

So we decided not to get stuck and put the code in open access (subject to the MIT license ). Suddenly, you will have to unclench something in conditions of a catastrophic shortage of memory and processor resources.

Source: https://habr.com/ru/post/273047/

All Articles

How to squeeze a flat cat

More articles: