
LZW string compression in JavaScript and decompression in PHP

Just yesterday I ran into a situation where I could not find working classes or modules for compressing and decompressing strings with the LZW algorithm across two languages. More precisely: jsCompress-jsDecompress works. PhpCompress-PhpDecompress works. But jsCompress-PhpDecompress returns either garbage or an empty string. Honestly, I don't know, maybe there is no such problem with ANSI, but with UTF-8 it is very obvious. After spending several hours on the problem, I decided to publish ready-to-use functions on Habr.
I will not explain how LZW compression works, since it is described very well on the wiki.

I took ready-made functions and classes as the basis: for PHP from code.google.com/p/php-lzw/ and for JS from gist.github.com/843889

The JS function is left as is, unchanged:
function lzw_encode(s) {
    var dict = {};
    var data = (s + "").split("");
    var out = [];
    var currChar;
    var phrase = data[0];
    var code = 256;
    for (var i=1; i<data.length; i++) {
        currChar=data[i];
        if (dict[phrase + currChar] != null) {
            phrase += currChar;
        }
        else {
            out.push(phrase.length > 1 ? dict[phrase] : phrase.charCodeAt(0));
            dict[phrase + currChar] = code;
            code++;
            phrase=currChar;
        }
    }
    out.push(phrase.length > 1 ? dict[phrase] : phrase.charCodeAt(0));
    for (var i=0; i<out.length; i++) {
        out[i] = String.fromCharCode(out[i]);
    }
    return out.join("");
}
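To see why the output can contain character codes above 255, it helps to look at the numeric codes before they are turned into characters. The helper below is only an illustrative sketch: `lzwCodes` is a hypothetical name, not part of the original gist. It runs the same loop as `lzw_encode` but returns the numbers, and it adds a guard for the empty string.

```javascript
// Hypothetical helper: same loop as lzw_encode, but returns the numeric codes
// instead of joining them into a string.
function lzwCodes(s) {
  const dict = new Map();          // phrase -> code, new codes start at 256
  const data = String(s).split("");
  const out = [];
  if (data.length === 0) return out; // guard added for this sketch only
  let phrase = data[0];
  let code = 256;
  for (let i = 1; i < data.length; i++) {
    const currChar = data[i];
    if (dict.has(phrase + currChar)) {
      // Known phrase: keep extending it.
      phrase += currChar;
    } else {
      // Emit the code for the longest known phrase, then register the new one.
      out.push(phrase.length > 1 ? dict.get(phrase) : phrase.charCodeAt(0));
      dict.set(phrase + currChar, code++);
      phrase = currChar;
    }
  }
  out.push(phrase.length > 1 ? dict.get(phrase) : phrase.charCodeAt(0));
  return out;
}

console.log(lzwCodes("abababab")); // [ 97, 98, 256, 258, 98 ]
```

The codes 256 and 258 are dictionary entries ("ab" and "aba"). `lzw_encode` turns each of them into a single character with `String.fromCharCode`, which is why the receiving side has to read the compressed string character by character and not byte by byte.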

The PHP function, however, had to be slightly corrected, because strings compressed by the LZW algorithm can contain character codes greater than 255 (à la Unicode). I also had to add one function, mb_ord, which returns the code of such a multibyte character.
// PHP 7.2+ already ships mb_ord() in mbstring, so define it only if it is missing.
if (!function_exists('mb_ord')) {
    function mb_ord($string) {
        if (extension_loaded('mbstring') === true) {
            mb_language('Neutral');
            mb_internal_encoding('UTF-8');
            mb_detect_order(array('UTF-8', 'ISO-8859-15', 'ISO-8859-1', 'ASCII'));
            $result = unpack('N', mb_convert_encoding($string, 'UCS-4BE', 'UTF-8'));
            if (is_array($result) === true) return $result[1];
        }
        return ord($string);
    }
}

function lzw_decompress($binary) {
    mb_internal_encoding("UTF-8");
    // Each UTF-8 character of the compressed string is one LZW code.
    $codes = array();
    $length = mb_strlen($binary);
    for ($i = 0; $i < $length; $i++) {
        $codes[] = mb_ord(mb_substr($binary, $i, 1));
    }
    // decompression
    $dictionary = range("\0", "\xFF");
    $return = "";
    $word = "";
    foreach ($codes as $i => $code) {
        if (isset($dictionary[$code])) {
            $element = $dictionary[$code];
        } else {
            // The code is not in the dictionary yet: it is the entry being built right now.
            $element = $word . $word[0];
        }
        $return .= $element;
        if ($i) $dictionary[] = $word . $element[0];
        $word = $element;
    }
    return $return;
}
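To check a compressed string in the browser before sending it to the server, the decoding loop can be mirrored in JavaScript. This is a minimal sketch that follows the same logic as `lzw_decompress`; the name `lzwDecodeSketch` is hypothetical and it is not the decoder from the original gist.

```javascript
// Hypothetical JS mirror of the PHP decoding loop, for local testing only.
function lzwDecodeSketch(compressed) {
  // Codes 0..255 stand for single characters.
  const dictionary = [];
  for (let i = 0; i < 256; i++) dictionary.push(String.fromCharCode(i));

  let word = "";
  let result = "";
  for (let i = 0; i < compressed.length; i++) {
    const code = compressed.charCodeAt(i);
    // A code that is not in the dictionary yet can only be the entry
    // currently being built: previous word + its own first character.
    const element = code < dictionary.length
      ? dictionary[code]
      : word + word.charAt(0);
    result += element;
    if (i > 0) dictionary.push(word + element.charAt(0));
    word = element;
  }
  return result;
}

// The codes 97, 98, 256, 258, 98 are what the encoder produces for "abababab".
console.log(lzwDecodeSketch(String.fromCharCode(97, 98, 256, 258, 98))); // abababab
```

The fourth code, 258, arrives before entry 258 exists in the dictionary, so this small input also exercises the "unknown code" branch that the PHP function handles with `$word . $word[0]`.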


Of course, to transfer the string correctly, the LZW-compressed string needs to be encoded in base64 before transmission and decoded before unpacking. This should not cause problems: on the PHP side everything is smooth, and for JS the same algorithm can be found all over the internet.
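One detail on the JS side: the compressed string contains characters above 255, and the built-in `btoa` rejects those, so the string has to be turned into UTF-8 bytes first. Below is a minimal sketch of one way to do that, assuming an environment where `TextEncoder` and `btoa` are available globally (modern browsers, recent Node.js). The name `compressedToBase64` is hypothetical.

```javascript
// Hypothetical helper: UTF-8 encode the LZW output, then base64 encode the bytes.
function compressedToBase64(compressed) {
  const bytes = new TextEncoder().encode(compressed); // Uint8Array of UTF-8 bytes
  let binary = "";
  for (const byte of bytes) {
    binary += String.fromCharCode(byte); // one char per byte, all <= 255
  }
  return btoa(binary);
}

// Codes 97, 98, 256, 258, 98 are the LZW output for "abababab".
const compressed = String.fromCharCode(97, 98, 256, 258, 98);
console.log(compressedToBase64(compressed)); // YWLEgMSCYg==
```

On the PHP side the result of `base64_decode` is then a UTF-8 string, which is what `lzw_decompress` expects. One caveat with this sketch: `TextEncoder` replaces lone surrogates with U+FFFD, so codes that fall into the range 0xD800 to 0xDFFF would be corrupted. It is therefore only suitable for inputs whose dictionary stays below that range.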

There is nothing especially new in these functions, but perhaps this article will save someone a lot of time. As for why you might need data compression on the client side, I wrote about that in the comments.

Source: https://habr.com/ru/post/152683/

