
LZMA compression and decompression for Unity

Discussion in 'Scripting' started by Agent_007, Apr 8, 2014.

  1. Agent_007

    Agent_007

    Joined:
    Dec 18, 2011
    Posts:
    899
    The last time I needed general-purpose byte array compression I used LZF, but this time I needed something heavier, so I went for LZMA. I took the code from the latest LZMA SDK and it worked fine in Unity. I just wrote a simple static class (LZMAtools) that calls the right parts of the SDK.

    The LZMA SDK is in the public domain, and the same applies to the static class I made. I removed a few files (CommandLineParser.cs, LzmaAlone.cs and LzmaBench.cs) from the SDK since they aren't needed for general Unity usage, but otherwise all files are unchanged.

    Code downloads:
    https://bitbucket.org/Agent_007/lzma-unity (Bitbucket)
    View attachment lzma_v101.unitypackage

    And test code
    Code (csharp):
    using System.Text;
    using UnityEngine;

    // The class name is arbitrary; attach the script to any GameObject to run the test.
    public class LZMATest : MonoBehaviour
    {
        void Start ()
        {
            // Convert a 10000 character string to a byte array.
            byte[] text1 = Encoding.ASCII.GetBytes(new string ('X', 10000));
            byte[] compressed = LZMAtools.CompressByteArrayToLZMAByteArray(text1);
            byte[] text2 = LZMAtools.DecompressLZMAByteArrayToByteArray(compressed);

            // Less repetitive input. Note that Encoding.ASCII turns non-ASCII characters into '?'.
            string longstring = "defined input is deluciously delicious.14 And here and Nora called The reversal from ground from here and executed with touch the country road, Nora made of, reliance on, can’t publish the goals of grandeur, said to his book and encouraging an envelope, and enable entry into the chryssial shimmering of hers, so God of information in her hands Spiros sits down the sign of winter? —It’s kind of Spice Christ. It is one hundred birds circle above the text: They did we said. 69 percent dead. Sissy Cogan’s shadow. —Are you x then sings.) I’m 96 percent dead humanoid figure,";
            byte[] text3 = Encoding.ASCII.GetBytes(longstring);
            byte[] compressed2 = LZMAtools.CompressByteArrayToLZMAByteArray(text3);
            byte[] text4 = LZMAtools.DecompressLZMAByteArrayToByteArray(compressed2);

            Debug.Log ("text1 size: " + text1.Length);
            Debug.Log ("compressed size:" + compressed.Length);
            Debug.Log ("text2 size: " + text2.Length);
            Debug.Log ("are equal: " + ByteArraysEqual (text1, text2));

            Debug.Log ("text3 size: " + text3.Length);
            Debug.Log ("compressed2 size:" + compressed2.Length);
            Debug.Log ("text4 size: " + text4.Length);
            Debug.Log ("are equal: " + ByteArraysEqual (text3, text4));
        }

        // Returns true when both arrays are the same reference or hold identical bytes.
        public bool ByteArraysEqual (byte[] b1, byte[] b2)
        {
            if (b1 == b2)
                return true;
            if (b1 == null || b2 == null)
                return false;
            if (b1.Length != b2.Length)
                return false;
            for (int i = 0; i < b1.Length; i++)
            {
                if (b1[i] != b2[i])
                    return false;
            }

            return true;
        }
    }
    Output is:
    text1 size: 10000
    compressed size:57
    text2 size: 10000
    are equal: True
    text3 size: 574
    compressed2 size:414
    text4 size: 574
    are equal: True


    LZMA compresses better than LZF, and it compresses XML very efficiently (520 KB -> 12 KB).

    This code DOES NOT decompress .7z files! That would require additional code. Also, the logic is strictly single file -> single file.

    A file created with the LZMA encoder can be decompressed with 7-Zip, Keka, etc., but the original filename is lost since the compressed file doesn't contain any additional metadata. E.g. if you create myfile.lzma from important.txt with CompressFileToLZMAFile and extract myfile.lzma with Keka, you get a file named myfile.

    EDIT:

    Memory usage goes up if you use a large dictionary size. Below is an image that shows memory usage in certain scenarios:
    lzma_memory_usage.png
    Basically, decompression takes roughly dictionary size + 55 KB of memory, and compression roughly 11.65 * dictionary size.

    The default dictionary size is 4 MB (which isn't good for mobile devices if you are compressing, since memory usage goes up to 46 MB), but you can choose a different dictionary size with the function calls that take LZMADictionarySize dictSize as the last parameter, e.g.
    Code (csharp):
    LZMAtools.CompressByteArrayToLZMAFile(lenaImage.bytes, "output.lzma", LZMAtools.LZMADictionarySize.Dict1MiB);
    You can also use custom dictionary sizes by setting them manually (add a new enum entry whose value is the chosen size in bytes). There is no point in choosing a dictionary size larger than the input file / byte array, since that doesn't improve compression efficiency.
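
    The sizing advice above could be sketched roughly like this. It is only an illustration: the class name, the method name and the 64 KiB floor are hypothetical and not part of LZMAtools. It picks the smallest power of two that covers the input and never goes above the 4 MiB default.

```csharp
using System;

// Hypothetical helper, not part of LZMAtools or the LZMA SDK.
public static class DictionarySizeHint
{
    // 2^22 bytes, the default dictionary size mentioned in this thread.
    public const int DefaultDictionaryBytes = 1 << 22;

    // Arbitrary lower bound chosen for this sketch (assumption).
    public const int MinDictionaryBytes = 1 << 16;

    // Returns the smallest power of two that is >= inputLength,
    // clamped to [MinDictionaryBytes, DefaultDictionaryBytes].
    public static int Suggest(long inputLength)
    {
        int size = MinDictionaryBytes;
        while (size < inputLength && size < DefaultDictionaryBytes)
        {
            size <<= 1;
        }
        return size;
    }
}
```

    The returned byte count would then be used as the value of a custom LZMADictionarySize entry, as described above.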
     
    Last edited: Mar 16, 2015
  2. youkedamu

    youkedamu

    Joined:
    Mar 24, 2014
    Posts:
    3
    Why does LZMA use so much memory on iOS?
    LZMA compression takes up a lot of memory on the iOS platform. Is there any way to solve this?
     
  3. tomph

    tomph

    Joined:
    Nov 6, 2013
    Posts:
    33
    Hi there. Sorry to bump this thread again.

    Can anyone help me with retrieving this (compressed) data back from MySQL? I'm storing it fine, but I get stuck when it comes to returning the data to Unity.

    My data is stored in BLOB format in a MySQL DB. Can I just echo that data back as a string and decompress it in Unity? I've been trying this but I get "OutOfMemoryException: Out of memory" errors.

    Do I have to handle this data differently when it comes back down via www.text?
     
  4. Marble

    Marble

    Joined:
    Aug 29, 2005
    Posts:
    1,268
    Thanks for the helper script and for the tip, Agent_007. This was easy to integrate and made a huge difference when sending bytes over the net. Very pleased!
     
  5. Agent_007

    Agent_007

    Joined:
    Dec 18, 2011
    Posts:
    899
    Sorry about the late answers. It seems this thread wasn't on my follow list.
    Memory usage should be about the same on all platforms. The default dictionary size for LZMA in the C# version is const int kDefaultDictionaryLogSize = 22; which means 2^22 bytes (4 megabytes). This is what the dictionary size means for memory usage:
    for compression: (dictSize * 11.5 + 6 MB) + state_size
    for decompression: dictSize + state_size
    where state_size = (4 + (1.5 << (lc + lp))) KB; with the defaults (lc=3, lp=0), state_size = 16 KB.

    http://sourceforge.net/p/sevenzip/discussion/45797/thread/524d4695/

    So you might want to decrease the dictionary size if you are encoding on devices that don't have much RAM. I will add a dictionary size option to the next version, which should arrive next week, and I will also create a table of memory usage.
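
    To get a feel for the numbers, the formulas above can be turned into a small calculator. This is only a sketch: the class and method names are made up for illustration, and it assumes "KB" and "MB" in the formulas mean 1024 and 1024 * 1024 bytes.

```csharp
using System;

// Hypothetical helper, not part of LZMAtools or the LZMA SDK.
// Assumption: KB = 1024 bytes, MB = 1024 * 1024 bytes.
public static class LzmaMemoryEstimate
{
    const long KiB = 1024;
    const long MiB = 1024 * 1024;

    // state_size = (4 + (1.5 << (lc + lp))) KB
    public static long StateSizeBytes(int lc, int lp)
    {
        // "1.5 << n" means 1.5 * 2^n; 1.5 KB is 1536 bytes, so this stays in integers.
        return 4 * KiB + 1536L * (1L << (lc + lp));
    }

    // Compression: (dictSize * 11.5 + 6 MB) + state_size
    public static long CompressionBytes(long dictSize, int lc, int lp)
    {
        return dictSize * 23 / 2 + 6 * MiB + StateSizeBytes(lc, lp);
    }

    // Decompression: dictSize + state_size
    public static long DecompressionBytes(long dictSize, int lc, int lp)
    {
        return dictSize + StateSizeBytes(lc, lp);
    }
}
```

    For example, LzmaMemoryEstimate.CompressionBytes(1 << 20, 3, 0) gives the estimate for a 1 MiB dictionary with the default lc and lp.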

    If you want to handle binary data via Unity's WWW class, use www.bytes. If you must use www.text for some (bizarre) reason, base64-encode the data on the server and base64-decode it on the client:
    http://stackoverflow.com/questions/11743160/how-do-i-encode-and-decode-a-base64-string
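
    A minimal sketch of the base64 round trip mentioned above, using only System.Convert. The class and method names are hypothetical. Encode stands in for whatever the server does (on a PHP server that would presumably be base64_encode), and Decode is what the client would run on the text it receives.

```csharp
using System;

// Hypothetical helper, not part of LZMAtools.
public static class Base64Transport
{
    // Sender side: turn raw (compressed) bytes into text that survives a string transport.
    public static string Encode(byte[] data)
    {
        return Convert.ToBase64String(data);
    }

    // Receiver side: recover the exact original bytes from the text.
    // Trim() drops stray whitespace or newlines around the payload.
    public static byte[] Decode(string text)
    {
        return Convert.FromBase64String(text.Trim());
    }
}
```

    In Unity, the result of Base64Transport.Decode(www.text) would then be passed to LZMAtools.DecompressLZMAByteArrayToByteArray. Treating the raw BLOB as a string without this step can corrupt bytes that are not valid text, which may be what leads to errors like the OutOfMemoryException reported earlier in the thread.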