| |
| LZMA Utils FAQ |
| -------------- |
| |
| Copyright (C) 2007 Lasse Collin |
| |
| Copying and distribution of this file, with or without modification, |
| are permitted in any medium without royalty provided the copyright |
| notice and this notice are preserved. |
| |
| |
| Q: What are LZMA, LZMA Utils, lzma, .lzma, liblzma, LZMA SDK, LZMA_Alone, |
| 7-Zip and p7zip? |
| |
| A: LZMA stands for Lempel-Ziv-Markov chain-Algorithm. LZMA is the name |
| of the compression algorithm designed by Igor Pavlov. He is the author |
| of 7-Zip, which is a great LGPL'd compression tool for Microsoft |
| Windows operating systems. LZMA SDK is also available on the 7-Zip |
| website. It contains LZMA implementations in C++, Java and C#. The |
| C++ version is the original implementation, which is also used in |
| 7-Zip itself. |
| |
| Excluding the unrar plugin, 7-Zip is free software (free as in |
| freedom). Thanks to this, it was possible to port it to POSIX |
| platforms. The port was done and is maintained by myspace (TODO: |
| myspace's real name?). p7zip is a port of 7-Zip's command line version; |
| p7zip doesn't include 7-Zip's GUI. |
| |
| In the POSIX world, users are used to the gzip and bzip2 command line |
| tools, and developers know the APIs of zlib and libbzip2. LZMA Utils |
| try to ease adoption of LZMA on free operating systems by providing a |
| compression library and a set of command line tools. The library is |
| called liblzma. It provides a zlib-like API, making it easy to adopt |
| LZMA compression in existing applications. The main command line tool |
| is known as lzma; its command line syntax is very similar to that of |
| gzip and bzip2. |
| |
| The original command line tool from LZMA SDK (lzma.exe) was found in |
| a directory called LZMA_Alone in the LZMA SDK. It used a simple header |
| format in .lzma files. This format was also used by LZMA Utils up to |
| and including 4.32.x. In LZMA Utils documentation, LZMA_Alone refers |
| to both the file format and the command line tool from LZMA SDK. |
| |
| Because of various limitations of the LZMA_Alone file format, a new |
| file format was developed. Extending some existing format such as .gz |
| used by gzip was considered, but these formats were found to be too |
| limited. The filename suffix for the new .lzma format is `.lzma'. The |
| same suffix is also used for files in the LZMA_Alone format. To make |
| the transition to the new format as smooth as possible, LZMA Utils |
| support both the new and old formats transparently. |
| |
| 7-Zip and LZMA SDK: <http://7-zip.org/> |
| p7zip: <http://p7zip.sourceforge.net/> |
| LZMA Utils: <http://tukaani.org/lzma/> |
| |
| |
| Q: What LZMA implementations are available? |
| |
| A: LZMA SDK contains implementations in C++, Java and C#. The C++ version |
| is the original implementation, which is part of 7-Zip. LZMA SDK |
| also contains a small LZMA decoder in C. |
| |
| A port of LZMA SDK to Pascal was made by Alan Birtles |
| <http://www.birtles.org.uk/programming/>. It should work with |
| multiple Pascal programming language implementations. |
| |
| LZMA Utils includes liblzma, which is directly based on LZMA SDK. |
| liblzma is written in C (C99, not C89). In contrast to the C++ |
| callback API used by LZMA SDK, liblzma uses a zlib-like stateful C |
| API. I do not want to comment on whether either API is good or bad. |
| The only reason to implement a zlib-like API was that many developers |
| are already familiar with zlib, and very many applications already |
| use zlib. Having a similar API makes it easier to include LZMA |
| support in existing applications. |
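| |
| As a rough illustration of what "zlib-like stateful" means, the |
| sketch below uses Python's standard lzma module. That module binds |
| to a later liblzma release, so its names (LZMACompressor, |
| FORMAT_ALONE) are the Python module's and are only assumed to |
| mirror the C API described here. The helper names are made up. |

```python
import lzma

def compress_stream(chunks):
    """Feed chunks to one encoder object that keeps state between calls."""
    encoder = lzma.LZMACompressor(format=lzma.FORMAT_ALONE)
    out = []
    for chunk in chunks:
        # A call may return b"" while the encoder is still buffering input.
        out.append(encoder.compress(chunk))
    # flush() finishes the stream and returns whatever is still pending.
    out.append(encoder.flush())
    return b"".join(out)

def decompress_stream(chunks):
    """Feed compressed chunks to one decoder object, in arbitrary pieces."""
    decoder = lzma.LZMADecompressor(format=lzma.FORMAT_ALONE)
    out = []
    for chunk in chunks:
        if decoder.eof:
            break  # end of stream already seen; ignore anything after it
        out.append(decoder.decompress(chunk))
    return b"".join(out)

original = [b"hello, ", b"stateful ", b"world\n"] * 100
packed = compress_stream(original)
# Split the compressed data at arbitrary boundaries, as a socket might.
pieces = [packed[i:i + 16] for i in range(0, len(packed), 16)]
restored = decompress_stream(pieces)
```

| The caller owns the loop and the buffers, as with zlib's stream |
| objects; the library never calls back into the application. |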
| |
| See also <http://en.wikipedia.org/wiki/LZMA#External_links>. |
| |
| |
| Q: Which file formats are supported by LZMA Utils? |
| |
| A: Although the raw LZMA stream is always the same, it can be wrapped |
| in different container formats. The preferred format is the new .lzma |
| format. It has magic bytes (the first six bytes: 0xFF 'L' 'Z' 'M' |
| 'A' 0x00). The format supports chaining up to seven filters, |
| splitting data into multiple blocks for easier multi-threading, and |
| rough random-access reading. File integrity is verified using CRC32, |
| CRC64, or SHA256, and by verifying the uncompressed size of the file. |
| |
| LZMA SDK includes a tool called LZMA_Alone. It uses a |
| primitive header which includes only the mandatory stream information |
| required by the LZMA decoder. This format can be both read and |
| written by liblzma and the command line tool (use --format=alone to |
| create such files). |
| |
| .7z is the native archive format used by 7-Zip. This format is not |
| supported by liblzma, and probably will never be supported. You |
| should use e.g. p7zip to extract .7z files. |
| |
| It is possible to implement custom file formats by using raw filter |
| mode in liblzma. In this mode the application needs to store the filter |
| properties and provide them to liblzma before starting to uncompress |
| the data. |
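| |
| A minimal sketch of raw mode, again using Python's lzma module as a |
| stand-in for liblzma (an assumption; the C API may differ). The |
| filter settings live in the application, not in the compressed |
| data. The dictionary size and helper names are illustrative only. |

```python
import lzma

# The application must remember these settings itself, for example in
# its own file header. The dictionary size is an arbitrary example value.
FILTERS = [{"id": lzma.FILTER_LZMA1, "dict_size": 1 << 20}]

def pack_raw(data):
    """Compress to a bare LZMA stream with no container around it."""
    return lzma.compress(data, format=lzma.FORMAT_RAW, filters=FILTERS)

def unpack_raw(blob):
    """Decompress a bare stream; the same filter settings are required."""
    return lzma.decompress(blob, format=lzma.FORMAT_RAW, filters=FILTERS)

payload = b"custom container payload\n" * 200
blob = pack_raw(payload)
roundtrip = unpack_raw(blob)
```

| Decoding with different filter settings than those used for |
| encoding is not expected to work, which is why a custom format has |
| to store them somewhere. |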
| |
| |
| Q: How can I identify files containing LZMA compressed data? |
| |
| A: The preferred filename suffix for .lzma files is `.lzma'. `.tar.lzma' |
| may be abbreviated to `.tlz'. The same suffixes are used for files in |
| LZMA_Alone format. In practice this should be no problem since tools |
| included in LZMA Utils support both formats transparently. |
| |
| Checking the magic bytes is an easy way to detect files in the new .lzma |
| format (the first six bytes: 0xFF 'L' 'Z' 'M' 'A' 0x00). The "file" |
| command version FIXME contains magic strings for this format. |
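| |
| A check for the six magic bytes listed above can be sketched as |
| follows. The function names are hypothetical. |

```python
NEW_LZMA_MAGIC = b"\xffLZMA\x00"  # 0xFF 'L' 'Z' 'M' 'A' 0x00

def starts_with_magic(data):
    """Return True if the buffer begins with the new-format magic bytes."""
    return data[:len(NEW_LZMA_MAGIC)] == NEW_LZMA_MAGIC

def file_has_magic(path):
    """Read only the first six bytes of a file and compare them."""
    with open(path, "rb") as f:
        return starts_with_magic(f.read(len(NEW_LZMA_MAGIC)))
```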
| |
| The old LZMA_Alone format has no magic bytes. Its header cannot contain |
| arbitrary bytes, so it is possible to make a guess. Unfortunately, |
| reliable guessing is usually too hard, so don't try it unless you |
| are desperate. |
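| |
| For the desperate, here is one possible guess. It assumes the |
| 13-byte LZMA_Alone header layout as commonly documented for LZMA |
| SDK (this FAQ does not spell it out): one properties byte below |
| 225, a 32-bit little-endian dictionary size, and a 64-bit |
| little-endian uncompressed size where all ones means "unknown". |
| The plausibility limits are arbitrary assumptions, and false |
| positives and false negatives are both possible. |

```python
import struct

UNKNOWN_SIZE = 0xFFFFFFFFFFFFFFFF

def looks_like_lzma_alone(header):
    """Heuristic only: True if the first 13 bytes could be such a header."""
    if len(header) < 13:
        return False
    props, dict_size, size = struct.unpack("<BIQ", header[:13])
    # props encodes (pb * 5 + lp) * 9 + lc, so valid values are below 225.
    if props >= 9 * 5 * 5:
        return False
    # Assumption: typical encoders pick a power-of-two dictionary size.
    if dict_size == 0 or dict_size & (dict_size - 1):
        return False
    # Assumption: known sizes of 256 GiB or more are implausible.
    if size != UNKNOWN_SIZE and size >= 1 << 38:
        return False
    return True
```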
| |
| |
| Q: Does the lzma command line tool support sparse files? |
| |
| A: Sparse files can (of course) be compressed like normal files, but |
| uncompression will not restore sparseness of the file. Use an archiver |
| tool to take care of sparseness before compressing the data with lzma. |
| |
| The reason for this is that archiver tools handle files, while |
| compression tools handle streams or buffers. Being a sparse file is |
| a property of the file on the disk, not a property of the stream or |
| buffer. |
| |
| |
| Q: Can I recover parts of a broken LZMA file (e.g. corrupted CD-R)? |
| |
| A: With LZMA_Alone and single-block .lzma files, you can uncompress the |
| file until you hit the first broken byte. The data after the broken |
| position is lost. LZMA relies on the uncompression history, and if |
| bytes are missing in the middle of the file, it is impossible to |
| reliably continue after the broken section. |
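| |
| Decoding up to the damage can be sketched like this, using Python's |
| lzma module as a stand-in for the command line tool (an assumption). |
| The example simulates a file cut short. Note that LZMA_Alone has no |
| integrity check, so a flipped byte, unlike truncation, may produce |
| garbage output instead of an error. |

```python
import lzma

def salvage(chunks):
    """Decode piece by piece; keep everything produced before a failure."""
    decoder = lzma.LZMADecompressor(format=lzma.FORMAT_ALONE)
    recovered = []
    for chunk in chunks:
        if decoder.eof:
            break
        try:
            recovered.append(decoder.decompress(chunk))
        except lzma.LZMAError:
            break  # first undecodable position; later data is lost
    return b"".join(recovered)

original = b"".join(b"record %d\n" % i for i in range(20000))
packed = lzma.compress(original, format=lzma.FORMAT_ALONE)
damaged = packed[:len(packed) // 2]  # simulate losing the second half
pieces = [damaged[i:i + 512] for i in range(0, len(damaged), 512)]
partial = salvage(pieces)
```

| Feeding small pieces matters: output already returned by earlier |
| calls is kept even if a later call fails. |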
| |
| With multi-block .lzma files it may be possible to locate the next |
| block in the file and continue decoding there. A limited recovery |
| tool for this kind of situation is planned. |
| |
| |
| Q: Is LZMA patented? |
| |
| A: No, the authors are not aware of any patents that could affect LZMA. |
| However, due to the nature of software patents, the authors cannot |
| guarantee that LZMA isn't affected by any third-party patent. |
| |
| |
| Q: Where can I find documentation about how LZMA works as an algorithm? |
| |
| A: Read the source code, Luke. There is no documentation about LZMA |
| internals. It is possible that Igor Pavlov is the only person on |
| Earth who completely knows and understands the algorithm. |
| |
| You could begin by downloading LZMA SDK, and start reading from |
| the LZMA decoder to get some idea about the bitstream format. |
| Before you begin, you should know the basics of LZ77 and |
| range coding algorithms. LZMA is based on LZ77, but LZMA is |
| *a lot* more complex. Range coding is used to compress the |
| final bitstream like Huffman coding is used in Deflate. |
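| |
| To get a feel for the LZ77 part, here is a toy decoder. It is not |
| LZMA's bitstream format; the token layout (a literal byte, or a |
| (distance, length) pair) is invented for illustration. It does show |
| why decoding depends on the previously decoded history. |

```python
def lz77_decode(tokens):
    """Decode a list of literals (ints) and (distance, length) matches."""
    out = bytearray()
    for token in tokens:
        if isinstance(token, int):
            out.append(token)  # literal byte
            continue
        distance, length = token
        if not 0 < distance <= len(out):
            raise ValueError("match refers to data before the start")
        # Copy byte by byte so a match may overlap its own output.
        for _ in range(length):
            out.append(out[-distance])
    return bytes(out)

tokens = list(b"abc") + [(3, 6), ord("!")]
decoded = lz77_decode(tokens)  # b"abcabcabc!"
```

| If the history is wrong or missing, every later match copies the |
| wrong bytes, which is the same reason damaged files cannot be |
| decoded past the broken position. |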
| |
| |
| Q: What are filters? |
| |
| A: In the context of .lzma files, a filter means an implementation of a |
| compression algorithm. The primary filter is LZMA, which is why |
| the names of the tools contain the letters LZMA. |
| |
| liblzma and the new .lzma format also support filters other than LZMA. |
| There are different types of filters, which are suitable for different |
| types of data. Thus, to select the optimal filter and settings, the |
| type of the input data being compressed needs to be known. |
| |
| Some filters are most useful when combined with another filter like |
| LZMA. These filters increase redundancy in the data, without changing |
| the size of the data, by taking advantage of properties specific to |
| the data being compressed. |
| |
| So far, all the filters are always reversible. That is, no matter what |
| data you pass to a filter encoder, it can always be defiltered back to |
| the original form. Because of this, it is safe to compress, for |
| example, a software package that contains file types other than |
| executables using a filter specific to the architecture of the |
| package being compressed. |
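| |
| The reversibility claim can be tried with a two-filter chain. This |
| sketch uses Python's lzma module (an assumption standing in for |
| liblzma) in raw mode, with the x86 filter in front of LZMA, on |
| input that is not x86 code at all. |

```python
import lzma

# x86 filter first, LZMA last. Both use the module's default options.
CHAIN = [
    {"id": lzma.FILTER_X86},
    {"id": lzma.FILTER_LZMA1},
]

# Arbitrary non-executable input. It contains every byte value, including
# ones the x86 filter treats as call/jump opcodes.
data = bytes(range(256)) * 64 + b"plain text, not machine code\n" * 100

filtered = lzma.compress(data, format=lzma.FORMAT_RAW, filters=CHAIN)
restored = lzma.decompress(filtered, format=lzma.FORMAT_RAW, filters=CHAIN)
```

| The round trip is exact even though the filter was designed for a |
| different kind of data; only the compression ratio may suffer. |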
| |
| The old LZMA_Alone format supports only the LZMA filter. |
| |
| |
| Q: I cannot find BCJ and BCJ2 filters. Don't they exist in liblzma? |
| |
| A: The BCJ filter is called "x86" in liblzma. BCJ2 is not included, |
| because it requires using more than one encoded output stream. |
| |
| |
| Q: Can I use LZMA in proprietary, non-free applications? |
| |
| A: liblzma is under the GNU LGPL version 2.1 or (at your option) any |
| later version. To summarise (*NOTE* This summary is not legally |
| binding, that is, it doesn't give you any extra permissions compared |
| to the LGPL. Read the GNU LGPL carefully for the exact license |
| conditions.): |
| * All the changes made into the library itself must be published |
| under the same license. |
| * End users must be able to replace the used liblzma. The easiest way |
| to ensure this is to link dynamically against liblzma so users |
| can replace the shared library file if they want. |
| * You must make it clear to your users that your application uses |
| liblzma, and that liblzma is free software under the GNU LGPL. |
| A copy of GNU LGPL must be included. |
| |
| LZMA SDK contains a special exception which allows linking *unmodified* |
| code statically with a non-free application. This exception does *not* |
| apply to liblzma. |
| |
| As an alternative, you can support the development of LZMA and 7-Zip |
| by buying a proprietary license from Igor Pavlov. See the homepage of |
| LZMA SDK <http://7-zip.org/sdk.html> for more information. Note that |
| having a proprietary license from Igor Pavlov doesn't allow you to use |
| liblzma in a way that contradicts the GNU LGPL, because liblzma |
| contains code that is not copyrighted by Igor Pavlov. Please contact |
| both Lasse Collin and Igor Pavlov if the license conditions of liblzma |
| are not suitable for you. |
| |
| |
| Q: I would like to help. What can I do? |
| |
| A: See the TODO file. Please contact Lasse Collin before starting to do |
| anything, because it is possible that someone else is already working |
| on the same thing. |
| |
| |
| Q: How can I contact the authors? |
| |
| A: Lasse Collin is the maintainer of LZMA Utils. You can contact him |
| via IRC (Larhzu on #tukaani at Freenode or IRCnet). Email should |
| work too: <lasse.collin@tukaani.org>. |
| |
| Igor Pavlov is the father of LZMA. He is the author of 7-Zip |
| and LZMA SDK. <http://7-zip.org/> |
| |
| NOTE: Please don't bother Igor Pavlov with questions specific |
| to LZMA Utils. |
| |