.lzma Test Files
----------------

0. Introduction

This directory contains a bunch of files to test the handling of .lzma
files in .lzma decoder implementations. Many of the files have been
created by hand with a hex editor, so there is no better "source code"
than the files themselves. All the test files (*.lzma) and this README
have been put into the public domain.


1. File Types

Good files (good-*.lzma) must decode successfully without requiring
a lot of CPU time or RAM. If the decoder supports only Single-Block
Streams, then good-multi-*.lzma won't decode, of course.

Bad files (bad-*.lzma) must cause the decoder to give an error. As
with the good files, these files must not require a lot of CPU time
or RAM before they are detected as broken.

Malicious files (malicious-*.lzma) are valid according to the file
format specification, but try to trigger excessive CPU, RAM or disk
usage in the decoder. To prevent malicious files from putting the
decoder into an infinite loop (*) or making it eat all available RAM
or disk space, decoders should have internal limiters that catch these
situations.

(*) Strictly speaking the loop is not infinite, but if decoding a small
    file would take a few weeks or even years, it is an infinite loop
    in practice.
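
The expectations for the three file types can be sketched as a small
driver that checks one decoder against this directory. This is a
minimal sketch, not something shipped with the tests: `expected_outcome`,
`check`, `run_directory`, and the `decode` callable are hypothetical
names, and `decode` is assumed to raise an exception on any error,
including when one of its internal limiters trips.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical decoder interface: takes the whole file, returns the
# decoded data, and raises an exception on any error or hit limit.
Decoder = Callable[[bytes], bytes]


def expected_outcome(name: str) -> str:
    """Return the result a decoder should produce for a test file."""
    if name.startswith("good-"):
        return "success"
    if name.startswith("bad-"):
        return "error"
    if name.startswith("malicious-"):
        # Valid per the format, but a decoder with sane limits is
        # expected to stop with an error instead of running away.
        return "error"
    raise ValueError(f"unknown test file type: {name}")


def check(name: str, data: bytes, decode: Decoder) -> bool:
    """Return True if `decode` behaves as expected for this file."""
    try:
        decode(data)
        actual = "success"
    except Exception:
        actual = "error"
    return actual == expected_outcome(name)


def run_directory(directory: Path, decode: Decoder) -> Dict[str, bool]:
    """Check every *.lzma file in `directory`."""
    return {
        path.name: check(path.name, path.read_bytes(), decode)
        for path in sorted(directory.glob("*.lzma"))
    }
```

A decoder that supports only Single-Block Streams would be expected to
report False for the good-multi-*.lzma entries, as noted above.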


2. Descriptions of Individual Files

2.1. Good Files

good-single-none.lzma uses the implicit Copy filter with a known
Uncompressed Size.

good-single-none-pad.lzma is good-single-none.lzma with Footer Padding.

good-cat-single-none-pad.lzma consists of two good-single-none-pad.lzma
files concatenated as is. Fully decoding this file requires that the
decoder supports decoding concatenated files.
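
One way to support concatenated files is a loop that keeps decoding
Streams until the input is used up. A minimal sketch, assuming a
hypothetical `decode_stream(data, offset)` that decodes a single Stream
(including any Footer Padding) and returns its output plus the offset
just past it:

```python
from typing import Callable, Tuple

# Hypothetical single-Stream decoder: decodes the Stream starting at
# `offset` and returns (decoded bytes, offset just past that Stream).
DecodeStream = Callable[[bytes, int], Tuple[bytes, int]]


def decode_concatenated(data: bytes, decode_stream: DecodeStream) -> bytes:
    """Decode every Stream in `data`, one after another."""
    if not data:
        raise ValueError("no Stream found")
    output = bytearray()
    offset = 0
    while offset < len(data):
        chunk, next_offset = decode_stream(data, offset)
        if next_offset <= offset:
            # Guard against a decoder that makes no progress.
            raise ValueError("decoder did not consume any input")
        output += chunk
        offset = next_offset
    return bytes(output)
```

Because any leftover bytes are handed to `decode_stream` as the start
of another Stream, trailing garbage makes the whole decode fail, which
is the behavior the bad-cat-single-none-pad_garbage_*.lzma files check.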

good-single-subblock_implicit.lzma uses the implicit Subblock filter.

good-single-lzma.lzma is an LZMA-compressed file with an EOPM.

good-single-subblock-lzma.lzma has a basic combination of the Subblock
and LZMA filters.

good-single-none-empty_1.lzma is an empty file with the implicit Copy
filter and no integrity Check.

good-single-none-empty_2.lzma is an empty file with the implicit Copy
filter and CRC32 as the Check.

good-single-none-empty_3.lzma is an empty file with the implicit Copy
filter, a known Compressed Size, and no integrity Check.

good-single-lzma-empty.lzma is an empty file with the LZMA filter and
no integrity Check.

good-single-subblock_rle.lzma takes advantage of the Subblock filter's
run-length encoding.

good-single-delta-lzma.tiff.lzma is an image file that compresses
better with Delta+LZMA than with plain LZMA.

good-single-x86-lzma.lzma uses the x86 filter (BCJ) and LZMA. The
uncompressed file is compress_prepared_bcj_x86 found in the tests
directory.

good-single-sparc-lzma.lzma uses the SPARC filter and LZMA. The
uncompressed file is compress_prepared_bcj_sparc found in the tests
directory.

good-single-lzma-flush_1.lzma has a flush marker in the middle of
the file and no EOPM.

good-single-lzma-flush_2.lzma has a flush marker in the middle of
the file and just before the EOPM.

good-multi-none-1.lzma is a basic Multi-Block Stream with two Data
Blocks and a Footer Metadata Block.

good-multi-none-2.lzma is good-multi-none-1.lzma with Total Size and
Uncompressed Size added to the Footer Metadata Block.

good-multi-none-extra_1.lzma has the `Extra is present' flag set but
no actual Extra Records.

good-multi-none-extra_2.lzma has two non-empty Extra Records.

good-multi-none-extra_3.lzma has an Extra Record that has empty Data.

good-multi-none-header_1.lzma has a minimal Header Metadata Block
with only the Metadata Flags field.

good-multi-none-header_2.lzma has all information in both the Header
and Footer Metadata Blocks. The Size of Header Metadata Block field
has a wrong value in the Header Metadata Block, but the decoder must
ignore this field when it appears in the Header Metadata Block.

good-multi-none-header_3.lzma has the Index only in the Header
Metadata Block. The Footer Metadata Block contains only the Size of
Header Metadata Block and Total Size fields.

good-multi-none-block_1.lzma has the Index in the Header Metadata
Block. The Compressed Size and Uncompressed Size fields are present in
the Data Blocks. There is some Footer Padding between the Blocks.

good-multi-none-block_2.lzma has the Index in the Header Metadata
Block. The Uncompressed Size field is present in the Data Blocks and
no EOPM is used.


2.2. Bad Files

bad-single-none-truncated.lzma is good-single-none.lzma without the
last byte of the file.

bad-cat-single-none-pad_garbage_1.lzma is good-cat-single-none-pad.lzma
with 0xFE appended to the end of the file. 0xFE is not the first byte
of either a .lzma or an LZMA_Alone format file.

bad-cat-single-none-pad_garbage_2.lzma is good-cat-single-none-pad.lzma
with 0xFF appended to the end of the file. 0xFF is the first byte of
a .lzma format file, so the decoder has to detect that the Stream
starting at that byte is incomplete.

bad-cat-single-none-pad_garbage_3.lzma is good-cat-single-none-pad.lzma
with 0x5D appended to the end of the file. 0x5D is the most common
first byte of an LZMA_Alone format file.
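
The three garbage bytes are chosen around what can start a file. The
sketch below assumes the LZMA_Alone header from the LZMA SDK, whose
first byte packs the literal context bits (lc), literal position bits
(lp), and position bits (pb) as (pb * 5 + lp) * 9 + lc; with the common
defaults lc=3, lp=0, pb=2 that gives 0x5D. The function names are
hypothetical, and the 0xFF check simply restates this README.

```python
def lzma_alone_props_byte(lc: int = 3, lp: int = 0, pb: int = 2) -> int:
    """Pack lc/lp/pb into the first byte of an LZMA_Alone header."""
    if not (0 <= lc <= 8 and 0 <= lp <= 4 and 0 <= pb <= 4):
        raise ValueError("lc, lp or pb out of range")
    return (pb * 5 + lp) * 9 + lc


def classify_first_byte(b: int) -> str:
    """Guess which format, if any, could start with byte `b`."""
    if b == 0xFF:
        return ".lzma"  # 0xFF begins a .lzma file, per this README
    if b <= lzma_alone_props_byte(8, 4, 4):  # 224 is the largest valid value
        return "LZMA_Alone?"  # plausible properties byte, not proof
    return "garbage"


print(hex(lzma_alone_props_byte()))  # 0x5d
```

Under these assumptions 0xFE (254) falls outside both cases, while 0xFF
and 0x5D may tempt a decoder into parsing a second, truncated header.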

bad-single-none-footer_filter_flags.lzma has different Stream Flags
in the Stream Footer than in the Stream Header.

bad-single-none-too_long_vli.lzma has a 10-byte variable-length
integer.
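
A decoder can reject this file while reading the integer, without
looking at what the integer is used for. The sketch below assumes the
encoding that the later .xz format uses (7 value bits per byte, least
significant group first, high bit set on every byte except the last,
at most 9 bytes); the exact rules of this older format may differ, and
`decode_vli` is a hypothetical name. The same check also catches the
truncated integer in bad-multi-none-extra_2.lzma.

```python
from typing import Tuple

MAX_VLI_BYTES = 9  # 9 bytes * 7 bits = 63 value bits (assumed limit)


def decode_vli(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one variable-length integer starting at buf[pos].

    Returns (value, position just after the integer). Raises
    ValueError if the integer is truncated or longer than 9 bytes.
    """
    value = 0
    for i in range(MAX_VLI_BYTES):
        if pos + i >= len(buf):
            raise ValueError("incomplete variable-length integer")
        byte = buf[pos + i]
        value |= (byte & 0x7F) << (7 * i)
        if (byte & 0x80) == 0:  # high bit clear: this was the last byte
            return value, pos + i + 1
    # Nine bytes all had the continuation bit set: a tenth would follow.
    raise ValueError("variable-length integer is longer than 9 bytes")
```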

bad-single-none-empty.lzma is like good-single-none-empty_3.lzma but
with a non-zero value in the Compressed Size field.

bad-single-data_after_eopm_1.lzma has LZMA+Subblock, where the Subblock
filter gives one byte of data to LZMA after LZMA has detected the EOPM.

bad-single-data_after_eopm_2.lzma is like
bad-single-data_after_eopm_1.lzma but Subblock gives 256 MiB of data
to LZMA after LZMA has detected the EOPM.

bad-single-subblock_subblock.lzma has Subblock+Subblock, where the
Subblock decoder is given End of Input in the middle of a Subblock.

bad-single-subblock-padding_loop.lzma contains a huge number of
consecutive Padding bytes, which the Subblock filter format does not
allow. If it were allowed, this file would hang the decoder for a very
long time (weeks to years).

bad-single-subblock1023-slow.lzma is similar to
malicious-single-subblock31-slow.lzma except that it uses 1023 bytes
of Padding in every place instead of 31 bytes. The Subblock filter
format specification allows Padding of at most 31 bytes, so this file
must be detected as bad without producing any output. Allowing Padding
larger than 31 bytes was considered (hence this test file), but it
seemed to be a bad idea since it would increase worst-case CPU usage.

bad-single-lzma-flush_beginning.lzma has a flush marker at the
beginning of the LZMA data.

bad-single-lzma-flush_twice.lzma has two flush markers with no data
between them.

bad-multi-none-1.lzma has data after the last field in the Metadata
Block, and the `Extra is present' flag is not set.

bad-multi-none-2.lzma has a wrong Total Size in the Footer Metadata
Block.

bad-multi-none-3.lzma has a wrong Uncompressed Size in the Footer
Metadata Block.

bad-multi-none-index_1.lzma has a wrong value in the Number of Data
Blocks field.

bad-multi-none-index_2.lzma has Metadata that is too short to contain
all the Index Records.

bad-multi-none-index_3.lzma has a wrong value in the Total Size field
in the Index.

bad-multi-none-index_4.lzma has a wrong value in the Uncompressed Size
field in the Index.

bad-multi-none-extra_1.lzma has an incomplete Extra Record at the end
of the Metadata Block.

bad-multi-none-extra_2.lzma has an incomplete variable-length integer
as the Extra Record ID.

bad-multi-none-extra_3.lzma has an incomplete Extra Record at the end
of the Metadata Block.

bad-multi-none-header_1.lzma has an empty Header Metadata Block (even
the Metadata Flags field is not present).

bad-multi-none-header_2.lzma has an Index in the Header Metadata Block
that describes only one Data Block, while the Stream actually has two
Data Blocks. A sophisticated decoder should give an error when it
detects the second Data Block; all Multi-Block decoders must detect
the file as corrupt at some point.

bad-multi-none-header_3.lzma has a Total Size that is too small in the
Header Metadata Block. A sophisticated decoder should abort decoding
before the second Data Block, preferably before the first Data Block
has been finished; all Multi-Block decoders must detect the file as
corrupt at some point.

bad-multi-none-header_4.lzma is like bad-multi-none-header_3.lzma but
with an Uncompressed Size that is too small.

bad-multi-none-header_5.lzma has the Index in the Header Metadata
Block, but the Total Size field is missing from the Footer Metadata
Block.

bad-multi-none-header_6.lzma has both the Index and Total Size in the
Header Metadata Block, but Total Size doesn't match the Index. A
sophisticated decoder should abort before decoding any Data Blocks;
all Multi-Block decoders must detect the file as corrupt at some
point.
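
Several of the files above, for example bad-multi-none-header_6.lzma,
come down to Metadata fields that disagree with the Index. A minimal
sketch of such a cross-check, assuming that Total Size and Uncompressed
Size in a Metadata Block should equal the sums of the corresponding
fields over all Index Records; `IndexRecord` and `check_index` are
hypothetical, and absent fields are passed as `None`:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexRecord:
    """Hypothetical in-memory form of one Index Record."""
    total_size: int
    uncompressed_size: int


def check_index(records: List[IndexRecord],
                number_of_blocks: Optional[int] = None,
                total_size: Optional[int] = None,
                uncompressed_size: Optional[int] = None) -> None:
    """Raise ValueError if Metadata fields contradict the Index."""
    if number_of_blocks is not None and number_of_blocks != len(records):
        raise ValueError("Number of Data Blocks does not match the Index")
    if (total_size is not None
            and total_size != sum(r.total_size for r in records)):
        raise ValueError("Total Size does not match the Index")
    if (uncompressed_size is not None
            and uncompressed_size != sum(r.uncompressed_size
                                         for r in records)):
        raise ValueError("Uncompressed Size does not match the Index")
```

Running a check like this right after parsing the Header Metadata Block
is what would let a decoder abort before decoding any Data Blocks.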

bad-multi-none-header_7.lzma has zero as the Size of Header Metadata
Block in the Header Metadata Block.

bad-multi-none-block_1.lzma has a wrong Uncompressed Size in the first
Data Block. A sophisticated decoder should detect this error before
producing any output, because it can see that the Uncompressed Size
doesn't match the Index in the Header Metadata Block; all Multi-Block
decoders must detect the file as corrupt at some point.

bad-multi-none-block_2.lzma has a Compressed Size that is too big in
the first Data Block. A sophisticated decoder may be able to detect
the file as corrupt before producing any output, because Compressed
Size plus the size of the Block Header exceeds the Total Size stored
in the Index in the Header Metadata Block. A sophisticated decoder
should be able to detect the error before the end of the first Data
Block; all Multi-Block decoders must detect the file as corrupt at
some point.
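
The early checks described for bad-multi-none-block_1.lzma and
bad-multi-none-block_2.lzma need only the Block Header and the matching
Index Record. A minimal sketch with hypothetical names, where `None`
means the field is absent from the Block Header:

```python
from typing import Optional


def check_block_header(header_size: int,
                       compressed_size: Optional[int],
                       uncompressed_size: Optional[int],
                       index_total_size: int,
                       index_uncompressed_size: int) -> None:
    """Compare a Data Block's header fields with its Index Record
    before decoding any of the Block's data."""
    if (uncompressed_size is not None
            and uncompressed_size != index_uncompressed_size):
        raise ValueError("Uncompressed Size disagrees with the Index")
    if (compressed_size is not None
            and header_size + compressed_size > index_total_size):
        raise ValueError("Block is bigger than its Total Size in the Index")
```

Both checks run before the Block is decoded, so neither file needs to
produce output before being rejected.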

bad-multi-none-block_3.lzma has only the Compressed Size field in the
Block Header of the second Data Block, and EOPM isn't used.


2.3. Malicious Files

malicious-single-subblock31-slow.lzma requires quite a bit of CPU time
per decoded byte. It contains LZMA-compressed Subblock filter data that
has as much Padding as the specification allows. LZMA is also used as
a Subfilter to slow down the decoder further. Every Subfilter instance
produces only one byte of output. If you can create a file that wastes
notably more CPU cycles than this file, please contact Lasse Collin.

malicious-single-subblock-256MiB.lzma is a tiny file that produces
256 MiB of output. It uses the Subblock filter's run-length encoding
to achieve this.

malicious-single-subblock-64PiB.lzma is a tiny file that produces
64 PiB of output (if you have the patience to wait). This is done by
chaining two Subblock filters and using their run-length encoders.
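
The two output sizes line up neatly: 256 MiB is 2^28 bytes and 64 PiB
is 2^56 bytes, so the chained file produces exactly the square of the
single-filter output. That is consistent with, though not a description
of, each run-length stage expanding its input by a factor of about
2^28. A quick check of the arithmetic:

```python
MiB = 2 ** 20
PiB = 2 ** 50

single = 256 * MiB         # malicious-single-subblock-256MiB.lzma output
chained = single * single  # square of the single-stage output

assert single == 2 ** 28
print(chained == 64 * PiB, chained == 2 ** 56)  # True True
```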

malicious-multi-metadata-64PiB.lzma is like
malicious-single-subblock-64PiB.lzma but the huge amount of output
is in a Metadata Block. Trying to decode this file may take years
unless the decoder notices that the Metadata has an unreasonable size.
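
An internal limiter of the kind recommended in section 1 can be as
simple as a running byte count on the decoder's output. This is a
minimal sketch: `limit_output`, `LimitExceeded`, and `endless_zeros`
are hypothetical names, the decoder is assumed to yield its output in
chunks, and the limit value is left to the application. A decoder could
apply a much smaller limit to Metadata Blocks than to Data Blocks,
which would stop malicious-multi-metadata-64PiB.lzma early.

```python
from typing import Iterable, Iterator


class LimitExceeded(Exception):
    """Raised when decoded output grows past the configured limit."""


def limit_output(chunks: Iterable[bytes], max_bytes: int) -> Iterator[bytes]:
    """Pass decoded chunks through until more than `max_bytes` bytes
    would have been produced in total, then raise LimitExceeded."""
    produced = 0
    for chunk in chunks:
        produced += len(chunk)
        if produced > max_bytes:
            raise LimitExceeded(f"output exceeds {max_bytes} bytes")
        yield chunk


def endless_zeros() -> Iterator[bytes]:
    """Stand-in for a decoder stuck on a run-length bomb."""
    while True:
        yield bytes(4096)


try:
    for _ in limit_output(endless_zeros(), max_bytes=1 << 20):
        pass
except LimitExceeded as exc:
    print("stopped:", exc)  # stopped: output exceeds 1048576 bytes
```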