Lempel–Ziv–Storer–Szymanski

  1. Example

  2. Implementations

  3. See also

  4. References

Lempel–Ziv–Storer–Szymanski (LZSS) is a lossless data compression algorithm, a derivative of LZ77, that was created in 1982 by James Storer and Thomas Szymanski. LZSS was described in the article "Data compression via textual substitution", published in the Journal of the ACM (1982, pp. 928–951).[1]

LZSS is a dictionary encoding technique. It attempts to replace a string of symbols with a reference to a dictionary location of the same string.

The main difference between LZ77 and LZSS is that in LZ77 the dictionary reference could actually be longer than the string it was replacing. In LZSS, such references are omitted if the length is less than the "break even" point. Furthermore, LZSS uses one-bit flags to indicate whether the next chunk of data is a literal (byte) or a reference to an offset/length pair.
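Below is a minimal Python sketch of the encoding rule: a reference is emitted only when the match is longer than the break-even point, and otherwise a literal byte is emitted. The search is brute force. The window size (4096), the maximum match length (18), the minimum match of 3 bytes (for a 2-byte pointer) and the (distance back, length) form of a reference are illustrative assumptions, not part of the algorithm's definition. The one-bit flag corresponds to whether a token is a literal (`int`) or a reference (`tuple`). Packing that flag into bits is a separate step.

```python
from typing import List, Tuple, Union

# A token is either a literal byte (int) or a back-reference (distance, length).
Token = Union[int, Tuple[int, int]]


def lzss_encode(data: bytes, window: int = 4096, max_len: int = 18,
                min_match: int = 3) -> List[Token]:
    """Greedy LZSS tokenizer (illustrative sketch, brute-force search)."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(data):
        best_len, best_dist = 0, 0
        # Try every earlier start position inside the sliding window.
        for start in range(max(0, pos - window), pos):
            length = 0
            # A match may run past `pos` (overlapping copy), as in LZ77.
            while (length < max_len and pos + length < len(data)
                   and data[start + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, pos - start
        if best_len >= min_match:
            # Long enough to beat the break-even point: emit a reference.
            tokens.append((best_dist, best_len))
            pos += best_len
        else:
            # Too short to pay off: emit the byte itself.
            tokens.append(data[pos])
            pos += 1
    return tokens


def lzss_decode(tokens: List[Token]) -> bytes:
    """Rebuild the data from literals and (distance, length) references."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            dist, length = tok
            for _ in range(length):
                out.append(out[-dist])  # byte-by-byte copy handles overlap
        else:
            out.append(tok)
    return bytes(out)
```

For example, `lzss_encode(b"aaaaaaaaaa")` yields one literal followed by a single overlapping reference `(1, 9)`. A short repeat such as the second `ab` in `b"abab"` stays as literals because it does not exceed the break-even point.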

Example

Here is the beginning of Dr. Seuss's Green Eggs and Ham, with character numbers at the beginning of lines for convenience. Green Eggs and Ham is a good example for illustrating LZSS compression because the book contains only 50 unique words despite having a word count of 170,[2] so words are repeated, though not in succession.

    0: I am Sam
    9:
   10: Sam I am
   19:
   20: That Sam-I-am!
   35: That Sam-I-am!
   50: I do not like
   64: that Sam-I-am!
   79:
   80: Do you like green eggs and ham?
  112:
  113: I do not like them, Sam-I-am.
  143: I do not like green eggs and ham.

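This text takes 177 bytes in uncompressed form. Assuming a break-even point of 2 bytes (and thus 2-byte offset/length pairs) and one-byte newlines, the text compressed with LZSS becomes 94 bytes long. Each pair is written as (offset,length), where the offset is a character number in the uncompressed text: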

   0: I am Sam
   9:
  10: (5,3) (0,4)
  16:
  17: That(4,4)-I-am!(19,16)I do not like
  45: t(21,14)
  49: Do you(58,5) green eggs and ham?
  78: (49,14) them,(24,9).(112,15)(92,18).
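The compressed listing can be checked mechanically. This Python sketch copies the listing into a token list, where a `str` is a run of literal characters and a `tuple` is (character number in the uncompressed text, length). It then decodes the tokens and counts one byte per literal character and two per pair, leaving the flags out. The names `EXAMPLE_TOKENS`, `decode_absolute` and `compressed_size` are hypothetical and exist only for this illustration.

```python
from typing import List, Tuple, Union

# str = literal characters; tuple = (position in uncompressed text, length).
Token = Union[str, Tuple[int, int]]

# Transcribed from the compressed listing, newlines included.
EXAMPLE_TOKENS: List[Token] = [
    "I am Sam\n\n", (5, 3), " ", (0, 4), "\n\n",
    "That", (4, 4), "-I-am!", (19, 16), "I do not like\n",
    "t", (21, 14), "\n",
    "Do you", (58, 5), " green eggs and ham?\n",
    (49, 14), " them,", (24, 9), ".", (112, 15), (92, 18), ".",
]

# The uncompressed excerpt, ending with a newline after the last line.
ORIGINAL = (
    "I am Sam\n\nSam I am\n\nThat Sam-I-am!\nThat Sam-I-am!\n"
    "I do not like\nthat Sam-I-am!\n\nDo you like green eggs and ham?\n\n"
    "I do not like them, Sam-I-am.\nI do not like green eggs and ham.\n"
)


def decode_absolute(tokens: List[Token]) -> str:
    out: List[str] = []
    for tok in tokens:
        if isinstance(tok, tuple):
            start, length = tok
            # Copy one character at a time: (19, 16) starts writing at
            # position 34 and later reads the character it wrote there.
            for k in range(length):
                out.append(out[start + k])
        else:
            out.extend(tok)
    return "".join(out)


def compressed_size(tokens: List[Token]) -> int:
    # One byte per literal character, two bytes per pair (flags excluded).
    return sum(2 if isinstance(t, tuple) else len(t) for t in tokens)


if __name__ == "__main__":
    decoded = decode_absolute(EXAMPLE_TOKENS)
    print(len(ORIGINAL), compressed_size(EXAMPLE_TOKENS), len(decoded))
```

Running it prints `177 94 176`. The listing stops at the final "." of the excerpt, so it decodes to the original text minus its trailing newline. That newline is counted in the 177-byte figure but not in the 94-byte one.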

Note: this does not include the 12 bytes of flags indicating whether the next chunk of text is a pointer or a literal. Including them, the text becomes 106 bytes long, which is still shorter than the original 177 bytes.
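The flag overhead becomes concrete once tokens are serialized to bytes. This Python sketch assumes a hypothetical container format. One flag byte precedes each group of up to eight tokens, and bit *i*, counted from the least significant bit, is set when token *i* is a literal. Each reference is packed into two bytes as a 12-bit (distance − 1) and a 4-bit (length − 3). Real LZSS formats choose their own bit order and field widths. `pack` and `unpack` are illustrative names, not a library API.

```python
from typing import List, Tuple, Union

# A token is either a literal byte (int) or a back-reference (distance, length).
Token = Union[int, Tuple[int, int]]


def pack(tokens: List[Token]) -> bytes:
    """Serialize tokens: one flag byte, then up to eight tokens, repeated."""
    out = bytearray()
    for i in range(0, len(tokens), 8):
        flags, body = 0, bytearray()
        for bit, tok in enumerate(tokens[i:i + 8]):
            if isinstance(tok, tuple):
                dist, length = tok
                if not (1 <= dist <= 4096 and 3 <= length <= 18):
                    raise ValueError(f"reference out of range: {tok}")
                d = dist - 1  # 12 bits
                body += bytes([d >> 4, ((d & 0x0F) << 4) | (length - 3)])
            else:
                flags |= 1 << bit  # flag bit 1 = literal byte follows
                body.append(tok)
        out.append(flags)
        out += body
    return bytes(out)


def unpack(packed: bytes) -> List[Token]:
    """Inverse of pack(): read a flag byte, then the tokens it describes."""
    tokens: List[Token] = []
    i = 0
    while i < len(packed):
        flags = packed[i]
        i += 1
        for bit in range(8):
            if i >= len(packed):  # the last group may hold fewer than 8 tokens
                break
            if flags & (1 << bit):
                tokens.append(packed[i])
                i += 1
            else:
                hi, lo = packed[i], packed[i + 1]
                i += 2
                tokens.append((((hi << 4) | (lo >> 4)) + 1, (lo & 0x0F) + 3))
    return tokens
```

Under this layout the packed size is the number of literals, plus two bytes per reference, plus one flag byte for every eight tokens, rounded up. Nine tokens therefore need two flag bytes.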

Implementations

Many popular archivers, such as PKZip, ARJ, RAR, ZOO and LHarc, use LZSS rather than LZ77 as the primary compression algorithm. The encoding of literal characters and of length-distance pairs varies, with Huffman coding being the most common option. Most implementations stem from 1989 code by Haruhiko Okumura.[3][4] Version 4 of the Allegro library can encode and decode an LZSS format,[5] but the feature was cut from version 5. The Game Boy Advance BIOS can decode a slightly modified LZSS format.[6]

See also

  • LZ77
  • Lempel–Ziv–Welch (LZW)

References

1. ^ Storer, James A.; Szymanski, Thomas G. (October 1982). "Data Compression via Textual Substitution". Journal of the ACM. 29 (4): 928–951. doi:10.1145/322344.322346.
2. ^ "10 stories behind Dr. Seuss stories". CNN. January 23, 2009. http://www.cnn.com/2009/LIVING/wayoflife/01/23/mf.seuss.stories.behind/index.html. Retrieved January 26, 2009.
3. ^ Simtel.net mirror. "Haruhiko Okumura implementation of 1989". https://web.archive.org/web/19990203141013/http://oak.oakland.edu/pub/simtelnet/msdos/arcutils/lz_comp2.zip. Archived February 3, 1999.
4. ^ Okumura, Haruhiko. "History of Data Compression in Japan". https://web.archive.org/web/20160110174426/https://oku.edu.mie-u.ac.jp/~okumura/compression/history.html. Archived January 10, 2016.
5. ^ Hargreaves, Shawn, et al. "Allegro source code: lzss.c". https://github.com/liballeg/allegro5/blob/4.4/src/lzss.c. Accessed July 13, 2016.
6. ^ Korth, Martin. "GBATEK: GBA BIOS Decompression Functions". http://nocash.emubase.de/gbatek.htm. Archived from the original on March 23, 2013 at https://web.archive.org/web/20130323133944/http://nocash.emubase.de/gbatek.htm. Accessed August 3, 2008; retrieved January 2, 2014.