Entry | Binary Ordered Compression for Unicode
Definition |
For comparison, SCSU was adopted as a standard Unicode compression scheme, with a byte/code point ratio similar to that of language-specific code pages. SCSU has not been widely adopted, as it is not suitable for MIME “text” media types; for example, it cannot be used directly in email and similar protocols. SCSU also requires a complicated encoder design for good performance. For larger amounts of Unicode text, zip, bzip2, and other industry-standard algorithms usually compress more efficiently.[2] Both SCSU[3] and BOCU-1[4] are IANA-registered charsets.

Details
All numbers in this section are hexadecimal, and all ranges are inclusive.
Code points from
The difference between the current code point and the normalized previous code point is encoded as follows:
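As a hedged illustration of the difference step only, the following Python sketch computes, for each code point, either a directly emitted byte (for code points up to 20) or the signed difference from the normalized previous code point. The initial state 40, the normalization constants for Hiragana, CJK and Hangul, and the single-byte mapping are assumptions recalled from UTN #6 rather than values stated in this entry, and the function names are hypothetical. Multi-byte lead and trail byte ranges are not modeled.

```python
# Minimal sketch of the BOCU-1 difference step (not a full encoder).
# ASSUMPTION: all constants below are recalled from UTN #6; verify before use.

INITIAL_PREV = 0x40  # assumed initial state, also restored after C0 controls


def normalized_prev(c: int) -> int:
    """Return the state used as 'previous code point' after encoding c."""
    if 0x3040 <= c <= 0x309F:   # Hiragana block: fixed middle value
        return 0x3070
    if 0x4E00 <= c <= 0x9FA5:   # CJK ideographs: fixed value inside the block
        return 0x7711
    if 0xAC00 <= c <= 0xD7A3:   # Hangul syllables: middle of the block
        return 0xC1D1
    return (c & ~0x7F) + 0x40   # otherwise: middle of the 128-code-point row


def differences(text: str):
    """List ("direct", byte) or ("diff", signed difference) per code point."""
    prev = INITIAL_PREV
    out = []
    for ch in text:
        c = ord(ch)
        if c <= 0x20:
            out.append(("direct", c))   # emitted as the byte value itself
            if c != 0x20:               # space keeps the state, controls reset it
                prev = INITIAL_PREV
        else:
            out.append(("diff", c - prev))
            prev = normalized_prev(c)
    return out


def single_byte(diff: int):
    """Assumed single-byte form for small differences; None if out of range."""
    if -0x40 <= diff <= 0x3F:
        return 0x90 + diff              # lands in the range 0x50..0xCF
    return None
```

Under these assumptions, `differences("да")` gives a large first difference (3F4) followed by a small one (-10), which suggests why text that stays within one script block tends to need about one byte per character after the first.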
Each byte range is lexicographically ordered with the following thirteen byte values excluded:

Any ASCII input

BOCU-1 also offers similar robustness for input texts without the above-mentioned values, by means of the special reset code

The optional use of a signature

In theory UTF-1 and UTF-8 could encode the original UCS-4 set with 31 bits up to the modern Unicode set from

Note that the reset byte

Patent
The general BOCU algorithm is covered by United States Patent #6,737,994, which also mentions the specific BOCU-1 implementation.[5] IBM, which employed both of the inventors of BOCU-1 at the time it was created, states in the Unicode Technical Note that implementers of a "fully compliant version of BOCU-1" must contact IBM to request a royalty-free license.[6] BOCU-1 is the only Unicode compression scheme described on the Unicode Web site that is known to be encumbered with intellectual property restrictions. By contrast, IBM also filed for a patent on UTF-EBCDIC, but in that case it chose to make the documentation and encoding scheme “freely available to anyone concerned towards making the transformation format as part of the UCS standards,” instead of requiring implementers to request a license.[7]

In HTML
Supporting BOCU-1 in HTML documents is prohibited by the W3C[8][9] and WHATWG[10] HTML standards, as it would present a cross-site scripting vulnerability.[11]

References
1. Markus Scherer, Mark Davis (2006-02-04). "UTN #6: BOCU-1". https://www.unicode.org/notes/tn6/#Introduction. Retrieved 2008-05-18.
2. Doug Ewell (2004-01-30). "UTN #14: A survey of Unicode compression" (PDF). http://unicode.org/notes/tn14. Retrieved 2008-06-13.
3. IANA registration record for SCSU.
4. IANA registration record for BOCU-1.
5. Davis; et al. (2004-05-18). "United States Patent #6,737,994, 'Binary-ordered compression for unicode'". http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=6737994.PN.&OS=PN/6737994&RS=PN/6737994. Retrieved 2008-11-16.
6. Markus Scherer, Mark Davis (2006-02-04). "UTN #6: BOCU-1". https://www.unicode.org/notes/tn6/#Intellectual_Property. Retrieved 2014-02-05.
7. V.S. Umamaheswaran (2002-04-16). "UTR #16: UTF-EBCDIC". https://www.unicode.org/reports/tr16/#Bibliography. Retrieved 2008-11-16.
8. "8.2.2.3. Character encodings". HTML 5.1 Standard. W3C. https://www.w3.org/TR/html51/syntax.html#character-encodings.
9. "8.2.2.3. Character encodings". HTML 5 Standard. W3C. https://www.w3.org/TR/html5/syntax.html#character-encodings.
10. "12.2.3.3 Character encodings". HTML Living Standard. WHATWG. https://html.spec.whatwg.org/multipage/parsing.html#character-encodings.
11. "<meta> - HTML". MDN Web Docs. Mozilla. https://developer.mozilla.org/en-US/docs/Web/HTML/Element/meta.

See also
Data compression | Unicode Transformation Formats