Entry: UTF-1
Definition

  1. Design

  2. See also

  3. References

{{Infobox character encoding
| name = UTF-1
| mime =
| alias =
| image =
| caption =
| standard =
| lang = International
| status = Obscure, of mainly historical interest.
| classification = Unicode Transformation Format, extended ASCII,{{efn|Not in the strictest sense of the term, as ASCII bytes can appear as trail bytes.}} variable-width encoding
| encodes = ISO 10646 (Unicode)
| extends = US-ASCII
| prev =
| next = UTF-8
| extra =
{{notelist}}

}}

UTF-1 is a method of transforming ISO 10646/Unicode into a stream of bytes. Its design does not provide self-synchronization, which makes substring searching and error recovery difficult. It reuses the ASCII printing characters as bytes of multi-byte encodings, which makes it unsuitable for some uses (for instance, Unix filenames cannot contain the byte value used for the forward slash). UTF-1 is also slow to encode and decode because it divides and multiplies by 190, a number that is not a power of 2. Because of these issues, it did not gain acceptance and was quickly replaced by UTF-8.
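
The filename problem can be illustrated with a byte sequence taken from the comparison table in this article: the UTF-1 form of U+D7FF is F7 2F C3, and its middle byte is the ASCII code of the forward slash. The following is a minimal Python sketch; the variable names are illustrative only, and the split on `b"/"` stands in for any byte-oriented path handling.

```python
# UTF-1 and UTF-8 encodings of U+D7FF, copied from the comparison table.
utf1_bytes = bytes.fromhex("F7 2F C3")
utf8_bytes = bytes.fromhex("ED 9F BF")

# A byte-oriented path splitter treats every 0x2F byte as a separator.
utf1_parts = utf1_bytes.split(b"/")
utf8_parts = utf8_bytes.split(b"/")

print(utf1_parts)  # the single character is cut into two pieces
print(utf8_parts)  # the UTF-8 form stays in one piece
```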

Design

UTF-1 is a multi-byte encoding like UTF-8: a single Unicode code point is encoded in one, two, three, or five bytes. Every code point from U+0000 to U+009F, which includes the whole ASCII range, is encoded as a single byte.

UTF-1 does not use the C0 and C1 control codes or the space character in multi-byte encodings: the bytes 0x00–0x20 and 0x7F–0x9F always stand for the corresponding code points. This design, with its 66 protected characters, was an attempt at ISO 2022 compatibility.
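
As a rough illustration of the protected set described above, the following Python sketch builds it from the two byte ranges and counts what is left over. The names `PROTECTED` and `is_protected` are hypothetical and chosen only for this example.

```python
# Bytes that never appear inside a UTF-1 multi-byte sequence:
# C0 controls and space (0x00-0x20), DEL and C1 controls (0x7F-0x9F).
PROTECTED = set(range(0x00, 0x21)) | set(range(0x7F, 0xA0))


def is_protected(byte: int) -> bool:
    """Return True if the byte always stands for its own code point."""
    return byte in PROTECTED


# Byte values that remain available as digits of a multi-byte sequence.
free_values = 256 - len(PROTECTED)
print(len(PROTECTED), free_values)
```

The two counts printed are the 66 protected bytes and the 190 remaining byte values.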

UTF-1 uses "modulo 190" arithmetic (256 − 66 = 190). For comparison, UTF-8 protects all 128 ASCII characters and needs one bit for this, and a second bit to make it self-synchronizing, resulting in "modulo 64" arithmetic (8 − 2 = 6; 2^6 = 64). BOCU-1 protects only the minimal set required for MIME compatibility (0x00, 0x07–0x0F, 0x1A–0x1B, and 0x20), resulting in "modulo 243" arithmetic (256 − 13 = 243).
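
The three moduli can be checked with a few lines of Python. The second half of this sketch rests on an assumption that is not stated in the text: the lead-byte ranges A1–F5 (two-byte form) and F6–FB (three-byte form) are read off the comparison table in this article. Under that assumption, the points where the encoding grows longer follow from the modulus.

```python
# Number of byte values usable as "digits" in each scheme.
utf1_modulus = 256 - 66      # 66 protected bytes
utf8_modulus = 2 ** (8 - 2)  # two bits of every trail byte are fixed
bocu1_protected = [0x00, *range(0x07, 0x10), 0x1A, 0x1B, 0x20]
bocu1_modulus = 256 - len(bocu1_protected)
print(utf1_modulus, utf8_modulus, bocu1_modulus)

# Assumption: lead bytes A1-F5 and F6-FB, as seen in the comparison table.
two_byte_leads = 0xF5 - 0xA1 + 1
three_byte_leads = 0xFB - 0xF6 + 1

# First code point that no longer fits in two bytes, then in three bytes.
two_byte_limit = 0x100 + two_byte_leads * utf1_modulus
three_byte_limit = two_byte_limit + three_byte_leads * utf1_modulus ** 2
print(hex(two_byte_limit), hex(three_byte_limit))
```

The two limits come out as U+4016 and U+38E2E, the code points at which the table switches to three-byte and five-byte sequences.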

{| border="1" cellspacing="3" cellpadding="3" class="wikitable" style="font-family: monospace, monospace"
! code point !! UTF-8 !! UTF-1
|-
| U+007F || 7F || 7F
|-
| U+0080 || C2 80 || 80
|-
| U+009F || C2 9F || 9F
|-
| U+00A0 || C2 A0 || A0 A0
|-
| U+00BF || C2 BF || A0 BF
|-
| U+00C0 || C3 80 || A0 C0
|-
| U+00FF || C3 BF || A0 FF
|-
| U+0100 || C4 80 || A1 21
|-
| U+015D || C5 9D || A1 7E
|-
| U+015E || C5 9E || A1 A0
|-
| U+01BD || C6 BD || A1 FF
|-
| U+01BE || C6 BE || A2 21
|-
| U+07FF || DF BF || AA 72
|-
| U+0800 || E0 A0 80 || AA 73
|-
| U+0FFF || E0 BF BF || B5 48
|-
| U+1000 || E1 80 80 || B5 49
|-
| U+4015 || E4 80 95 || F5 FF
|-
| U+4016 || E4 80 96 || F6 21 21
|-
| U+D7FF || ED 9F BF || F7 2F C3
|-
| U+E000 || EE 80 80 || F7 3A 79
|-
| U+F8FF || EF A3 BF || F7 5C 3C
|-
| U+FDD0 || EF B7 90 || F7 62 BA
|-
| U+FDEF || EF B7 AF || F7 62 D9
|-
| U+FEFF || EF BB BF || F7 64 4C
|-
| U+FFFD || EF BF BD || F7 65 AD
|-
| U+FFFE || EF BF BE || F7 65 AE
|-
| U+FFFF || EF BF BF || F7 65 AF
|-
| U+10000 || F0 90 80 80 || F7 65 B0
|-
| U+38E2D || F0 B8 B8 AD || FB FF FF
|-
| U+38E2E || F0 B8 B8 AE || FC 21 21 21 21
|-
| U+FFFFF || F3 BF BF BF || FC 21 37 B2 7A
|-
| U+100000 || F4 80 80 80 || FC 21 37 B2 7B
|-
| U+10FFFF || F4 8F BF BF || FC 21 39 6E 6C
|-
| U+7FFFFFFF || FD BF BF BF BF BF || FD BD 2B B9 40
|}

Although modern Unicode ends at U+10FFFF, both UTF-1 and UTF-8 were designed to encode the complete 31 bits of the original Universal Character Set (UCS-4), and the last entry in this table shows this original final code point.
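
The rows of the table can be reproduced with a short encoder and decoder. This is a sketch rather than a reference implementation: the function names, the `_FORMS` table, and the error handling are choices made for this example, and the digit mapping (0–93 to 0x21–0x7E, 94–189 to 0xA0–0xFF) and lead bytes are inferred from the table rows and the protected byte ranges described above.

```python
def _to_byte(z: int) -> int:
    """Map a base-190 digit (0..189) to a byte outside the protected set."""
    return z + 0x21 if z < 0x5E else z + 0x42


def _from_byte(b: int) -> int:
    """Inverse of _to_byte; rejects the 66 protected byte values."""
    if 0x21 <= b <= 0x7E:
        return b - 0x21
    if b >= 0xA0:
        return b - 0x42
    raise ValueError(f"protected byte 0x{b:02X} inside a multi-byte sequence")


# (first lead byte, first code point, number of trail bytes)
_FORMS = [(0xA1, 0x100, 1), (0xF6, 0x4016, 2), (0xFC, 0x38E2E, 4)]


def utf1_encode(cp: int) -> bytes:
    """Encode one code point in the range 0..0x7FFFFFFF."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("code point out of range")
    if cp < 0xA0:
        return bytes([cp])
    if cp < 0x100:
        return bytes([0xA0, cp])
    # Pick the last form whose first code point does not exceed cp.
    lead, base, trail = [f for f in _FORMS if cp >= f[1]][-1]
    y = cp - base
    digits = []
    for _ in range(trail):
        y, z = divmod(y, 190)  # division by a non-power of 2
        digits.append(_to_byte(z))
    return bytes([lead + y, *reversed(digits)])


def utf1_decode_one(data: bytes) -> int:
    """Decode a byte string that holds exactly one UTF-1 sequence."""
    lead, rest = data[0], data[1:]
    if lead < 0xA0 and not rest:
        return lead
    if lead == 0xA0 and len(rest) == 1 and rest[0] >= 0xA0:
        return rest[0]
    if lead <= 0xA0:
        raise ValueError("malformed sequence")
    first, base, trail = [f for f in _FORMS if lead >= f[0]][-1]
    if len(rest) != trail:
        raise ValueError("wrong number of trail bytes")
    y = lead - first
    for b in rest:
        y = y * 190 + _from_byte(b)  # multiplication by a non-power of 2
    cp = base + y
    if cp > 0x7FFFFFFF:
        raise ValueError("value beyond the 31-bit range")
    return cp


for cp in (0x7F, 0xA0, 0x15E, 0xD7FF, 0x10FFFF, 0x7FFFFFFF):
    encoded = utf1_encode(cp)
    assert utf1_decode_one(encoded) == cp
    print(f"U+{cp:04X} ->", " ".join(f"{b:02X}" for b in encoded))
```

The decoder handles a single sequence only. Splitting a longer byte stream into sequences would require reading the lead byte first, since a trail byte cannot be recognized on its own.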

See also

  • Comparison of Unicode encodings
  • Universal Character Set

References

  • {{cite web |title=ISO IR 178: UCS Transformation Format One (UTF-1) |author=ISO/IEC JTC 1/SC2/WG2 |author-link=ISO/IEC JTC 1/SC2/WG2 |date=1993-01-21 |edition=1 |id=Registration number 178 |url=http://kikaku.itscj.ipsj.or.jp/ISO-IR/178.pdf |type=PDF, 256 KB |dead-url=yes |archive-url=https://web.archive.org/web/20150318032101/http://kikaku.itscj.ipsj.or.jp/ISO-IR/178.pdf |archive-date=2015-03-18}}
  • {{cite web |author-first=Roman |author-last=Czyborra |title=Unicode Transformation Formats: UTF-8 & Co. |date=1998-11-30 |url=http://czyborra.com/utf/#UTF-1 |access-date=2016-06-07 |dead-url=no |archive-url=https://web.archive.org/web/20160607111732/http://czyborra.com/utf/#UTF-1 |archive-date=2016-06-07}}