GBK (character encoding)
Name: Guojia Biaozhun Kuozhan (GBK)
MIME / IANA: GBK
Aliases: CP936, MS936, windows-936, csGBK
Standard: GBK 1.0
Languages: Primarily used for Simplified Chinese, but also supports Traditional Chinese, Japanese, English, Russian and (partially) Greek
Extends: EUC-CN
Preceded by: GB2312
Succeeded by: GB 18030
Classification: Extended ASCII (not in the strictest sense of the term, as ASCII bytes can appear as trail bytes), variable-width encoding, CJK encoding

GBK is an extension of the GB2312 character set for Simplified Chinese characters, used in the People's Republic of China. It includes all unified CJK characters found in GB13000.1-93, i.e. ISO/IEC 10646:1993, or Unicode 1.1. Since its initial release in 1993, GBK has been extended by Microsoft in Code page 936/1386, which was then extended into GBK 1.0. GBK is also the IANA-registered internet name for the Microsoft mapping,[1] which differs from other implementations primarily by the single-byte euro sign at 0x80.

GB abbreviates Guojia Biaozhun, which means "national standard" in Chinese, while K stands for "extension" (扩展, kuòzhǎn). GBK extended the old standard GB2312 not only with Traditional Chinese characters, but also with Chinese characters that were simplified after GB2312 was established in 1981. With the arrival of GBK, certain names containing formerly unrepresentable characters, such as the 镕 (róng) in the name of former Chinese Premier Zhu Rongji, became representable.[2] As of December 2018, 0.1% of all web pages used GBK.[3]

History

In 1993, the Unicode 1.1 standard was released, including 20,902 characters used in mainland China, Taiwan, Japan and Korea. Following this, China released GB13000.1-93, a national standard (guóbiāo) equivalent to Unicode 1.1.
The GBK character set was defined in 1993 as an extension of GB2312-80 that also includes the characters of GB13000.1-93, placed in the code points left unused by GB2312. Hence GBK is backward compatible with GB2312.

Microsoft implemented GBK in Windows 95 and Windows NT 3.51 as Code Page 936. While GBK was never an official standard, the widespread use of Windows 95 made it the de facto standard. Although GBK included all the Chinese characters defined in Unicode 1.1 and GB13000.1-93, those standards used different code tables from GBK. The primary reason for its existence was simply to bridge the gap between GB2312-80 and GB13000.1-93.

In 1995, the China National Information Technology Standardization Technical Committee set down the Chinese Internal Code Extension Specification (汉字内码扩展规范(GBK), Hànzì Nèimǎ Kuòzhǎn Guīfàn (GBK)), Version 1.0, known as GBK 1.0, which is a slight extension of Code Page 936. The 95 newly added characters were not found in GB 13000.1-1993 and were provisionally assigned Unicode PUA code points.[4] (p. 534)

Microsoft later added the euro sign to Code Page 936 and assigned it the code 0x80. This is not a valid code point in GBK 1.0.

In 2000, the GB18030-2000 standard was released, superseding GBK 1.0 while maintaining compatibility with it. It increased the number of defined Chinese characters and extended the number of possible characters by introducing four-byte codes. The subset of GB 18030 consisting of one-byte and two-byte characters is sometimes also referred to as GBK. The mapping to Unicode has changed slightly, though, as some characters are now defined in Unicode.
In the most up-to-date form of the standard, GB 18030-2005, only 24[5] characters are still mapped to the Unicode PUA (see GB 18030, section PUA).

In 2002, GBK was registered as an IANA charset; the registration uses the Code Page 936 mapping and the CP936/MS936 aliases, but refers to the GBK 1.0 specification.[1]

The W3C technical recommendation published in 2015[6] defines a GBK encoder as a GB 18030 encoder with a single-byte euro sign and without four-byte sequences.

Encoding

A character is encoded as 1 or 2 bytes. A byte in the range 0x00–0x7F is a single-byte character with the same meaning as in ASCII. A byte with the high bit set indicates that it is the first of 2 bytes. Loosely speaking, the first byte is in the range 0x81–0xFE (that is, never 0x80 or 0xFF), and the second byte is in the range 0x40–0xFE, excluding 0x7F. More specifically, the two-byte space is divided into the character regions GBK/1 through GBK/5 and several user-defined areas.
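The one-byte/two-byte structure described above can be sketched as a small segmenter. This is only an illustration, not a decoder: it assumes the byte ranges usually quoted for GBK 1.0 (first byte 0x81–0xFE, second byte 0x40–0xFE except 0x7F) and checks structure only, so it also accepts unassigned and user-defined positions. The names `split_gbk`, `is_lead` and `is_trail` are hypothetical and do not come from any library.

```python
from typing import Iterator, Tuple

# Assumed byte ranges, as commonly given for GBK 1.0.
LEAD_MIN, LEAD_MAX = 0x81, 0xFE    # first byte of a two-byte code
TRAIL_MIN, TRAIL_MAX = 0x40, 0xFE  # second byte of a two-byte code
TRAIL_EXCLUDED = 0x7F              # never used as a second byte


def is_lead(b: int) -> bool:
    """True if b can start a two-byte code."""
    return LEAD_MIN <= b <= LEAD_MAX


def is_trail(b: int) -> bool:
    """True if b can end a two-byte code (note: includes ASCII 0x40-0x7E)."""
    return TRAIL_MIN <= b <= TRAIL_MAX and b != TRAIL_EXCLUDED


def split_gbk(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, code) for each 1- or 2-byte code in data.

    Raises ValueError on a byte that fits neither pattern, such as the
    single byte 0x80 that Code Page 936 uses for the euro sign.
    """
    i = 0
    while i < len(data):
        b = data[i]
        if b <= 0x7F:
            # Single byte, same meaning as in ASCII.
            yield i, data[i:i + 1]
            i += 1
        elif is_lead(b) and i + 1 < len(data) and is_trail(data[i + 1]):
            yield i, data[i:i + 2]
            i += 2
        else:
            raise ValueError(f"invalid GBK byte sequence at offset {i}")


# Number of structurally valid two-byte positions: 126 * 190 = 23,940.
TWO_BYTE_POSITIONS = (LEAD_MAX - LEAD_MIN + 1) * (TRAIL_MAX - TRAIL_MIN)

if __name__ == "__main__":
    # Uses Python's built-in "gbk" codec only to produce sample bytes.
    for offset, code in split_gbk("GBK编码".encode("gbk")):
        print(offset, code.hex())
```

Because a second byte may fall in 0x40–0x7E, an ASCII-looking byte is not always an ASCII character, so the stream has to be scanned from a known boundary rather than byte by byte in isolation.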
Layout diagram

In graphical form, the following figure shows the space of all 64K possible 2-byte codes. Green and yellow areas are assigned GBK code points; red areas are for user-defined characters. The uncolored areas are invalid byte combinations.

[Figure: Layout of GBK]

Relationship to other encodings

The areas indicated in the previous section as GBK/1 and GBK/2, taken by themselves, are simply GB2312-80 in its usual encoding, GBK/1 being the non-hanzi region and GBK/2 the hanzi region. GB2312, or more properly the EUC-CN encoding thereof, takes a pair of bytes from the range 0xA1–0xFE.

More significantly, GBK extended the range of the bytes. Having two-byte characters in the ISO-2022 GR range gives a limit of 94² = 8,836 possibilities. Abandoning the ISO-2022 model of strict regions for graphics and control characters, but retaining the feature of low bytes being 1-byte characters and pairs of high bytes denoting a character, you could potentially have 128² = 16,384 positions. GBK takes part of that, extending the range from 0xA1–0xFE for both bytes to 0x81–0xFE for the first byte and 0x40–0xFE (excluding 0x7F) for the second byte.

Microsoft's Code Page 936 is generally thought of as being GBK.[1] However, the 95 PUA characters added in GBK 1.0 are not included in Code Page 936, and Code Page 936 has a single-byte euro sign at 0x80, which GBK 1.0 lacks.

GBK's successor, GB18030-2000, uses the remaining range available to the second byte (0x30–0x39) to further expand the number of possibilities while retaining GBK as a subset.

References

1. "Character Sets". https://www.iana.org/assignments/character-sets/character-sets.xhtml. Retrieved 3 October 2016.
2. "Code Page 936 - PRC GBK (XGB)". http://www.microsoft.com/typography/unicode/936.txt. Archived at https://web.archive.org/web/20021001194325/http://www.microsoft.com/typography/unicode/936.txt on 2002-10-01. Conversion map between Code Page 936 and Unicode. GB18030 or GBK must be selected manually in the browser to view it correctly.
3. "Historical trends in the usage of character encodings, December 2018". W3techs.com. http://w3techs.com/technologies/history_overview/character_encoding. Retrieved 2018-11-20.
4. Standardization Administration of China (SAC) (2005-11-18). GB 18030-2005: Information Technology—Chinese coded character set. https://archive.org/details/GB18030-2005.
5. GB 18030-2005 Standard, pp. 9, 79.
6. "Encoding Standard # gbk-encoder". W3C. https://www.w3.org/TR/encoding/#gbk-encoder. Retrieved 2016-10-02.