Entry: Code page 932 (Microsoft Windows)
Definition

  1. Terminology

  2. Differences from standard Shift JIS

      Double-byte character differences    Single-byte character differences  

  3. Layout

  4. See also

  5. References

  6. External links

      Microsoft related    IBM related  
{{about|Microsoft's Code Page 932 and IBM's Code Page 943|IBM's Code Page 932|Code page 932 (IBM)}}{{redirect|Windows-31J|the operating system version|Windows 3.1J}}{{short description|Japanese Windows character encoding / Shift JIS variant.}}{{infobox character encoding
| name = Windows Code page 932
| mime = Windows-31J
| alias = CP943C
| standard = WHATWG Encoding Standard (as "Shift_JIS")
| lang = Japanese
| status =
| extends = Shift_JIS
| prev =
| next =
| classification = Extended ASCII,{{efn|Not in the strictest sense of the term, as ASCII bytes can appear as trail bytes.}} Variable-width encoding, CJK encoding
| extra =
{{notelist}}

}}

Microsoft Windows code page 932 (abbreviated MS932,[1][2] Windows-932[2] or ambiguously CP932[3]), also called Windows-31J amongst other names (see § Terminology below), is the Microsoft Windows code page for the Japanese language, and is an extended variant of the Shift JIS Japanese character encoding. It retains the standard 7-bit ASCII codes, and Japanese characters are indicated by the high bit of the first byte being set to 1. Some first bytes require a second byte to complete the character, so each character is encoded in either 8 or 16 bits.
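The mix of one-byte and two-byte characters can be illustrated with a short sketch. It assumes that Python's built-in `cp932` codec is an adequate stand-in for Windows code page 932; the helper name `byte_lengths` is purely illustrative.

```python
def byte_lengths(text, encoding="cp932"):
    """Return (character, hex of encoded bytes, byte count) for each character."""
    rows = []
    for ch in text:
        encoded = ch.encode(encoding)
        rows.append((ch, encoded.hex(), len(encoded)))
    return rows


# An ASCII letter, a halfwidth katakana, a hiragana and a kanji.
for ch, hexbytes, count in byte_lengths("Aｱあ日"):
    print(f"U+{ord(ch):04X} -> 0x{hexbytes.upper()} ({count} byte(s))")
```

In this sketch the halfwidth katakana occupies a single byte even though its high bit is set, while the hiragana and kanji each take two bytes.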

IBM offer the same extended double-byte codes in their code page 943 (IBM-943 or CP943),[4] which is a combination of the single-byte Code page 897 and the double-byte Code page 941.[5]

Terminology

Microsoft's Shift JIS variant is known simply as "Code page 932" on Microsoft Windows; however, this name is ambiguous. IBM's code page 932, while also a Shift JIS variant, lacks the NEC and NEC-selected double-byte vendor extensions which are present in Microsoft's variant (although both include the IBM extensions), and it preserves the 1978 ordering of JIS X 0208.[4]

IBM's code page 943 (or "IBM-943") includes the same double-byte codes as Windows code page 932.[4] Microsoft's version corresponds closely to the encoding referred to as ibm-943_P15A-2003 (with aliases including CP943C and Windows-932)[2] in International Components for Unicode (ICU). There is also a second ICU encoding named ibm-943_P130-1999,[10] whose single-byte mappings differ and more closely match IBM's code page definitions. (See § Single-byte character differences below for details.)

Windows code page 932 is registered with the IANA as Windows-31J.[6] The "Windows-31J" label is IANA's and not recognized by Microsoft, which has historically used "shift_jis" instead.[7] The W3C/WHATWG encoding standard used by HTML5 treats the label "shift_jis" interchangeably with "windows-31j" with the intent of being "compatible with deployed content"[8] and matches Windows code page 932 (including the "formerly proprietary extensions from IBM and NEC").[9]

Windows code page 932 is also called MS_Kanji,[2][10] although IANA treat MS_Kanji as an alias for standard Shift JIS.[6]
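How labels resolve depends on the implementation. As one illustration, the sketch below asks Python's standard `codecs` module which codec each label selects. It assumes CPython's standard alias table, in which `cp932` and `shift_jis` are separate codecs; it says nothing about how other libraries or the WHATWG standard treat the same labels.

```python
import codecs

# Labels to resolve; the right-hand side is Python's canonical codec name.
labels = ("cp932", "ms932", "mskanji", "MS_Kanji", "shift_jis", "sjis")

for label in labels:
    print(f"{label:10} -> {codecs.lookup(label).name}")
```

Under that assumption, the Microsoft-style labels resolve to `cp932`, while `shift_jis` and `sjis` select Python's separate `shift_jis` codec rather than being treated interchangeably.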

In Japanese editions of Windows, this code page is referred to as "ANSI", since it is the operating system's default 8-bit encoding, even though ANSI was not involved in its definition.

Differences from standard Shift JIS

Windows-31J is often mistaken for standard Shift JIS (as defined in JIS X 0208:1997 Appendix 1): while similar, the distinction is significant for computer programmers wishing to avoid mojibake.

Double-byte character differences

In addition to the standard JIS X 0201:1997 and JIS X 0208:1997 characters, Windows-31J includes several JIS X 0208 extensions, namely "NEC special characters (Row 13), NEC selection of IBM extensions (Rows 89 to 92), and IBM extensions (Rows 115 to 119)",[6] in addition to setting some encoding space aside for end user definition.[11] This also differs from IBM-932, which does not include the NEC extensions or NEC selection.[4]

Some of these representations were subsequently used for different characters by JIS X 0213 and Shift JIS-2004. For example, compare row 89 in JIS X 0213 (beginning 硃, 硎, 硏…)[12] to row 89 as used by JIS X 0208 with IBM/NEC extensions (beginning 纊, 褜, 鍈…).[13] Consequently, Shift JIS-2004 is not compatible with Windows-31J.
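The row numbers quoted above can be related to byte values with the usual Shift JIS row/cell arithmetic. The following is a minimal sketch under that assumption; the function name `kuten_to_sjis` is hypothetical, and the final decoding step relies on Python's built-in `cp932` codec rather than on Windows itself.

```python
def kuten_to_sjis(row, cell):
    """Convert a row/cell pair to Shift JIS-style lead and trail bytes.

    Rows 1-62 use lead bytes 0x81-0x9F; rows 63-120 use 0xE0-0xFC.
    """
    if not (1 <= row <= 120 and 1 <= cell <= 94):
        raise ValueError("row must be 1-120 and cell must be 1-94")
    lead = (row - 1) // 2 + (0x81 if row <= 62 else 0xC1)
    if row % 2:
        # Odd row: trail bytes 0x40-0x9E, skipping 0x7F.
        trail = cell + (0x3F if cell <= 63 else 0x40)
    else:
        # Even row: trail bytes 0x9F-0xFC.
        trail = cell + 0x9E
    return bytes([lead, trail])


# First cell of an NEC row, an NEC-selected row and an IBM extension row.
for row in (13, 89, 115):
    raw = kuten_to_sjis(row, 1)
    print(f"row {row:3} -> 0x{raw.hex().upper()} -> {raw.decode('cp932')}")
```

In this sketch, row 89 cell 1 comes out as 0xED40, which the `cp932` codec decodes to 纊, matching the start of row 89 quoted above. Python's separate `shift_jis` codec rejects these extension bytes.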

In addition to the above, Microsoft uses different (but visually similar) Unicode mappings from standard Shift JIS for several double-byte punctuation characters. For example, the wave dash is mapped to U+FF5E rather than U+301C,[14] a choice which is followed by ibm-943_P15A-2003[15] but not by ibm-943_P130-1999,[16] and the double-byte backslash is also mapped differently.[14]
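The wave dash difference can be observed by decoding the same two bytes with two codecs. This sketch assumes Python's `cp932` codec follows Microsoft's mapping and its `shift_jis` codec follows the standard one for this character; the constant name is illustrative.

```python
import unicodedata

WAVE_DASH_BYTES = b"\x81\x60"  # double-byte wave dash position

for codec in ("cp932", "shift_jis"):
    ch = WAVE_DASH_BYTES.decode(codec)
    print(f"{codec:10} 0x8160 -> U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Because the two results are different code points, text decoded with one codec and re-encoded with the other may fail or change, which is a typical source of the mojibake mentioned earlier.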

Single-byte character differences

Windows-932 includes standard 7-bit ASCII mappings for single-byte sequences with the high bit set to 0. Hence, codes 0x5C and 0x7E are mapped to Unicode as U+005C REVERSE SOLIDUS (\, the backslash) and U+007E TILDE (~) respectively,[17][18][14] as they are in ASCII (ISO-646-US). This is likewise done by the W3C/WHATWG encoding standard.[19] By contrast, 0x5C is mapped to U+00A5 YEN SIGN (¥) in ISO-646-JP and consequently in JIS X 0201, of which standard Shift JIS is an extension. Correspondingly, Windows-31J avoids duplicate encoding of the backslash by mapping the double-byte code 0x815F to U+FF3C FULLWIDTH REVERSE SOLIDUS, whereas standard Shift JIS maps it to U+005C.[14]
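The Windows-932 side of these mappings can be checked with a short sketch. It assumes Python's `cp932` codec mirrors the Microsoft mapping table; the helper name `describe` is illustrative. Python's own `shift_jis` codec is deliberately left out, since it is not guaranteed to follow the standard JIS X 0201 treatment of 0x5C described above.

```python
import unicodedata


def describe(raw, encoding="cp932"):
    """Decode one character and report its code point and Unicode name."""
    ch = raw.decode(encoding)
    return f"0x{raw.hex().upper()} -> U+{ord(ch):04X} {unicodedata.name(ch)}"


for raw in (b"\x5c", b"\x7e", b"\x81\x5f"):
    print(describe(raw))
```

The single byte 0x5C and the double-byte code 0x815F decode to distinct code points, so each round-trips without the duplicate encoding that the paragraph above describes.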

However, 0x5C in Windows-932 is nonetheless considered a Yen sign in certain contexts.[20] For this reason, in many Japanese fonts, U+005C is displayed as a Yen symbol, which would normally be represented as U+00A5, rather than as a backslash per Unicode's suggested rendering. U+00A5 is one-way best-fit mapped onto 0x5C in Windows-932. However, code 0x5C in Windows-932 behaves as a reverse solidus (backslash) in all respects (e.g. in file paths on Windows systems) other than how it is displayed by some fonts,[20] and Microsoft's documentation for Windows-932 displays 0x5C as a backslash.[18] This mapping[17] corresponds to the encoding named "ibm-943_P15A-2003" in International Components for Unicode (ICU),[2] except for minor reordering of a few C0 control characters.
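Because 0x5C acts as the path separator while ASCII-range bytes can also appear as trail bytes of double-byte characters, splitting encoded paths at the byte level can go wrong. The sketch below illustrates this with a made-up path, assuming Python's `cp932` codec; it is an illustration of the hazard, not a description of how Windows itself parses paths.

```python
# A hypothetical path whose katakana and kanji have 0x5C as their trail byte.
path = "C:\\ソフト\\表.txt"
raw = path.encode("cp932")

# Naive approach: split the encoded bytes on every 0x5C byte.
naive = raw.split(b"\x5c")

# Safer approach: split the decoded text, then encode each component.
safe = [part.encode("cp932") for part in path.split("\\")]

print(len(naive), "pieces when splitting bytes")
print(len(safe), "pieces when splitting text")
```

Splitting the bytes cuts through the middle of ソ (0x835C) and 表 (0x955C), so the naive result has more pieces than the path has components.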

IBM-943, like IBM-932,[4] is a superset of the single-byte Code page 897,[5] which maps 0x5C to the Yen symbol (¥) and 0x7E to the overline (‾);[21] this is followed by the encoding named "ibm-943_P130-1999" in ICU.[22] Code page 897 (and therefore also IBM-943 and IBM-932) also adds single-byte box-drawing characters replacing certain C0 control characters;[21] however, these may still be treated as control characters depending on the context,[23] and are mapped to control characters in ICU.[22]

Layout

{{Shift-JIS byte map extended|windows}}

See also

  • LMBCS-16
  • Code page 942

References

1. ^{{cite web | url=https://www.w3.org/Bugs/Public/show_bug.cgi?id=27851 | title=Bug 27851 - Add MS932 as a label of Shift_JIS | work=w3.org Bug Tracker | author=Sivonen, Henri}}
2. ^{{cite web | url=http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943_P15A-2003&s=UTR22&s=IBM&s=WINDOWS&s=JAVA&s=IANA&s=MIME&s=- | title=Converter Explorer: ibm-943_P15A-2003 (alias windows-31j) | work=International Components for Unicode: ICU Demonstration}}
3. ^{{cite web|url=https://www.debian.org/doc/manuals/debian-reference/ch11.en.html|title=Chapter 11. Data conversion|work=Debian Reference|last=Aoki|first=Osamu|publisher=Debian}}
4. ^{{cite web | url=https://www.ibm.com/support/knowledgecenter/en/ssw_aix_71/com.ibm.aix.nlsgdrf/ibm-943_ibm-932.htm | title=IBM-943 and IBM-932 | publisher=IBM | work=IBM Knowledge Center}}
5. ^{{cite web | url=http://www-01.ibm.com/software/globalization/ccsid/ccsid943.html | title=Coded character set identifiers - CCSID 943 | publisher=IBM | work=IBM Globalization | archive-url=https://web.archive.org/web/20160315110642/http://www-01.ibm.com/software/globalization/ccsid/ccsid943.html | archive-date=2016-03-15}}
6. ^{{cite web | url=https://www.iana.org/assignments/character-sets/character-sets.xhtml | publisher=IANA | title=Character Sets}}
7. ^{{cite web|url=https://msdn.microsoft.com/en-us/library/system.text.encoding.windowscodepage(v=vs.110).aspx |title=Encoding.WindowsCodePage Property - .NET Framework (current version) |work=MSDN |publisher=Microsoft}}
8. ^{{cite web | url=https://encoding.spec.whatwg.org/#names-and-labels | title=4.2. Names and labels | publisher=WHATWG | work=Encoding Standard}}
9. ^{{cite web | url=https://encoding.spec.whatwg.org/#index-jis0208 | title=5. Indexes (§ Index jis0208) | publisher=WHATWG | work=Encoding Standard}}
10. ^{{cite web | url=https://docs.python.org/3.6/library/codecs.html#standard-encodings | title=7.2.3. Standard Encodings | publisher=Python Software Foundation | work=Python 3.6 Documentation | accessdate=19 September 2017}}
11. ^{{cite web | url=http://archives.miloush.net/michkap/archive/2007/05/26/2901371.html | title=The PUA outside of Unicode | author=Kaplan, Michael S | work=Sorting it all out | date=2007-05-26}}
12. ^{{cite web | url=https://www.itscj.ipsj.or.jp/iso-ir/233.pdf | title=233: Japanese Graphic Character Set for Information Interchange, Plane 1 | publisher=IPSJ}}
13. ^{{cite web | url=https://encoding.spec.whatwg.org/jis0208.html | title=Index jis0208 visualization | publisher=WHATWG | work=Encoding Standard}}
14. ^{{cite web | url = https://www.w3.org/TR/japanese-xml/#ambiguity_of_yen | title = Ambiguities in conversion from Shift-JIS to Unicode (Non-Normative) | work = XML Japanese Profile | publisher=W3C}}
15. ^{{cite web | url=http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943_P15A-2003&b=81&s=ALL#layout | title=Converter Explorer: ibm-943_P15A-2003: start byte 0x81 | publisher=International Components for Unicode | work=ICU Demonstration}}
16. ^{{cite web | url=http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943_P130-1999&b=81&s=ALL#layout | title=Converter Explorer: ibm-943_P130-1999: start byte 0x81 | publisher=International Components for Unicode | work=ICU Demonstration}}
17. ^{{cite web | url=https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT | title=CP932.TXT | publisher=Unicode Consortium}}
18. ^{{cite web | url=https://msdn.microsoft.com/en-us/library/cc194889.aspx | title=Lead byte NULL — Code page 932 | publisher=Microsoft}}
19. ^{{cite web | url=https://encoding.spec.whatwg.org/#shift_jis-decoder | title=12.3.1. Shift_JIS decoder | publisher=WHATWG | work=Encoding Standard}} "If byte is an ASCII byte or 0x80, return a code point whose value is byte."
20. ^{{cite web | title=When is a backslash not a backslash? | date=2005-09-17 | author=Kaplan, Michael S. | url=http://archives.miloush.net/michkap/archive/2005/09/17/469941.html | work=Sorting it all out}}
21. ^{{cite web | url=ftp://ftp.software.ibm.com/software/globalization/gcoc/attachments/CP00897.txt | title=CP00897.txt | publisher=IBM | archive-date=2019-01-12 | dead-url=no | archive-url=https://www.webcitation.org/75NZsweMG}}
22. ^{{cite web | url=http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943 | work=International Components for Unicode: ICU Demonstration | title=Converter Explorer: ibm-943_P130-1999}}
23. ^{{cite web | url=http://www-01.ibm.com/software/globalization/cp/cp00897.html | title=Code page identifiers - CP 00897 | publisher=IBM | work=IBM Globalization | dead-url=yes | archive-url=https://web.archive.org/web/20160317053427/http://www-01.ibm.com/software/globalization/cp/cp00897.html | archive-date=2016-03-17}}

External links

Microsoft related

  • [https://msdn.microsoft.com/en-us/library/cc194887.aspx Microsoft's Reference for Windows Code Page 932]
  • [https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit932.txt Code page file for MS932]
  • [https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT Mapping of Microsoft's Code Page 932 to Unicode]
  • ICU Code Page 943C (ibm-943_P15A-2003 alias windows-31j) demonstration

IBM related

  • [https://web.archive.org/web/20160315110642/http://www-01.ibm.com/software/globalization/ccsid/ccsid943.html IBM's documentation of Code Page 943]
  • ICU Code Page 943 (ibm-943_P130-1999) demonstration
  • ICU mapping for ibm-943_P130-1999 to Unicode
{{character encoding}}

3 : Character sets|Windows code pages|Encodings of Japanese
