Entry | Code page 932 (Microsoft Windows) |
Definition |
Name: Windows Code page 932
MIME / IANA: Windows-31J
Alias: CP943C
Standard: WHATWG Encoding Standard (as "Shift_JIS")
Language: Japanese
Extends: Shift_JIS
Classification: Extended ASCII (not in the strictest sense of the term, as ASCII bytes can appear as trail bytes), variable-width encoding, CJK encoding

Microsoft Windows code page 932 (abbreviated MS932,[1][2] Windows-932[2] or, ambiguously, CP932[3]), also called Windows-31J among other names (see § Terminology below), is the Microsoft Windows code page for the Japanese language. It is an extended variant of the Shift JIS Japanese character encoding. It contains the standard 7-bit ASCII codes, and Japanese characters are indicated by the high bit of the first byte being set to 1. Some code points in this page require a second byte, so each character is encoded in either 8 or 16 bits. IBM offers the same extended double-byte codes in its code page 943 (IBM-943 or CP943),[4] which is a combination of the single-byte Code page 897 and the double-byte Code page 941.[5]

Terminology
Microsoft's Shift JIS variant is known simply as "Code page 932" on Microsoft Windows. This name is ambiguous, however: IBM's code page 932, while also a Shift JIS variant, lacks the NEC and NEC-selected double-byte vendor extensions that are present in Microsoft's variant (although both include the IBM extensions), and it preserves the 1978 ordering of JIS X 0208.[4] IBM's code page 943 (or "IBM-943") includes the same double-byte codes as Windows code page 932.[4]

Microsoft's version corresponds closely to the encoding referred to as ibm-943_P15A-2003 (with aliases including CP943C and Windows-932)[2] in International Components for Unicode (ICU). There is also a second ICU encoding named ibm-943_P130-1999,[22] which uses different single-byte mappings that more closely match IBM's code page definitions. (See § Single-byte character differences below for details.)
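As a rough illustration of the naming overlap and of the 8-or-16-bit structure described above, the following sketch uses Python's standard codecs. It assumes that Python's built-in `cp932` codec is an adequate stand-in for Windows code page 932; the helper names `canonical_name` and `byte_width` are hypothetical and exist only for this example.

```python
import codecs


def canonical_name(label: str) -> str:
    """Return the name of the Python codec that a given label resolves to."""
    return codecs.lookup(label).name


def byte_width(ch: str) -> int:
    """Return how many bytes one character occupies when encoded with cp932.

    Assumption: Python's "cp932" codec stands in for Windows code page 932.
    """
    return len(ch.encode("cp932"))


if __name__ == "__main__":
    # "ms932" and "mskanji" are documented Python aliases of cp932;
    # "shift_jis" resolves to a separate codec.
    for label in ("cp932", "ms932", "mskanji", "shift_jis"):
        print(label, "->", canonical_name(label))

    # Latin capital A, halfwidth katakana A, hiragana A
    for ch in ("A", "\uff71", "\u3042"):
        print(f"U+{ord(ch):04X} -> {byte_width(ch)} byte(s)")
```

In this runtime the labels `ms932` and `mskanji` resolve to the same codec as `cp932`, while `shift_jis` stays separate. Other libraries may resolve the same labels differently, which is the ambiguity this section describes.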
Windows code page 932 is registered with the IANA as Windows-31J.[6] The "Windows-31J" label is IANA's and is not recognized by Microsoft, which has historically used "shift_jis" instead.[7] The W3C/WHATWG Encoding Standard used by HTML5 treats the label "shift_jis" interchangeably with "windows-31j" with the intent of being "compatible with deployed content",[8] and its encoding matches Windows code page 932 (including the "formerly proprietary extensions from IBM and NEC").[9]

Windows code page 932 is also called MS_Kanji,[2][10] although IANA treats MS_Kanji as an alias for standard Shift JIS.[6]

In Japanese editions of Windows, this code page is referred to as "ANSI", since it is the operating system's default 8-bit encoding, even though ANSI was not involved in its definition.

Differences from standard Shift JIS
Windows-31J is often mistaken for standard Shift JIS (as defined in JIS X 0208:1997 Appendix 1). The two are similar, but the distinction is significant for computer programmers wishing to avoid mojibake.

Double-byte character differences
In addition to the standard JIS X 0201:1997 and JIS X 0208:1997 characters, Windows-31J includes several JIS X 0208 extensions, namely "NEC special characters (Row 13), NEC selection of IBM extensions (Rows 89 to 92), and IBM extensions (Rows 115 to 119)",[6] and it sets some encoding space aside for end-user definition.[11] This also differs from IBM-932, which does not include the NEC extensions or the NEC selection.[4]

Some of these representations were subsequently used for different characters by JIS X 0213 and Shift JIS-2004. For example, compare row 89 in JIS X 0213 (beginning 硃, 硎, 硏…)[12] to row 89 as used by JIS X 0208 with IBM/NEC extensions (beginning 纊, 褜, 鍈…).[13] Consequently, Shift JIS-2004 is not compatible with Windows-31J.
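The effect of the vendor extensions can be sketched with Python's standard codecs, assuming that its `cp932`, `shift_jis` and `shift_jis_2004` codecs approximate Windows-31J, standard Shift JIS and Shift JIS-2004 respectively. The helper `try_decode` and the two constants are hypothetical names introduced for this example. The byte values are the first cells of rows 13 and 89 under the usual Shift JIS row-to-byte transformation.

```python
from typing import Optional


def try_decode(data: bytes, encoding: str) -> Optional[str]:
    """Decode strictly; return None when the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# First cell of row 13 (NEC special characters).
NEC_ROW_13 = b"\x87\x40"
# First cell of row 89 (NEC selection of IBM extensions).
NEC_SELECTED_ROW_89 = b"\xed\x40"

if __name__ == "__main__":
    # Assumption: these three Python codecs stand in for Windows-31J,
    # standard Shift JIS and Shift JIS-2004.
    for data in (NEC_ROW_13, NEC_SELECTED_ROW_89):
        for encoding in ("cp932", "shift_jis", "shift_jis_2004"):
            print(data.hex(), encoding, repr(try_decode(data, encoding)))
```

Under these assumptions, the same two bytes are a valid extension character in one codec, an error in another, and possibly an unrelated character in the third. This is how mojibake arises when the label on a document does not match the encoding actually used.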
In addition to the vendor extensions, Microsoft uses different (but visually similar) Unicode mappings for several double-byte punctuation characters compared to standard Shift JIS. For example, the wave dash is mapped to U+FF5E rather than U+301C;[14] ibm-943_P15A-2003 follows this mapping[15] but ibm-943_P130-1999 does not.[16] The double-byte backslash is also mapped differently.[14]

Single-byte character differences
Windows-932 includes standard 7-bit ASCII mappings for single-byte sequences with the high bit set to 0. Hence, codes 0x5C and 0x7E are mapped to Unicode as U+005C REVERSE SOLIDUS (\) and U+007E TILDE (~) respectively. Nonetheless, 0x5C in Windows-932 is considered a Yen sign in certain contexts.[20] For this reason, many Japanese fonts display U+005C as a Yen symbol, which would normally be represented as U+00A5, rather than as a backslash per Unicode's suggested rendering. U+00A5 is one-way best-fit mapped onto 0x5C in Windows-932. Apart from how some fonts display it, code 0x5C in Windows-932 behaves as a reverse solidus (backslash) in all respects (e.g. in file paths on Windows systems),[20] and Microsoft's documentation for Windows-932 displays 0x5C as a backslash.[18]

This mapping[17] corresponds to the encoding named "ibm-943_P15A-2003" in International Components for Unicode (ICU),[2] except for minor reordering of a few C0 control characters.

IBM-943, like IBM-932,[4] is a superset of the single-byte Code page 897,[5] which maps 0x5C to the Yen symbol (¥).
References
1. Sivonen, Henri. "Bug 27851 - Add MS932 as a label of Shift_JIS". w3.org Bug Tracker. https://www.w3.org/Bugs/Public/show_bug.cgi?id=27851
2. "Converter Explorer: ibm-943_P15A-2003 (alias windows-31j)". International Components for Unicode: ICU Demonstration. http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943_P15A-2003&s=UTR22&s=IBM&s=WINDOWS&s=JAVA&s=IANA&s=MIME&s=-
3. Aoki, Osamu. "Chapter 11. Data conversion". Debian Reference. Debian. https://www.debian.org/doc/manuals/debian-reference/ch11.en.html
4. "IBM-943 and IBM-932". IBM Knowledge Center. IBM. https://www.ibm.com/support/knowledgecenter/en/ssw_aix_71/com.ibm.aix.nlsgdrf/ibm-943_ibm-932.htm
5. "Coded character set identifiers - CCSID 943". IBM Globalization. IBM. http://www-01.ibm.com/software/globalization/ccsid/ccsid943.html (archived 2016-03-15 at https://web.archive.org/web/20160315110642/http://www-01.ibm.com/software/globalization/ccsid/ccsid943.html)
6. "Character Sets". IANA. https://www.iana.org/assignments/character-sets/character-sets.xhtml
7. "Encoding.WindowsCodePage Property - .NET Framework (current version)". MSDN. Microsoft. https://msdn.microsoft.com/en-us/library/system.text.encoding.windowscodepage(v=vs.110).aspx
8. "4.2. Names and labels". Encoding Standard. WHATWG. https://encoding.spec.whatwg.org/#names-and-labels
9. "5. Indexes (§ Index jis0208)". Encoding Standard. WHATWG. https://encoding.spec.whatwg.org/#index-jis0208
10. "7.2.3. Standard Encodings". Python 3.6 Documentation. Python Software Foundation. https://docs.python.org/3.6/library/codecs.html#standard-encodings (accessed 19 September 2017)
11. Kaplan, Michael S. (2007-05-26). "The PUA outside of Unicode". Sorting it all out. http://archives.miloush.net/michkap/archive/2007/05/26/2901371.html
12. "233: Japanese Graphic Character Set for Information Interchange, Plane 1". IPSJ. https://www.itscj.ipsj.or.jp/iso-ir/233.pdf
13. "Index jis0208 visualization". Encoding Standard. WHATWG. https://encoding.spec.whatwg.org/jis0208.html
14. "Ambiguities in conversion from Shift-JIS to Unicode (Non-Normative)". XML Japanese Profile. W3C. https://www.w3.org/TR/japanese-xml/#ambiguity_of_yen
15. "Converter Explorer: ibm-943_P15A-2003: start byte 0x81". ICU Demonstration. International Components for Unicode. http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943_P15A-2003&b=81&s=ALL#layout
16. "Converter Explorer: ibm-943_P130-1999: start byte 0x81". ICU Demonstration. International Components for Unicode. http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943_P130-1999&b=81&s=ALL#layout
17. "CP932.TXT". Unicode Consortium. https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
18. "Lead byte NULL — Code page 932". Microsoft. https://msdn.microsoft.com/en-us/library/cc194889.aspx
19. "12.3.1. Shift_JIS decoder". Encoding Standard. WHATWG. https://encoding.spec.whatwg.org/#shift_jis-decoder "If byte is an ASCII byte or 0x80, return a code point whose value is byte."
20. Kaplan, Michael S. (2005-09-17). "When is a backslash not a backslash?". Sorting it all out. http://archives.miloush.net/michkap/archive/2005/09/17/469941.html
21. "CP00897.txt". IBM. ftp://ftp.software.ibm.com/software/globalization/gcoc/attachments/CP00897.txt (archived 2019-01-12 at https://www.webcitation.org/75NZsweMG)
22. "Converter Explorer: ibm-943_P130-1999". International Components for Unicode: ICU Demonstration. http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943
23. "Code page identifiers - CP 00897". IBM Globalization. IBM. http://www-01.ibm.com/software/globalization/cp/cp00897.html (archived 2016-03-17 at https://web.archive.org/web/20160317053427/http://www-01.ibm.com/software/globalization/cp/cp00897.html)
Categories: Character sets | Windows code pages | Encodings of Japanese