词条 | Valid characters in XML |
释义 |
This article describes and classifies the Unicode characters that may validly appear in XML. XML 1.0Unicode code points in the following ranges are valid in XML 1.0 documents:[1]
The preceding code points ranges contain the following controls which are only valid in certain contexts in XML 1.0 documents, and whose usage is restricted and highly discouraged:
XML 1.1Unicode code points in the following code point ranges are always valid in XML 1.1 documents:[2]
The preceding code points ranges contain the following controls which are only valid in certain contexts in XML 1.1 documents, and whose usage is restricted and highly discouraged:
Characters allowed but discouragedIn addition, the following code points, even though they are valid in all XML 1.0 and XML 1.1 documents, are also restricted and discouraged in both versions of XML, as they are permanently assigned to non-characters in Unicode and ISO/IEC 10646. Some XML parsers may even signal them as invalid in their character set decoder, and XML documents containing them may not pass through some restricted interfaces or may not be interchangeable. These non-characters can still be encoded in standard UTFs (such as UTF-8) because these UTFs only restrict the code points assigned to surrogate non-characters:
Note that the code point U+0000, assigned to the null control character, is the only character encoded in Unicode and ISO/IEC 10646 that is always invalid in any XML 1.0 and 1.1 document. On the opposite, the code point U+0085 is a valid control character in Unicode and ISO/IEC 10646, as well as in XML 1.0 and XML 1.1 documents (in all contexts), and its usage is not discouraged (it is treated as whitespace in many XML contexts, or as a line-break control similar to U+000D and U+000A in preformatted texts in some XML applications). Non-restricted charactersFor these reasons, the non-restricted repertoire which can be used in all versions of XML and in all contexts (as permitted by the XML syntax) contains only code points that are permanently assigned to characters (excluding non-characters), or reserved for possible future encoding in Unicode and ISO/IEC 10646, and excludes the restricted repertoire, for better interoperability. They are:
See also
References1. ^http://www.w3.org/TR/2006/REC-xml-20060816/#charsets 2. ^http://www.w3.org/TR/xml11/#charsets
1 : XML |
随便看 |
|
开放百科全书收录14589846条英语、德语、日语等多语种百科知识,基本涵盖了大多数领域的百科知识,是一部内容自由、开放的电子版国际百科全书。