String literal
A string literal or anonymous string[1] is a type of literal in programming for the representation of a string value within the source code of a computer program. Most often in modern languages this is a quoted sequence of characters (formally "bracketed delimiters").

Syntax

Bracketed delimiters

Most modern programming languages use bracket delimiters (also balanced delimiters) to specify string literals. Double quotation marks are the most common quoting delimiters. An empty string is written as a pair of quotes with no characters in between. Some languages either allow or mandate single quotation marks instead of double quotation marks; the string must begin and end with the same kind of quotation mark, and the kind of quotation mark may or may not give slightly different semantics.

These quotation marks are unpaired (the same character is used as an opener and a closer), which is a hangover from the typewriter technology that was the precursor of the earliest computer input and output devices.

In terms of regular expressions, a basic quoted string literal is given as:

    "[^"]*"

This means that a string literal is written as a quote, followed by zero or more non-quote characters, followed by a quote. In practice this is often complicated by escaping, other delimiters, and excluding newlines.

Paired delimiters

A number of languages provide for paired delimiters, where the opening and closing delimiters are different. These also often allow nested strings, so delimiters can be embedded as long as they are paired, but embedding an unpaired closing delimiter still results in delimiter collision.
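The nesting behaviour of paired delimiters can be sketched in Python. The function name `read_paired` and its parameters are hypothetical, and the scanner deliberately ignores escape sequences, so it illustrates the idea rather than any particular language's lexer.

```python
def read_paired(text, start=0, opener="(", closer=")"):
    """Return (value, end_index) for a paired-delimiter literal at text[start]."""
    if text[start] != opener:
        raise ValueError("literal must begin with the opening delimiter")
    depth = 0
    for i in range(start, len(text)):
        ch = text[i]
        if ch == opener:
            depth += 1          # a nested opener must be matched later
        elif ch == closer:
            depth -= 1
            if depth == 0:      # the closer that balances the first opener
                return text[start + 1:i], i + 1
    raise ValueError("unbalanced delimiters")


# Balanced inner delimiters are simply part of the value.
print(read_paired("(The quick (brown) fox)"))

# An unpaired closer ends the literal early: this is delimiter collision.
print(read_paired("(a ) b)"))
```

Tracking a depth counter is what distinguishes paired delimiters from unpaired quotes, where the first repeated quote character always ends the literal.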
Examples include PostScript, which uses parentheses.

While the Unicode character set includes paired (separate opening and closing) versions of both single and double quotation marks, which are used in text, mostly in languages other than English, these are rarely used in programming languages (because ASCII is preferred, and these are not included in ASCII):

    “Hi There!”
    ‘Hi There!’
    „Hi There!“
    «Hi There!»

The paired double quotation marks can be used in Visual Basic .NET, but many other programming languages will not accept them. Unpaired marks are preferred for compatibility, since many web browsers, text editors, and other tools will not correctly display Unicode paired quotes[citation needed], and so even in languages where they are permitted, many projects forbid their use in source code.

Whitespace delimiters

String literals might be ended by newlines. One example is MediaWiki template parameters.

    {{Navbox
    |name=Nulls
    |title=[[wikt:Null|Nulls]] in [[computing]]
    }}

There might be special syntax for multi-line strings. In YAML, string literals may be specified by the relative positioning of whitespace and indentation.

    - title: An example multi-line string in YAML
      body : |
        This is a multi-line string.
        "special" metacharacters may
        appear here. The extent of this string is
        represented by indentation.

No delimiters

Some programming languages, such as Perl, JavaScript, and PHP, allow string literals without any delimiters in some contexts. Perl treats non-reserved sequences of alphanumeric characters as string literals in most contexts.
For example, the following two lines of Perl are equivalent:

    $y = "x";
    $y = x;

Declarative notation

In the original FORTRAN programming language (for example), string literals were written in so-called Hollerith notation, where a decimal count of the number of characters was followed by the letter H, and then the characters of the string. This declarative notation style is contrasted with bracketed delimiter quoting, because it does not require the use of balanced "bracketed" characters on either side of the string.

Advantages:
This is, however, not a drawback when the prefix is generated by an algorithm, as is most likely the case.[citation needed]

Constructor functions

C++ has two styles of string, one of which is inherited from C. Before C++11, there was no literal for C++ strings, so constructor syntax was used instead, in several forms, all of which have the same interpretation. Since C++11, there is also new constructor syntax.
Delimiter collision

Main article: Delimiter collision

When using quoting, if one wishes to represent the delimiter itself in a string literal, one runs into the problem of delimiter collision. For example, if the delimiter is a double quote, one cannot represent a double quote simply by writing it inside the literal, because it would be read as the closing delimiter. Paired quotes, such as braces in Tcl, allow nested strings.

Doubling up

A number of languages, including Pascal, BASIC, DCL, Smalltalk, SQL, J, and Fortran, avoid delimiter collision by doubling up on the quotation marks that are intended to be part of the string literal itself.

Dual quoting

Some languages, such as Fortran, Modula-2, JavaScript, Python, and PHP, allow more than one quoting delimiter; in the case of two possible delimiters, this is known as dual quoting. Typically, this consists of allowing the programmer to use either single or double quotation marks interchangeably, as long as each literal uses one or the other throughout.

    "This is John's apple."
    'I said, "Can you hear me?"'

This does not allow a single literal to contain both delimiters, however. This can be worked around by using several literals and string concatenation. Python has string literal concatenation, so consecutive string literals are concatenated even without an operator. D supports a few additional quoting delimiters.

In some programming languages, such as sh and Perl, there are different delimiters that are treated differently, such as doing string interpolation or not, and thus care must be taken when choosing which delimiter to use; see different kinds of strings, below.

Multiple quoting

A further extension is the use of multiple quoting, which allows the author to choose which characters should specify the bounds of a string literal. For example, in Perl:

    qq^I said, "Can you hear me?"^
    qq@I said, "Can you hear me?"@
    qq§I said, "Can you hear me?"§

all produce the desired result.
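The dual-quoting workaround described earlier, in which a string that needs both kinds of quotation mark is split into several literals, can be sketched in Python. The variable names are illustrative only.

```python
# A string containing both ' and " cannot be written as one ordinary
# '...' or "..." literal without escaping, so it is assembled from
# pieces that are each quoted with the other kind of mark.

# Explicit concatenation with the + operator.
explicit = 'I said, "This is ' + "John's" + ' apple."'

# Adjacent literals are joined implicitly, so the operator can be dropped.
implicit = 'I said, "This is ' "John's" ' apple."'

# The same value written as a single literal with escape sequences.
escaped = "I said, \"This is John's apple.\""

print(explicit)
print(explicit == implicit == escaped)
```

All three spellings denote the same string value; they differ only in how the source text avoids delimiter collision.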
Although this notation is more flexible, few languages support it; other than Perl, Ruby (influenced by Perl) and C++11 also support it. In C++11, raw strings can have various delimiters.

Lua (as of 5.1) provides a limited form of multiple quoting, particularly to allow nesting of long comments or embedded strings:

    local ls = [=[
    This notation can be used for Windows paths:
    local path = \Windows\Fonts
    ]=]

Multiple quoting is particularly useful with regular expressions that contain usual delimiters such as quotes, as this avoids needing to escape them. An early example is sed, whose substitution command allows the default slash delimiter to be replaced by another character.

Constructor functions

Another option, which is rarely used in modern languages, is to use a function to construct a string, rather than representing it via a literal. This is generally not used in modern languages because the computation is done at run time, rather than at parse time.

For example, early forms of BASIC did not include escape sequences or any other workarounds listed here, and thus one was instead required to use the CHR$ function, passing the character code of the quotation mark (34 in ASCII):

    "I said, " + CHR$(34) + "Can you hear me?" + CHR$(34)

In C, a similar facility is available via the printf family of functions and the %c format specifier:

    printf("This is %cin quotes.%c", 34, 34);

These constructor functions can also be used to represent nonprinting characters, though escape sequences are generally used instead. A similar technique can be used in C++.

Escape sequences

Main article: Escape sequence

Escape sequences are a general technique for representing characters that are otherwise difficult to represent directly, including delimiters, nonprinting characters (such as backspaces), newlines, and whitespace characters (which are otherwise impossible to distinguish visually), and have a long history. They are accordingly widely used in string literals, and adding an escape sequence (either to a single character or throughout a string) is known as escaping.
One character is chosen as a prefix to give encodings for characters that are difficult or impossible to include directly. Most commonly this is the backslash; in addition to other characters, a key point is that the backslash itself can be encoded as a double backslash (\\). A regular expression for such escaped strings can be given as:[2]

    "(\\.|[^\\"])*"

meaning "a quote; followed by zero or more of either an escaped character (backslash followed by something, possibly backslash or quote), or a non-escape, non-quote character; ending in a quote". The only issue is distinguishing the terminating quote from a quote preceded by a backslash, which may itself be escaped. Multiple characters can follow the backslash, depending on the escaping scheme.

An escaped string must then itself be lexically analyzed, converting the escaped string into the unescaped string that it represents. This is done during the evaluation phase of the overall lexing of the computer language: the evaluator of the lexer of the overall language executes its own lexer for escaped string literals.

Among other things, it must be possible to encode the character that normally terminates the string constant, and there must be some way to specify the escape character itself. Escape sequences are not always pretty or easy to use, so many compilers also offer other means of solving the common problems. Escape sequences, however, solve every delimiter problem, and most compilers interpret escape sequences. When an escape character is inside a string literal, it means "this is the start of the escape sequence". Every escape sequence specifies one character which is to be placed directly into the string. The actual number of characters required in an escape sequence varies. The escape character is at the top left of the keyboard, but the editor will translate it, so it cannot be typed directly into a string; the backslash is used to represent the escape character in a string literal.

Many languages support the use of metacharacters inside string literals.
Metacharacters have varying interpretations depending on the context and language, but are generally a kind of "processing command" for representing printing or nonprinting characters.

For instance, in a C string literal, if the backslash is followed by a letter such as "b", "n" or "t", then this represents a nonprinting backspace, newline or tab character respectively. If the backslash is followed by one to three octal digits, the sequence is interpreted as representing the character with the specified ASCII code. This was later extended to allow hexadecimal character code notation.
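The two steps described above, recognizing a quoted literal that may contain escapes and then converting it to the string it represents, can be sketched in Python. The names `LITERAL`, `ESCAPE`, `SIMPLE`, `unescape` and `read_literal` are hypothetical, and only the handful of C-style escapes mentioned in the text are handled; this is an illustration, not a complete C lexer.

```python
import re

# A quote, then any run of escaped characters or ordinary characters, then a quote.
LITERAL = re.compile(r'"(\\.|[^\\"])*"')

# A backslash followed by \x plus hex digits, 1-3 octal digits, or any one character.
ESCAPE = re.compile(r'\\(x[0-9A-Fa-f]+|[0-7]{1,3}|.)', re.DOTALL)

# Single-letter escapes named in the text, plus the backslash and the quote.
SIMPLE = {"b": "\b", "n": "\n", "t": "\t", "\\": "\\", '"': '"'}


def unescape(body):
    """Convert the text between the quotes into the value it represents."""
    def replace(match):
        code = match.group(1)
        if code[0] == "x" and len(code) > 1:
            return chr(int(code[1:], 16))   # hexadecimal character code
        if code[0] in "01234567":
            return chr(int(code, 8))        # octal character code
        return SIMPLE.get(code, code)       # unknown escapes keep the character
    return ESCAPE.sub(replace, body)


def read_literal(source):
    """Match a literal at the start of source and return its unescaped value."""
    match = LITERAL.match(source)
    if match is None:
        raise ValueError("not a string literal")
    return unescape(match.group(0)[1:-1])


print(repr(read_literal(r'"say \"hi\"\n\101\x42"')))
print(repr(read_literal(r'"a\\" rest')))
```

The second call shows the case the text singles out: the quote after a doubled backslash terminates the literal, because the backslash before it has already been consumed as part of an escape.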
Not all escape sequences are supported by all parsers, and individual parsers may support additional ones.

Nested escaping

When code in one programming language is embedded inside another, embedded strings may require multiple levels of escaping. This is particularly common in regular expressions and SQL queries within other languages, or other languages inside shell scripts. This double-escaping is often difficult to read and author.

Incorrect quoting of nested strings can present a security vulnerability. Untrusted data, such as the data fields of an SQL query, should be passed through prepared statements to prevent a code injection attack. In PHP 2 through 5.3, there was a feature called magic quotes which automatically escaped strings (for convenience and security), but due to problems it was removed from version 5.4 onward.

Raw strings

A few languages provide a method of specifying that a literal is to be processed without any language-specific interpretation. This avoids the need for escaping, and yields more legible strings.

Raw strings are particularly useful when a common character needs to be escaped, notably in regular expressions (nested as string literals), where the backslash is widely used, and in Windows paths, where it is the path separator. Compare an escaped path name and a raw path name in C#:

    "The Windows path is C:\\Foo\\Bar\\Baz\\"
    @"The Windows path is C:\Foo\Bar\Baz\"

Extreme examples occur when these are combined: Uniform Naming Convention paths begin with \\, and every additional level of escaping doubles the number of backslashes.

In XML documents, CDATA sections allow use of characters such as & and < without an XML parser attempting to interpret them as part of the structure of the document itself. This can be useful when including literal text and scripting code, to keep the document well formed.

Multiline string literals

In many languages, string literals can contain literal newlines, spanning several lines. Alternatively, newlines can be escaped, most often as \n. For example:

    echo 'foo
    bar'

and

    echo -e "foo\nbar"

are both valid bash, producing:

    foo
    bar

Languages that allow literal newlines include bash, Lua, Perl, PHP, R, and Tcl.
In some other languages string literals cannot include newlines.

Two issues with multiline string literals are leading and trailing newlines, and indentation. If the initial or final delimiters are on separate lines, there are extra newlines, while if they are not, the delimiter makes the string harder to read, particularly for the first line, which is often indented differently from the rest. Further, the literal must be unindented, as leading whitespace is preserved; this breaks the flow of the code if the literal occurs within indented code.

The most common solution for these problems is here document-style string literals. Formally speaking, a here document is not a string literal, but instead a stream literal or file literal. These originate in shell scripts and allow a literal to be fed as input to an external command.

Python, whose usual string literals do not allow literal newlines, instead has a special form of string, designed for multiline literals, called triple quoting. These use a tripled delimiter, either ''' or """.

Tcl allows literal newlines in strings and has no special syntax to assist with multiline strings, though delimiters can be placed on lines by themselves and the leading and trailing newlines then stripped.

String literal concatenation

A few languages provide string literal concatenation, where adjacent string literals are implicitly joined into a single literal at compile time. This is a feature of C,[8][9] C++,[10] D,[11] Ruby,[12] and Python,[13] which copied it from C.[14] Notably, this concatenation happens at compile time, during lexical analysis (as a phase following initial tokenization), and is contrasted with both run-time string concatenation (generally with the + operator)[15] and concatenation during constant folding.

Motivation

In C, where the concept and term originate, string literal concatenation was introduced for two reasons:[17] to allow long strings to span multiple lines with flexible layout, in contrast to backslash-newline line continuation, which destroys the indentation scheme of the program; and to allow string literals to be constructed by macros through stringizing.[18]
In practical terms, this allows string concatenation in early phases of compilation ("translation", specifically as part of lexical analysis), without requiring phrase analysis or constant folding. For example, the following are valid C/C++:

    char *s = "hello, " "world";
    printf("hello, " "world");

However, the following are invalid:

    char *s = "hello, " + "world";
    printf("hello, " + "world");

This is because string literals have array type, which decays to a pointer in these expressions, and two pointers cannot be added.

This is particularly important when used in combination with the C preprocessor, to allow strings to be computed following preprocessing, particularly in macros.[14] As a simple example:

    char *file_and_message = __FILE__ ": message";

will (if the file is called a.c) expand to:

    char *file_and_message = "a.c" ": message";

which is then concatenated, being equivalent to:

    char *file_and_message = "a.c: message";

A common use case is in constructing printf or scanf format strings, where format specifiers are given by macros.[19][20]

A more complex example uses stringification (https://gcc.gnu.org/onlinedocs/cpp/Stringizing.html) of integers (by the preprocessor) to define a macro that expands to a sequence of string literals, which are then concatenated to a single string literal with the file name and line number.[21]
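The distinction between joining adjacent literals early and joining them with an operator can be observed in Python, which shares the feature, by inspecting the parsed syntax tree with the standard `ast` module. This is a sketch that assumes a recent CPython 3 (3.8 or later, where constants appear as `ast.Constant`); it inspects Python and says nothing about how a C compiler is organized internally.

```python
import ast

# Adjacent literals already form a single constant in the parsed tree.
juxtaposed = ast.parse('"hello, " "world"', mode="eval").body

# With an explicit operator the tree still contains a binary operation;
# any folding into one constant would happen in a later compilation step.
added = ast.parse('"hello, " + "world"', mode="eval").body

print(type(juxtaposed).__name__, repr(juxtaposed.value))
print(type(added).__name__)
```

Unlike in C, the second form is valid in Python; the point here is only that it reaches the parser as an operator expression, whereas the juxtaposed form does not.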
Beyond syntactic requirements of C/C++, implicit concatenation is a form of syntactic sugar, making it simpler to split string literals across several lines, avoiding the need for line continuation (via backslashes) and allowing one to add comments to parts of strings. For example, in Python, one can comment a regular expression in this way:[22]

    re.compile("[A-Za-z_]"       # letter or underscore
               "[A-Za-z0-9_]*"   # letter, digit or underscore
              )

Problems

Implicit string concatenation is not required by modern compilers, which implement constant folding, and causes hard-to-spot errors due to unintentional concatenation from omitting a comma, particularly in vertical lists of strings, as in:

    l = ['foo',
         'bar'
         'zork']

Accordingly, it is not used in most languages, and it has been proposed for deprecation from D[23] and Python.[14] However, removing the feature breaks backwards compatibility, and replacing it with a concatenation operator introduces issues of precedence: string literal concatenation occurs during lexing, prior to operator evaluation, but concatenation via an explicit operator occurs at the same time as other operators, hence precedence is an issue, potentially requiring parentheses to ensure the desired evaluation order.

A subtler issue is that in C and C++,[24] there are different types of string literals, and concatenation of these has implementation-defined behavior, which poses a potential security risk.[25]

Different kinds of strings

Some languages provide more than one kind of literal, which have different behavior. This is particularly used to indicate raw strings (no escaping), or to disable or enable variable interpolation, but has other uses, such as distinguishing character sets. Most often this is done by changing the quoting character or adding a prefix or suffix. This is comparable to prefixes and suffixes to integer literals, such as to indicate hexadecimal numbers or long integers.
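Python is one language that marks the kind of literal with a prefix, which makes the idea easy to try out. The sketch below assumes Python 3.6 or later (for the `f` prefix), and the variable names are illustrative only.

```python
plain = "a\tb"     # no prefix: \t is an escape sequence for one TAB character
raw = r"a\tb"      # r prefix: the backslash and the t are kept as two characters
data = b"a\tb"     # b prefix: the literal is a bytes object rather than text

name = "world"
formatted = f"hello, {name}"   # f prefix: expressions in braces are interpolated

print(len(plain), len(raw))
print(type(data).__name__)
print(formatted)
```

The quoted text is nearly identical in each case; the prefix alone selects whether escapes are processed, which type results, and whether interpolation takes place.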
One of the oldest examples is in shell scripts, where single quotes indicate a raw string or "literal string", while double quotes have escape sequences and variable interpolation.

For example, in Python, raw strings are preceded by an r or R.[26]

C#'s notation for raw strings is called @-quoting.

    @"C:\Foo\Bar\Baz\"

While this disables escaping, it allows double-up quotes, which allow one to represent quotes within the string:

    @"I said, ""Hello there."""

C++11 allows raw strings, Unicode strings (UTF-8, UTF-16, and UTF-32), and wide character strings, determined by prefixes. Literals for the existing C++ string class were added in C++14.

In Tcl, brace-delimited strings are literal, while quote-delimited strings have escaping and interpolation.

Perl has a wide variety of strings, which are more formally considered operators, and are known as quote and quote-like operators. These include both a usual syntax (fixed delimiters) and a generic syntax, which allows a choice of delimiters; these include:[27]

    q{} qq{} qx{} qw{} m{} qr{} s{}{} tr{}{} y{}{}

REXX uses suffix characters to specify characters or strings using their hexadecimal or binary code. For example,

    '20'x
    "0010 0000"b
    "00100000"b

all yield the space character, avoiding a function call.

Variable interpolation

Main article: Variable interpolation

Languages differ on whether and how to interpret string literals as either "raw" or "variable interpolated". Variable interpolation is the process of evaluating an expression containing one or more variables, and returning output where the variables are replaced with their corresponding values in memory. In sh-compatible Unix shells, quotation-delimited (") strings are interpolated, while apostrophe-delimited (') strings are not. For example, the following Perl code:

    $name = "Nancy";
    $greeting = "Hello World";
    print "$name said $greeting to the crowd of people.";

produces the output:

    Nancy said Hello World to the crowd of people.

The sigil character ($) is interpreted to indicate variable interpolation.
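The same substitution can be sketched in Python with the standard library's `string.Template`, which also uses a $ sigil, with %-style formatting shown for comparison. This stands in for the Perl example rather than reproducing Perl's semantics, and the variable names are illustrative.

```python
from string import Template

values = {"name": "Nancy", "greeting": "Hello World"}
text = "$name said $greeting to the crowd of people."

# Interpolated: each $identifier is replaced by the matching value.
interpolated = Template(text).substitute(values)

# Format-string style: the %s placeholders are filled in order.
formatted = "%s said %s to the crowd of people." % (
    values["name"], values["greeting"])

# Left alone: in an ordinary Python string the $ characters are plain text.
uninterpreted = text

print(interpolated)
print(uninterpreted)
```

In Python the choice is made by the operation applied to the string, whereas in Perl and the shell it is made by the kind of quotation mark used.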
Similarly, printf-style functions produce the same output using format notation, in which the metacharacters (%s) indicate variable interpolation.

This is contrasted with "raw" strings, in which the $ characters are not sigils and are not interpreted to have any meaning other than plain text.

Embedding source code in string literals

Languages that lack flexibility in specifying string literals make it particularly cumbersome to write programming code that generates other programming code. This is particularly true when the generation language is the same as or similar to the output language.
Nevertheless, some languages are particularly well adapted to produce this sort of self-similar output, especially those that support multiple options for avoiding delimiter collision.

Using string literals as code that generates other code may have adverse security implications, especially if the output is based at least partially on untrusted user input. This is particularly acute in the case of Web-based applications, where malicious users can take advantage of such weaknesses to subvert the operation of the application, for example by mounting an SQL injection attack.
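The risk of building code from string literals and untrusted input, and the prepared-statement remedy mentioned earlier in the article, can be sketched with Python's standard `sqlite3` module. The table, its contents, and the `untrusted` value are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

# Hypothetical attacker-controlled input containing a quote character.
untrusted = "x' OR '1'='1"

# Unsafe: the input is pasted into the SQL text, so its quote ends the
# SQL string literal and the remainder is parsed as part of the query.
unsafe_sql = "SELECT name FROM users WHERE name = '" + untrusted + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: a placeholder keeps the input as data, never as SQL source text.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (untrusted,)).fetchall()

print(unsafe_rows)
print(safe_rows)
conn.close()
```

The unsafe query matches every row because the injected text changes the structure of the statement, while the parameterized query looks only for a user whose name is literally the whole input.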
References

1. "Introduction To Java - MFC 158 G". http://www.acsu.buffalo.edu/~fineberg/mfc158/week10lecture.htm. Quote: "String literals (or constants) are called 'anonymous strings'".
2. "ANSI C grammar (Lex)". liu.se. http://www.lysator.liu.se/c/ANSI-C-grammar-l.html. Accessed 22 June 2016.
3. "Appendix B. Characters, strings, and escaping rules". realworldhaskell.org. http://book.realworldhaskell.org/read/characters-strings-and-escaping-rules.html. Accessed 22 June 2016.
4. "String". mozilla.org. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String. Accessed 22 June 2016.
5. "Escape Sequences (C)". microsoft.com. http://msdn.microsoft.com/en-us/library/h21280bw(v=vs.80).aspx. Accessed 22 June 2016.
6. "Rationale for International Standard - Programming Languages - C", version 5.10, April 2003, pp. 52, 153–154, 159. http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf. Archived at https://web.archive.org/web/20160606072228/http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf on 2016-06-06. Accessed 2010-10-17.
7. GCC 4.8.2 Manual, chapter 6.35 The Character
8. C11 draft standard, WG14 N1570 Committee Draft, April 12, 2011, 5.1.1.2 Translation phases, p. 11: "6. Adjacent string literal tokens are concatenated."
9. C syntax: String literal concatenation.
10. C++11 draft standard, "Working Draft, Standard for Programming Language C++". http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3337.pdf. 2.2 Phases of translation [lex.phases], p. 17: "6. Adjacent string literal tokens are concatenated." and 2.14.5 String literals [lex.string], note 13, pp. 28–29: "In translation phase 6 (2.2), adjacent string literals are concatenated."
11. D Programming Language, Lexical Analysis, "String Literals": "Adjacent strings are concatenated with the ~ operator, or by simple juxtaposition:"
12. "ruby: The Ruby Programming Language". Ruby Programming Language, 2017-10-19. https://github.com/ruby/ruby. Accessed 2017-10-19.
13. The Python Language Reference, 2. Lexical analysis, 2.4.2. String literal concatenation. https://docs.python.org/2/reference/lexical_analysis.html#string-literal-concatenation. "Multiple adjacent string literals (delimited by whitespace), possibly using different quoting conventions, are allowed, and their meaning is the same as their concatenation."
14. Python-ideas, "Implicit string literal concatenation considered harmful?", Guido van Rossum, May 10, 2013. https://mail.python.org/pipermail/python-ideas/2013-May/020527.html
15. The Python Language Reference, 2. Lexical analysis, 2.4.2. String literal concatenation. https://docs.python.org/2/reference/lexical_analysis.html#string-literal-concatenation. "Note that this feature is defined at the syntactical level, but implemented at compile time. The '+' operator must be used to concatenate string expressions at run time."
16. "Strings (The Java™ Tutorials > Learning the Java Language > Numbers and Strings)". Docs.oracle.com, 2012-02-28. http://docs.oracle.com/javase/tutorial/java/data/strings.html. Accessed 2016-06-22.
17. Rationale for the ANSI C Programming Language. Silicon Press, 1990. ISBN 0-929306-07-4. http://www.lysator.liu.se/c/rat/title.html. p. 31 (https://books.google.com/books?id=yxLISD0TAbEC&lpg=PA31&q=%22String%20literal%20concatenation%22#v=onepage), 3.1.4 String literals: "A long string can be continued across multiple lines by using the backslash-newline line continuation, but this practice requires that the continuation of the string start in the first position of the next line. To permit more flexible layout, and to solve some preprocessing problems (see §3.8.3), the Committee introduced string literal concatenation. Two string literals in a row are pasted together (with no null character in the middle) to make one combined string literal. This addition to the C language allows a programmer to extend a string literal beyond the end of a physical line without having to use the backslash-newline mechanism and thereby destroying the indentation scheme of the program. An explicit concatenation operator was not introduced because the concatenation is a lexical construct rather than a run-time operation."
18. Rationale for the ANSI C Programming Language. Silicon Press, 1990. ISBN 0-929306-07-4. http://www.lysator.liu.se/c/rat/title.html. pp. 65–66 (https://books.google.com/books?id=yxLISD0TAbEC&lpg=PA65&q=%22String%20literal%20concatenation%22#v=onepage), 3.8.3.2 The # operator: "The # operator has been introduced for stringizing. It may only be used in a #define expansion. It causes the formal parameter name following to be replaced by a string literal formed by stringizing the actual argument token sequence. In conjunction with string literal concatenation (see §3.1.4), use of this operator permits the construction of strings as effectively as by identifier replacement within a string. An example in the Standard illustrates this feature."
19. C/C++ Users Journal, Volume 19, p. 50. https://books.google.com/books?id=gGpVAAAAMAAJ&q=%22string+literal+concatenation%22
20. "python - Why allow concatenation of string literals?". Stack Overflow. https://stackoverflow.com/questions/2504536/why-allow-concatenation-of-string-literals. Accessed 2016-06-22.
21. "LINE__ to string (stringify) using preprocessor directives". Decompile.com, 2006-10-12. http://www.decompile.com/cpp/faq/file_and_line_error_string.htm. Accessed 2016-06-22.
22. The Python Language Reference, 2. Lexical analysis, 2.4.2. String literal concatenation. https://docs.python.org/2/reference/lexical_analysis.html#string-literal-concatenation. "This feature can be used to reduce the number of backslashes needed, to split long strings conveniently across long lines, or even to add comments to parts of strings, for example:"
23. DLang's Issue Tracking System, Issue 3827: Warn against and then deprecate implicit concatenation of adjacent string literals. https://issues.dlang.org/show_bug.cgi?id=3827
24. C++11 draft standard, "Working Draft, Standard for Programming Language C++". http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3337.pdf. 2.14.5 String literals [lex.string], note 13, pp. 28–29: "Any other concatenations are conditionally supported with implementation-defined behavior."
25. "Archived copy". https://www.securecoding.cert.org/confluence/display/seccode/STR10-C.+Do+not+concatenate+different+type+of+string+literals. Archived at https://web.archive.org/web/20140714135237/https://www.securecoding.cert.org/confluence/display/seccode/STR10-C.+Do+not+concatenate+different+type+of+string+literals on July 14, 2014. Accessed July 3, 2014.
26. "2. Lexical analysis - Python 2.7.12rc1 documentation". python.org. https://docs.python.org/2/reference/lexical_analysis.html#string-literals. Accessed 22 June 2016.
27. "perlop - perldoc.perl.org". perl.org. http://perldoc.perl.org/perlop.html#Quote-and-Quote-like-Operators. Accessed 22 June 2016.
Categories: Source code | String (computer science)