A string literal is the representation of a string value within the source code of a computer program. Numerous notations exist for specifying string literals, and the exact notation depends on the programming language in question. Nevertheless, most modern programming languages follow some general guidelines.
Specifically, most string literals can be specified using:
- declarative notation;
- whitespace delimiters (indentation);
- bracketed delimiters (quoting);
- escape characters; or
- a combination of some or all of the above.
Declarative notation
In the original FORTRAN programming language, string literals were written in so-called Hollerith notation, where a decimal count of the number of characters was followed by the letter H, and then the characters of the string:

    27HAn example Hollerith string

This declarative notation style is contrasted with bracketed delimiter quoting, because it does not require the use of balanced "bracketed" characters on either side of the string.

Drawbacks:
- this type of notation is error-prone for manual entry by programmers

Because of the drawbacks, most programming languages do not use this style of declarative notation.
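The count-then-characters scheme described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a FORTRAN implementation; the function names to_hollerith and from_hollerith are invented for this example.

```python
def to_hollerith(text: str) -> str:
    """Encode text as <count>H<characters> (hypothetical helper)."""
    return f"{len(text)}H{text}"


def from_hollerith(source: str) -> tuple[str, str]:
    """Decode one Hollerith constant from the start of source.

    Returns the decoded string and whatever source text follows it.
    """
    count_text, separator, rest = source.partition("H")
    if not separator or not (count_text.isascii() and count_text.isdigit()):
        raise ValueError("expected <count>H<characters>")
    count = int(count_text)
    if count > len(rest):
        raise ValueError("declared count exceeds the available characters")
    return rest[:count], rest[count:]


encoded = to_hollerith("An example Hollerith string")
decoded, remainder = from_hollerith(encoded + " + more source")
print(encoded)
print(repr(decoded), repr(remainder))

# A miscounted literal is silently truncated, which is why manual entry
# is error-prone: the count says 5, so only "Hello" is taken.
truncated, leftover = from_hollerith("5HHello world")
print(repr(truncated), repr(leftover))
```

Because the length is declared up front, no character inside the string needs special treatment, but a wrong count changes the value without any visible delimiter error.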
Whitespace delimiters
In YAML, string literals may be specified by the relative positioning of whitespace and indentation.

    - title: An example multi-line string in YAML
      body : |
        This is a multi-line string.
        "special" metacharacters may appear here.
        The content of this string is indicated by indentation.

Bracketed delimiters
Most modern programming languages use bracket delimiters or quoting to specify string literals. Double quotes are the most common quoting delimiters used:

    "Hi There!"

Some languages also allow the use of single quotes as an alternative to double quotes (though the string must begin and end with the same kind of quotation mark):

    'Hi There!'

Note that these quotation marks are unpaired (the same character is used as an opener and a closer), which is a holdover from the typewriter technology that preceded the earliest computer input and output devices. The Unicode character set includes paired (separate opening and closing) versions of both single and double quotes:

    “Hi There!”
    ‘Hi There!’

The paired double quotes can be used in Visual Basic .NET.
The PostScript programming language uses parentheses, with embedded newlines allowed, and also embedded unescaped parentheses provided they are properly paired:

    (The quick (brown fox))

Delimiter collision
Delimiter collision is a common problem for string literal notations that use quoting. The problem occurs when a programmer attempts to use a quoting character as part of the string literal itself. Because this problem is so common, nearly every programming and markup language has methods for avoiding it.
Dual quoting style
Some languages (e.g. Modula-2, JavaScript) attempt to avoid the delimiter collision problem by allowing a dual quoting style. Typically, this consists of allowing the programmer to use either single quotes or double quotes interchangeably.

    "This is John's apple."
    'I said, "Can you hear me?"'

Some programming languages (e.g. Perl) allow subtle variations on dual quoting, treating single quotes and double quotes slightly differently.
One problem with dual quoting is that it doesn't allow for the inclusion of both styles of quotes at once within the same literal.
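Python, which also accepts both quote characters, gives a convenient way to watch this trade-off. As a small sketch (the sample strings are made up for illustration), the built-in repr function chooses whichever delimiter avoids a collision and has to fall back on a backslash escape only when the text contains both kinds of quote:

```python
samples = [
    "This is John's apple.",          # contains only a single quote
    'I said, "Can you hear me?"',     # contains only double quotes
    "John's reply was \"yes\".",      # contains both kinds
]

# repr() returns the text as Python would write it in source code.
rendered = [repr(text) for text in samples]
for literal in rendered:
    print(literal)
```

The first two literals print without any escapes; the third cannot, which is exactly the limitation of dual quoting noted above.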
Escape character
One method for avoiding delimiter collision is to use escape characters:

    "I said, \"Can you hear me?\""

The most commonly used escape character for this purpose is the backslash "\", the tradition for which originated on Unix. From a language design standpoint, this approach is adequate, but there are drawbacks:
- text can be rendered unreadable when littered with numerous escape characters
- escape characters must themselves be escaped when they are not intended as escape characters
- although easy to type, they can be cryptic to someone unfamiliar with the language

    "I said, \"The Windows path is C:\\Foo\\Bar\\Baz\""

The confusing presence of too many escape and slash characters in a string is commonly disparaged as Leaning Toothpick Syndrome.
Escape sequence
An extended concept of the escape character, an escape sequence is also a means of avoiding delimiter collision. An escape sequence consists of two or more consecutive characters that can have special meaning when used in the context of a string literal.

    "I said, \x22Can you hear me?\x22"

Escape sequences can also be used for purposes other than avoiding delimiter collision, and can also include metacharacters (see Metacharacters below).
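As a hedged illustration of escape characters and escape sequences, the following Python sketch writes the same text with a backslash escape, a hexadecimal escape and an octal escape, and then counts the backslashes that a Windows path needs in source form. Python is used only for illustration and the variable names are invented. Other languages differ in detail: in C, for instance, a hexadecimal escape continues for as long as hexadecimal digits follow, so \x22 immediately followed by "Can" would not be read the same way there.

```python
# Three spellings of the same string value.
escaped = "I said, \"Can you hear me?\""
hex_form = "I said, \x22Can you hear me?\x22"
octal_form = "I said, \042Can you hear me?\042"
print(escaped == hex_form == octal_form)

# A Windows path: every literal backslash must itself be escaped.
sentence = "I said, \"The Windows path is C:\\Foo\\Bar\\Baz\""
print(sentence)

# The same literal as it appears in source code, held in a raw string
# so that its backslashes can be counted.
source_form = r'"I said, \"The Windows path is C:\\Foo\\Bar\\Baz\""'
typed = source_form.count("\\")
kept = sentence.count("\\")
print(f"{typed} backslashes typed, {kept} kept in the value")
```

The gap between the number of backslashes typed and the number that survive in the value is the readability cost listed among the drawbacks of escape characters.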
Double-up and triple-up escape sequence
Some languages (such as BASIC and DCL) avoid delimiter collision by doubling up on the quotation marks that are intended to be part of the string literal itself:

    "I said, ""Can you hear me?"""

Some languages also use triple quoting, which originated in Python [citation needed]:
    '''This is John's apple.'''

Multiple quoting style
In contrast to dual quoting style, multiple quoting style is an even more flexible notation for avoiding delimiter collision. For example, in Perl:

    qq^I said, "Can you hear me?"^
    qq@I said, "Can you hear me?"@
    qq§I said, "Can you hear me?"§

all produce the desired result through use of the quote-like operator, which allows numerous different characters to act as delimiters for string literals. Although this notation is more flexible, few languages support it. Perl and Ruby are two that do.
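The doubled-quote and triple-quote conventions described earlier in this section can be tried in Python, used here purely as an illustration. The helper double_up is a hypothetical name; the standard csv module appears only because its default dialect happens to read doubled quotes in the same way.

```python
import csv


def double_up(text: str) -> str:
    """Quote text by doubling each embedded double quote (hypothetical helper)."""
    return '"' + text.replace('"', '""') + '"'


original = 'I said, "Can you hear me?"'
doubled = double_up(original)
print(doubled)

# csv.reader's default dialect treats "" inside a quoted field as one quote.
recovered = next(csv.reader([doubled]))[0]
print(recovered == original)

# Triple quoting: both quote characters may appear unescaped inside.
triple = '''John's reply was "yes".'''
print(triple)
```

Neither convention needs a separate escape character, although a doubled-quote literal can still be hard to read when the text ends with a quotation mark.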
Here documents
A here document is an alternate quoting notation that allows the programmer to specify an arbitrary unique identifier as a content boundary for a string literal. This avoids delimiter collision, and also preserves newlines in the source code as newlines in the string literal itself.
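Python has no here-document syntax of its own, so the sketch below only models the idea: a reader collects lines verbatim until it meets a line consisting solely of the chosen identifier. The function name read_here_document and the identifier END_OF_TEXT are assumptions made for this example, not part of any particular language.

```python
def read_here_document(lines, terminator):
    """Collect lines up to the terminator line (hypothetical helper).

    Returns the body, with newlines preserved, and the number of
    lines consumed including the terminator itself.
    """
    for index, line in enumerate(lines):
        if line == terminator:
            body = "".join(text + "\n" for text in lines[:index])
            return body, index + 1
    raise ValueError(f"terminator {terminator!r} not found")


source = [
    "He said \"hi\" and she said 'bye'.",
    "Backslashes \\ and $signs are ordinary text here.",
    "END_OF_TEXT",
    "this line belongs to the program again",
]

body, consumed = read_here_document(source, "END_OF_TEXT")
print(body, end="")
print(consumed)
```

Since only the identifier line ends the literal, quotes and backslashes inside the body cannot collide with the delimiter, provided the identifier is chosen so that it never occurs as a line of the text.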
Metacharacters
Many languages support the use of metacharacters inside string literals. Metacharacters have varying interpretations depending on the context and language, but are generally a kind of 'processing command' for representing printing or nonprinting characters.
For instance, in a C string literal, if the backslash is followed by a letter such as "b", "n" or "t", then this represents a nonprinting backspace, newline or tab character respectively. If the backslash is followed by up to three octal digits, then this sequence is interpreted as representing the arbitrary character with the specified ASCII code. This was later extended to allow more modern hexadecimal character code notation:

    "I said,\t\t\x22Can you hear me?\x22\n"

Raw strings
A few languages follow a convention where a leading character marks a string as being "raw":

    r"The Windows path is C:\Foo\Bar\Baz"

Other languages follow the same convention using alternate quoting delimiters:

    <![CDATA[ The Windows path is C:\Foo\Bar\Baz ]]>

or:

    q'The Windows path is C:\Foo\Bar\Baz';

A raw string is simply one in which none of the characters are interpreted as metacharacters, and no special interpretation or processing is applied in representing the string literal.
Variable interpolation
Languages differ on whether and how to interpret string literals as either 'raw' or 'variable interpolated'. Variable interpolation is the process of evaluating an expression containing one or more variables, and returning output where the variables are replaced with their corresponding values in memory. For example, the following Perl code:

    $sName = "Nancy";
    $sGreet = "Hello World";
    print "$sName said $sGreet to the crowd of people.";

produces the output:

    Nancy said Hello World to the crowd of people.

The sigil character ($) is interpreted to indicate variable interpolation.
Similarly, the printf function produces the same output using notation such as:

    printf "%s said %s to the crowd of people.", ($sName,$sGreet);

The metacharacters (%s) indicate where each value is substituted. This is contrasted with "raw" strings, which Perl writes with single quotes:

    print '$sName said $sGreet to the crowd of people.';

which produce output like:

    $sName said $sGreet to the crowd of people.

The ($) characters are not interpreted to have any special meaning.
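For comparison, here is a rough Python counterpart of the Perl examples, offered as a sketch rather than an exact equivalent. Python marks interpolation with an f prefix and braces instead of a sigil, and the variable names below are adapted for this illustration. The last lines show the related distinction between a processed and a raw literal.

```python
name = "Nancy"
greeting = "Hello World"

# Interpolated: the f prefix asks Python to substitute {name} and {greeting}.
interpolated = f"{name} said {greeting} to the crowd of people."

# printf-style: each %s marks where a value is substituted.
formatted = "%s said %s to the crowd of people." % (name, greeting)

# No f prefix: the braces are ordinary characters and nothing is substituted.
literal = "{name} said {greeting} to the crowd of people."

print(interpolated)
print(formatted)
print(literal)

# Raw versus processed backslashes: r"..." keeps \t as two characters,
# while the ordinary literal turns it into a single tab.
processed = "tab:\there"
raw = r"tab:\there"
print(len(processed), len(raw))
```

One Python-specific caveat: a raw string literal cannot end in a single backslash, so raw notation does not remove every special case.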
Embedding source code in string literals
Languages that lack flexibility in specifying string literals make it particularly cumbersome to write programming code that generates other programming code. This is particularly true when the generation language is the same as or similar to the output language, for example:
- writing code to produce quines
- using XSLT to generate XSLT, or SQL to generate more SQL
- generating a PostScript representation of a document for printing purposes, from within a document-processing application written in C or some other language.
Nevertheless, some languages are particularly well-adapted to produce this sort of self-similar output, especially those that support multiple options for avoiding delimiter collision.
Apart from the mechanics of specifying string literals, however, one must consider the security implications of code that generates other code, especially if the output is based at least partially on untrusted user input. This is potentially a serious security weakness, and it is particularly acute in the case of Web-based applications, where malicious users can take advantage of such weaknesses to subvert the operation of the application, for example by mounting an SQL injection attack.
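The risk can be made concrete with a minimal sketch using Python's standard sqlite3 module and an in-memory database. The table, the rows and the hostile input are all invented for illustration. The first query pastes untrusted text into a string literal inside the generated SQL; the second passes it as a bound parameter, which is one common way to avoid the problem.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE users (name TEXT, secret TEXT)")
connection.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)

# Hypothetical hostile input: the embedded quote ends the SQL string
# literal early and appends a condition that is always true.
untrusted = "nobody' OR '1'='1"

# Unsafe: the input becomes part of the generated SQL source text.
unsafe_sql = f"SELECT name FROM users WHERE name = '{untrusted}'"
unsafe_rows = connection.execute(unsafe_sql).fetchall()

# Safer: the input is bound to a placeholder and is never parsed as SQL.
safe_rows = connection.execute(
    "SELECT name FROM users WHERE name = ?", (untrusted,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
connection.close()
```

The attack works through the delimiter collision discussed earlier: the quote character in the input is mistaken for the end of the string literal in the generated code.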
External links
- Escape sequences in Java, C, C++, VB, Python and other languages