5 Lexical conventions [lex]

5.13 Literals [lex.literal]

5.13.1 Kinds of literals [lex.literal.kinds]

There are several kinds of literals.15)
15) The term “literal” generally designates, in this document, those tokens that are called “constants” in C.

5.13.2 Integer literals [lex.icon]

binary-digit: one of
0 1
octal-digit: one of
0 1 2 3 4 5 6 7
nonzero-digit: one of
1 2 3 4 5 6 7 8 9
hexadecimal-prefix: one of
0x 0X
hexadecimal-digit: one of
0 1 2 3 4 5 6 7 8 9
a b c d e f
A B C D E F
unsigned-suffix: one of
u U
long-suffix: one of
l L
long-long-suffix: one of
ll LL
size-suffix: one of
z Z
In an integer-literal, the sequence of binary-digits, octal-digits, digits, or hexadecimal-digits is interpreted as a base N integer as shown in Table 7; the lexically first digit of the sequence of digits is the most significant.
[Note 1: 
The prefix and any optional separating single quotes are ignored when determining the value.
— end note]
The hexadecimal-digits a through f and A through F have decimal values ten through fifteen.
[Example 1: 
The number twelve can be written 12, 014, 0XC, or 0b1100.
The integer-literals 1048576, 1'048'576, 0X100000, 0x10'0000, and 0'004'000'000 all have the same value.
— end example]
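The equalities in Example 1 can be checked at compile time. The following is a minimal sketch, assuming a C++14 or later compiler (binary literals and digit separators); the name twelve_from_binary is illustrative only.

```cpp
// All four spellings of twelve denote the same value.
static_assert(12 == 014, "decimal vs. octal");
static_assert(12 == 0XC, "decimal vs. hexadecimal");
static_assert(12 == 0b1100, "decimal vs. binary");

// Prefixes and separating single quotes do not affect the value.
static_assert(1048576 == 1'048'576, "digit separators are ignored");
static_assert(1048576 == 0X100000, "hexadecimal spelling");
static_assert(1048576 == 0x10'0000, "hexadecimal with separator");
static_assert(1048576 == 0'004'000'000, "octal with separators");

// Illustrative constant (not part of the standard text).
constexpr int twelve_from_binary = 0b1100;
```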
The type of an integer-literal is the first type in the list in Table 8 corresponding to its optional integer-suffix in which its value can be represented.
Table 8: Types of integer-literals [tab:lex.icon.type]
integer-suffix             decimal-literal                           integer-literal other than decimal-literal
none                       int, long int, long long int              int, unsigned int, long int, unsigned long int, long long int, unsigned long long int
u or U                     unsigned int, unsigned long int,          unsigned int, unsigned long int,
                           unsigned long long int                    unsigned long long int
l or L                     long int, long long int                   long int, unsigned long int, long long int, unsigned long long int
Both u or U and l or L     unsigned long int,                        unsigned long int,
                           unsigned long long int                    unsigned long long int
ll or LL                   long long int                             long long int, unsigned long long int
Both u or U and ll or LL   unsigned long long int                    unsigned long long int
z or Z                     the signed integer type corresponding     the signed integer type corresponding to std::size_t,
                           to std::size_t ([support.types.layout])   std::size_t
Both u or U and z or Z     std::size_t                               std::size_t
Except for integer-literals containing a size-suffix, if the value of an integer-literal cannot be represented by any type in its list and an extended integer type ([basic.fundamental]) can represent its value, it may have that extended integer type.
If all of the types in the list for the integer-literal are signed, the extended integer type is signed.
If all of the types in the list for the integer-literal are unsigned, the extended integer type is unsigned.
If the list contains both signed and unsigned types, the extended integer type may be signed or unsigned.
If an integer-literal cannot be represented by any of the allowed types, the program is ill-formed.
[Note 2: 
An integer-literal with a z or Z suffix is ill-formed if it cannot be represented by std::size_t.
— end note]
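The type selection of Table 8 can be observed with decltype. The following is a sketch, assuming a C++14 or later compiler; the size-suffix forms (z, Z) are left out because they need a newer language mode. The constant hex_may_be_unsigned is an illustrative name, and the check on 0xFFFFFFFF only makes a claim for implementations where unsigned int has exactly 32 value bits.

```cpp
#include <limits>
#include <type_traits>

// The suffix alone fixes the starting point in the list.
static_assert(std::is_same<decltype(1), int>::value, "no suffix");
static_assert(std::is_same<decltype(1u), unsigned int>::value, "u");
static_assert(std::is_same<decltype(1L), long>::value, "L");
static_assert(std::is_same<decltype(1uL), unsigned long>::value, "u and L");
static_assert(std::is_same<decltype(1LL), long long>::value, "LL");
static_assert(std::is_same<decltype(1uLL), unsigned long long>::value, "u and LL");

// An unsuffixed decimal-literal only considers signed types.
static_assert(std::is_signed<decltype(4294967295)>::value,
              "decimal literals without a suffix stay signed");

// An unsuffixed hexadecimal literal may become unsigned int when the
// value does not fit in int (checked only where unsigned int has 32 bits).
constexpr bool hex_may_be_unsigned =
    std::numeric_limits<unsigned int>::digits != 32 ||
    std::is_same<decltype(0xFFFFFFFF), unsigned int>::value;
static_assert(hex_may_be_unsigned, "0xFFFFFFFF is unsigned int here");
```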

5.13.3 Character literals [lex.ccon]

encoding-prefix: one of
u8  u  U  L
basic-c-char:
any member of the translation character set except the U+0027 apostrophe,
   U+005c reverse solidus, or new-line character
simple-escape-sequence-char: one of
' " ? \ a b f n r t v
conditional-escape-sequence-char:
any member of the basic character set that is not an octal-digit, a simple-escape-sequence-char, or the characters N, o, u, U, or x
A multicharacter literal is a character-literal whose c-char-sequence consists of more than one c-char.
A multicharacter literal shall not have an encoding-prefix.
If a multicharacter literal contains a c-char that is not encodable as a single code unit in the ordinary literal encoding, the program is ill-formed.
Multicharacter literals are conditionally-supported.
The kind of a character-literal, its type, and its associated character encoding ([lex.charset]) are determined by its encoding-prefix and its c-char-sequence as defined by Table 9.
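Table 9: Character literals [tab:lex.ccon.literal]
Encoding prefix  Kind                        Type      Associated character encoding  Example
none             ordinary character literal  char      ordinary literal encoding      'v'
none             multicharacter literal      int       ordinary literal encoding      'abcd'
L                wide character literal      wchar_t   wide literal encoding          L'w'
u8               UTF-8 character literal     char8_t   UTF-8                          u8'x'
u                UTF-16 character literal    char16_t  UTF-16                         u'y'
U                UTF-32 character literal    char32_t  UTF-32                         U'z'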
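The rows of Table 9 can be checked with decltype. This sketch assumes a C++14 or later compiler and deliberately omits two rows: u8'x' (its type is char8_t only in language modes that provide char8_t) and the multicharacter literal (conditionally-supported, with an implementation-defined value). The constant utf32_z is an illustrative name.

```cpp
#include <type_traits>

static_assert(std::is_same<decltype('v'), char>::value, "ordinary character literal");
static_assert(std::is_same<decltype(L'w'), wchar_t>::value, "wide character literal");
static_assert(std::is_same<decltype(u'y'), char16_t>::value, "UTF-16 character literal");
static_assert(std::is_same<decltype(U'z'), char32_t>::value, "UTF-32 character literal");

// For the Unicode encodings the code unit value is fixed by the encoding.
static_assert(u'y' == 0x0079, "U+0079 in UTF-16");
constexpr char32_t utf32_z = U'z';
static_assert(utf32_z == 0x007A, "U+007A in UTF-32");
```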
In translation phase 4, the value of a character-literal is determined using the range of representable values of the character-literal's type in translation phase 7.
A multicharacter literal has an implementation-defined value.
The value of any other kind of character-literal is determined as follows:
The character specified by a simple-escape-sequence is specified in Table 10.
[Note 1: 
Using an escape sequence for a question mark is supported for compatibility with C++ 2014 and C.
— end note]
Table 10: Simple escape sequences [tab:lex.ccon.esc]
character                      simple-escape-sequence
U+000a  line feed              \n
U+0009  character tabulation   \t
U+000b  line tabulation        \v
U+0008  backspace              \b
U+000d  carriage return        \r
U+000c  form feed              \f
U+0007  alert                  \a
U+005c  reverse solidus        \\
U+003f  question mark          \?
U+0027  apostrophe             \'
U+0022  quotation mark         \"
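The mapping in Table 10 can be verified without depending on the ordinary literal encoding by using UTF-32 character literals, whose values are the code points themselves. A minimal sketch, assuming a C++14 or later compiler; line_feed is an illustrative name.

```cpp
// Each simple-escape-sequence denotes the character listed in Table 10.
static_assert(U'\n' == 0x000a, "line feed");
static_assert(U'\t' == 0x0009, "character tabulation");
static_assert(U'\v' == 0x000b, "line tabulation");
static_assert(U'\b' == 0x0008, "backspace");
static_assert(U'\r' == 0x000d, "carriage return");
static_assert(U'\f' == 0x000c, "form feed");
static_assert(U'\a' == 0x0007, "alert");
static_assert(U'\\' == 0x005c, "reverse solidus");
static_assert(U'\?' == 0x003f, "question mark");
static_assert(U'\'' == 0x0027, "apostrophe");
static_assert(U'\"' == 0x0022, "quotation mark");

// The escaped and unescaped spellings name the same character.
static_assert('\?' == '?', "escaped question mark");
static_assert('\"' == '"', "escaped quotation mark");

constexpr char32_t line_feed = U'\n';
```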

5.13.4 Floating-point literals [lex.fcon]

sign: one of
+ -
floating-point-suffix: one of
f l f16 f32 f64 f128 bf16 F L F16 F32 F64 F128 BF16
The type of a floating-point-literal ([basic.fundamental], [basic.extended.fp]) is determined by its floating-point-suffix as specified in Table 11.
[Note 1: 
The floating-point suffixes f16, f32, f64, f128, bf16, F16, F32, F64, F128, and BF16 are conditionally-supported.
— end note]
Table 11: Types of floating-point-literals [tab:lex.fcon.type]
floating-point-suffix  type
none                   double
f or F                 float
l or L                 long double
f16 or F16             std::float16_t
f32 or F32             std::float32_t
f64 or F64             std::float64_t
f128 or F128           std::float128_t
bf16 or BF16           std::bfloat16_t
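The first three rows of Table 11 can be checked with decltype. This sketch assumes a C++14 or later compiler; the extended suffixes (f16, f32, f64, f128, bf16) are conditionally-supported and are therefore not exercised here. The alias unsuffixed_type is an illustrative name.

```cpp
#include <type_traits>

static_assert(std::is_same<decltype(1.0), double>::value, "no suffix");
static_assert(std::is_same<decltype(1e3), double>::value, "exponent only, no suffix");
static_assert(std::is_same<decltype(1.0f), float>::value, "f");
static_assert(std::is_same<decltype(1.0F), float>::value, "F");
static_assert(std::is_same<decltype(1.0l), long double>::value, "l");
static_assert(std::is_same<decltype(1.0L), long double>::value, "L");

using unsuffixed_type = decltype(2.5);
```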
In the significand, the sequence of digits or hexadecimal-digits and optional period are interpreted as a base N real number s, where N is 10 for a decimal-floating-point-literal and 16 for a hexadecimal-floating-point-literal.
[Note 2: 
Any optional separating single quotes are ignored when determining the value.
— end note]
If an exponent-part or binary-exponent-part is present, the exponent e of the floating-point-literal is the result of interpreting the sequence of an optional sign and the digits as a base 10 integer.
Otherwise, the exponent e is 0.
The scaled value of the literal is $s \times 10^{e}$ for a decimal-floating-point-literal and $s \times 2^{e}$ for a hexadecimal-floating-point-literal.
[Example 1: 
The floating-point-literals 49.625 and 0xC.68p+2 have the same value.
The floating-point-literals 1.602'176'565e-19 and 1.602176565e-19 have the same value.
— end example]
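As a worked check of the scaled value: 0xC.68 is 12 + 6/16 + 8/256 = 12.40625, and the binary exponent +2 scales it by 4, giving 49.625. The sketch below assumes a C++17 compiler, since hexadecimal floating-point literals require that language mode; the values used are exactly representable in double, so the comparisons do not depend on rounding. The name scaled is illustrative.

```cpp
// Hexadecimal significand scaled by a power of two.
static_assert(49.625 == 0xC.68p+2, "12.40625 * 2^2");
static_assert(0x1p-1 == 0.5, "1 * 2^-1");

// Decimal significand scaled by a power of ten.
static_assert(1e2 == 100.0, "1 * 10^2");

// Separating single quotes are ignored when determining the value.
static_assert(1.602'176'565e-19 == 1.602176565e-19, "same literal value");

constexpr double scaled = 0xC.68p+2;
```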
If the scaled value is not in the range of representable values for its type, the program is ill-formed.
Otherwise, the value of a floating-point-literal is the scaled value if representable, else the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined manner.

5.13.5 String literals [lex.string]

basic-s-char:
any member of the translation character set except the U+0022 quotation mark,
   U+005c reverse solidus, or new-line character
r-char:
any member of the translation character set, except a U+0029 right parenthesis followed by
   the initial d-char-sequence (which may be empty) followed by a U+0022 quotation mark
d-char:
any member of the basic character set except:
   U+0020 space, U+0028 left parenthesis, U+0029 right parenthesis, U+005c reverse solidus,
   U+0009 character tabulation, U+000b line tabulation, U+000c form feed, and new-line
The kind of a string-literal, its type, and its associated character encoding ([lex.charset]) are determined by its encoding prefix and sequence of s-chars or r-chars as defined by Table 12 where n is the number of encoded code units as described below.
Table 12: String literals [tab:lex.string.literal]
Encoding prefix  Kind                     Type                       Associated character encoding  Examples
none             ordinary string literal  array of n const char      ordinary literal encoding      "ordinary string"
                                                                                                    R"(ordinary raw string)"
L                wide string literal      array of n const wchar_t   wide literal encoding          L"wide string"
                                                                                                    LR"w(wide raw string)w"
u8               UTF-8 string literal     array of n const char8_t   UTF-8                          u8"UTF-8 string"
                                                                                                    u8R"x(UTF-8 raw string)x"
u                UTF-16 string literal    array of n const char16_t  UTF-16                         u"UTF-16 string"
                                                                                                    uR"y(UTF-16 raw string)y"
U                UTF-32 string literal    array of n const char32_t  UTF-32                         U"UTF-32 string"
                                                                                                    UR"z(UTF-32 raw string)z"
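The array types of Table 12 can be observed with decltype; because a string-literal is an lvalue, decltype yields a reference to the array. This is a sketch assuming a C++14 or later compiler. The u8 row is omitted because its element type is char8_t only in language modes that provide char8_t. The constant abc_extent is an illustrative name.

```cpp
#include <type_traits>

// n counts the code units including the terminating null character.
static_assert(std::is_same<decltype("abc"), const char (&)[4]>::value, "ordinary");
static_assert(std::is_same<decltype(L"wide"), const wchar_t (&)[5]>::value, "wide");
static_assert(std::is_same<decltype(u"ab"), const char16_t (&)[3]>::value, "UTF-16");
static_assert(std::is_same<decltype(U"ab"), const char32_t (&)[3]>::value, "UTF-32");

// In a raw string the backslash is an ordinary character: '\', 'n', null.
static_assert(std::is_same<decltype(R"(\n)"), const char (&)[3]>::value, "raw");

constexpr unsigned abc_extent = sizeof("abc") / sizeof(char);
```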
A string-literal that has an R in the prefix is a raw string literal.
The d-char-sequence serves as a delimiter.
The terminating d-char-sequence of a raw-string is the same sequence of characters as the initial d-char-sequence.
A d-char-sequence shall consist of at most 16 characters.
[Note 1: 
The characters '(' and ')' can appear in a raw-string.
Thus, R"delimiter((a|b))delimiter" is equivalent to "(a|b)".
— end note]
[Note 2: 
A source-file new-line in a raw string literal results in a new-line in the resulting execution string literal.
Assuming no whitespace at the beginning of lines in the following example, the assert will succeed:
const char* p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);
— end note]
[Example 1: 
The raw string
R"a(
)\
a"
)a"
is equivalent to "\n)\\\na\"\n".
The raw string R"(x = "\"y\"")" is equivalent to "x = \"\\\"y\\\"\"".
— end example]
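The single-line equivalences above can be checked at compile time. The helper same_chars below is a hypothetical function written for this sketch (it is not a standard facility) and assumes a C++14 or later compiler for the loop in a constexpr function.

```cpp
#include <cstddef>

// Hypothetical helper: compares two character arrays element by element,
// including the terminating null character.
template <std::size_t N, std::size_t M>
constexpr bool same_chars(const char (&a)[N], const char (&b)[M]) {
    if (N != M) return false;
    for (std::size_t i = 0; i < N; ++i) {
        if (a[i] != b[i]) return false;
    }
    return true;
}

// Parentheses may appear inside a raw string when a delimiter is used.
static_assert(same_chars(R"delimiter((a|b))delimiter", "(a|b)"),
              "delimiter is not part of the contents");

// Backslashes and quotation marks are taken literally in a raw string.
static_assert(same_chars(R"(x = "\"y\"")", "x = \"\\\"y\\\"\""),
              "no escape processing in raw strings");
```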
Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals.
The string-literals in any sequence of adjacent string-literals shall have at most one unique encoding-prefix among them.
The common encoding-prefix of the sequence is that encoding-prefix, if any.
[Note 3: 
A string-literal's rawness has no effect on the determination of the common encoding-prefix.
— end note]
In translation phase 6 ([lex.phases]), adjacent string-literals are concatenated.
The lexical structure and grouping of the contents of the individual string-literals is retained.
[Example 2: 
"\xA" "B" represents the code unit '\xA' and the character 'B' after concatenation (and not the single code unit '\xAB').
Similarly, R"(\u00)" "41" represents six characters, starting with a backslash and ending with the digit 1 (and not the single character 'A' specified by a universal-character-name).
Table 13 has some examples of valid concatenations.
— end example]
Table 13: String literal concatenations [tab:lex.string.concat]
Source      Means    Source      Means    Source      Means
u"a" u"b"   u"ab"    U"a" U"b"   U"ab"    L"a" L"b"   L"ab"
u"a" "b"    u"ab"    U"a" "b"    U"ab"    L"a" "b"    L"ab"
"a" u"b"    u"ab"    "a" U"b"    U"ab"    "a" L"b"    L"ab"
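The concatenation rules can be observed through the size and type of the resulting array. A sketch, assuming a C++14 or later compiler; concat_units is an illustrative name.

```cpp
#include <type_traits>

// "\xA" "B" keeps the grouping of each piece: code unit 0x0A, then 'B'.
static_assert(sizeof("\xA" "B") == 3, "two code units plus null");
static_assert(("\xA" "B")[0] == '\xA', "first code unit");
static_assert(("\xA" "B")[1] == 'B', "second code unit");

// A raw piece is not reinterpreted after concatenation: six characters.
static_assert(sizeof(R"(\u00)" "41") == 7, "six characters plus null");

// The common encoding-prefix applies to the whole sequence (Table 13).
static_assert(std::is_same<decltype(u"a" "b"), const char16_t (&)[3]>::value, "u + none");
static_assert(std::is_same<decltype("a" U"b"), const char32_t (&)[3]>::value, "none + U");
static_assert(std::is_same<decltype(L"a" L"b"), const wchar_t (&)[3]>::value, "L + L");

constexpr unsigned concat_units = sizeof("\xA" "B") - 1;
```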
Evaluating a string-literal results in a string literal object with static storage duration ([basic.stc]).
[Note 4: 
String literal objects are potentially non-unique ([intro.object]).
Whether successive evaluations of a string-literal yield the same or a different object is unspecified.
— end note]
[Note 5: 
The effect of attempting to modify a string literal object is undefined.
— end note]
String literal objects are initialized with the sequence of code unit values corresponding to the string-literal's sequence of s-chars (originally from non-raw string literals) and r-chars (originally from raw string literals), plus a terminating U+0000 null character, in order as follows:
  • The sequence of characters denoted by each contiguous sequence of basic-s-chars, r-chars, simple-escape-sequences ([lex.ccon]), and universal-character-names ([lex.charset]) is encoded to a code unit sequence using the string-literal's associated character encoding.
    If a character lacks representation in the associated character encoding, then the program is ill-formed.
    [Note 6: 
    No character lacks representation in any Unicode encoding form.
    — end note]
    When encoding a stateful character encoding, implementations should encode the first such sequence beginning with the initial encoding state and encode subsequent sequences beginning with the final encoding state of the prior sequence.
    [Note 7: 
    The encoded code unit sequence can differ from the sequence of code units that would be obtained by encoding each character independently.
    — end note]
  • Each numeric-escape-sequence ([lex.ccon]) contributes a single code unit with a value as follows:
    When encoding a stateful character encoding, these sequences should have no effect on encoding state.
  • Each conditional-escape-sequence ([lex.ccon]) contributes an implementation-defined code unit sequence.
    When encoding a stateful character encoding, it is implementation-defined what effect these sequences have on encoding state.
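The number of code units produced by the encoding steps above can be inspected for the Unicode encodings, where the result is fixed. In this sketch, code_units is a hypothetical helper introduced for illustration, and a C++14 or later compiler is assumed.

```cpp
#include <cstddef>

// Hypothetical helper: number of code units, excluding the terminating null.
template <typename T, std::size_t N>
constexpr std::size_t code_units(const T (&)[N]) {
    return N - 1;
}

// One character may encode to several code units.
static_assert(code_units(u8"\u00e9") == 2, "U+00E9 in UTF-8");
static_assert(code_units(u"\u00e9") == 1, "U+00E9 in UTF-16");
static_assert(code_units(u"\U0001F600") == 2, "surrogate pair in UTF-16");
static_assert(code_units(U"\U0001F600") == 1, "U+1F600 in UTF-32");
static_assert(u"\U0001F600"[0] == 0xD83D && u"\U0001F600"[1] == 0xDE00,
              "high and low surrogate");

// Each numeric-escape-sequence contributes exactly one code unit.
static_assert(code_units("\x41\102") == 2, "one hex and one octal escape");
static_assert("\x41\102"[0] == 0x41 && "\x41\102"[1] == 0102, "values as written");
```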

5.13.6 Unevaluated strings [lex.string.uneval]

Each universal-character-name and each simple-escape-sequence in an unevaluated-string is replaced by the member of the translation character set it denotes.
An unevaluated-string is never evaluated and its interpretation depends on the context in which it appears.

5.13.7 Boolean literals [lex.bool]

boolean-literal:
false
true
The Boolean literals are the keywords false and true.
Such literals have type bool.

5.13.8 Pointer literals [lex.nullptr]

The pointer literal is the keyword nullptr.
It has type std::nullptr_t.
[Note 1: 
std::nullptr_t is a distinct type that is neither a pointer type nor a pointer-to-member type; rather, a prvalue of this type is a null pointer constant and can be converted to a null pointer value or null member pointer value.
— end note]
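The properties stated in Note 1 can be checked with type traits. A minimal sketch, assuming a C++14 or later compiler; Widget, no_object, and no_member are illustrative names.

```cpp
#include <cstddef>
#include <type_traits>

static_assert(std::is_same<decltype(nullptr), std::nullptr_t>::value,
              "type of the pointer literal");
static_assert(!std::is_pointer<std::nullptr_t>::value, "not a pointer type");
static_assert(!std::is_member_pointer<std::nullptr_t>::value,
              "not a pointer-to-member type");

// Hypothetical class used only to form a pointer-to-member type.
struct Widget { int value; };

// nullptr converts to a null pointer value and a null member pointer value.
constexpr int* no_object = nullptr;
constexpr int Widget::* no_member = nullptr;
static_assert(no_object == nullptr, "null pointer value");
static_assert(no_member == nullptr, "null member pointer value");
```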

5.13.9 User-defined literals [lex.ext]

If a token matches both user-defined-literal and another literal kind, it is treated as the latter.
[Example 1: 
123_km is a user-defined-literal, but 12LL is an integer-literal.
— end example]
The syntactic non-terminal preceding the ud-suffix in a user-defined-literal is taken to be the longest sequence of characters that could match that non-terminal.
A user-defined-literal is treated as a call to a literal operator or literal operator template ([over.literal]).
To determine the form of this call for a given user-defined-literal L with ud-suffix X, first let S be the set of declarations found by unqualified lookup for the literal-operator-id whose literal suffix identifier is X ([basic.lookup.unqual]).
S shall not be empty.
If L is a user-defined-integer-literal, let n be the literal without its ud-suffix.
If S contains a literal operator with parameter type unsigned long long, the literal L is treated as a call of the form operator ""X(nULL)
Otherwise, S shall contain a raw literal operator or a numeric literal operator template ([over.literal]) but not both.
If S contains a raw literal operator, the literal L is treated as a call of the form operator ""X("n")
Otherwise (S contains a numeric literal operator template), L is treated as a call of the form operator ""X<'c1', 'c2', ... 'ck'>() where n is the source character sequence c1c2...ck.
[Note 1: 
The sequence can only contain characters from the basic character set.
— end note]
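The three call forms for a user-defined-integer-literal can be illustrated with one operator of each kind. The suffixes _km, _chars, and _count are hypothetical names chosen for this sketch, which assumes a C++14 or later compiler.

```cpp
#include <cstddef>

// Literal operator taking unsigned long long: 123_km calls operator""_km(123ULL).
constexpr unsigned long long operator""_km(unsigned long long n) {
    return n * 1000;
}

// Raw literal operator: 0x1F_chars calls operator""_chars("0x1F").
constexpr std::size_t operator""_chars(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

// Numeric literal operator template:
// 0x1F_count calls operator""_count<'0', 'x', '1', 'F'>().
template <char... Cs>
constexpr std::size_t operator""_count() {
    return sizeof...(Cs);
}

static_assert(123_km == 123000ULL, "cooked form receives the value");
static_assert(0x1F_chars == 4, "raw form receives the spelling");
static_assert(0x1F_count == 4, "template form receives the characters");
static_assert(12LL == 12, "12LL is an integer-literal, not a user-defined-literal");
```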
If L is a user-defined-floating-point-literal, let f be the literal without its ud-suffix.
If S contains a literal operator with parameter type long double, the literal L is treated as a call of the form operator ""X(fL)
Otherwise, S shall contain a raw literal operator or a numeric literal operator template ([over.literal]) but not both.
If S contains a raw literal operator, the literal L is treated as a call of the form operator ""X("f")
Otherwise (S contains a numeric literal operator template), L is treated as a call of the form operator ""X<'c1', 'c2', ... 'ck'>() where f is the source character sequence c1c2...ck.
[Note 2: 
The sequence can only contain characters from the basic character set.
— end note]
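For a user-defined-floating-point-literal the preferred parameter type is long double, with the raw forms as a fallback. The suffixes _half and _chars below are hypothetical names for this sketch, which assumes a C++14 or later compiler.

```cpp
#include <cstddef>

// Literal operator taking long double: 3.0_half calls operator""_half(3.0L).
constexpr long double operator""_half(long double x) {
    return x / 2;
}

// Raw literal operator: 1.25e3_chars calls operator""_chars("1.25e3").
constexpr std::size_t operator""_chars(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

static_assert(3.0_half == 1.5L, "cooked form receives the value as long double");
static_assert(1.25e3_chars == 6, "raw form receives the spelling");
```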
If L is a user-defined-string-literal, let str be the literal without its ud-suffix and let len be the number of code units in str (i.e., its length excluding the terminating null character).
If S contains a literal operator template with a non-type template parameter for which str is a well-formed template-argument, the literal L is treated as a call of the form operator ""X<str>()
Otherwise, the literal L is treated as a call of the form operator ""X(str, len)
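The (str, len) call form can be sketched as follows. The suffix _len is a hypothetical name; the literal operator template form with a non-type template parameter is not shown because it needs a newer language mode than the C++14 assumed here.

```cpp
#include <cstddef>

// One overload per element type that should be accepted.
constexpr std::size_t operator""_len(const char*, std::size_t len) {
    return len;
}
constexpr std::size_t operator""_len(const char16_t*, std::size_t len) {
    return len;
}

// len excludes the terminating null character.
static_assert("two"_len == 3, "calls operator\"\"_len(\"two\", 3)");
static_assert(""_len == 0, "empty string");

// len counts code units, not characters: U+1F600 is a surrogate pair in UTF-16.
static_assert(u"\U0001F600"_len == 2, "two UTF-16 code units");
```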
If L is a user-defined-character-literal, let ch be the literal without its ud-suffix.
S shall contain a literal operator whose only parameter has the type of ch and the literal L is treated as a call of the form operator ""X(ch)
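For a user-defined-character-literal the operator is selected by the type of the character literal. The suffixes _next and _is_digit are hypothetical names in this sketch, which assumes a C++14 or later compiler.

```cpp
// Accepts UTF-32 character literals only: U'A'_next calls operator""_next(U'A').
constexpr char32_t operator""_next(char32_t c) {
    return c + 1;
}

// Accepts ordinary character literals only.
constexpr bool operator""_is_digit(char c) {
    return c >= '0' && c <= '9';
}

static_assert(U'A'_next == U'B', "parameter type char32_t matches U'A'");
static_assert('7'_is_digit, "parameter type char matches '7'");
static_assert(!'x'_is_digit, "not a digit");
```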
[Example 2: 
long double operator ""_w(long double);
std::string operator ""_w(const char16_t*, std::size_t);
unsigned operator ""_w(const char*);
int main() {
  1.2_w;          // calls operator ""_w(1.2L)
  u"one"_w;       // calls operator ""_w(u"one", 3)
  12_w;           // calls operator ""_w("12")
  "two"_w;        // error: no applicable literal operator
}
— end example]
In translation phase 6 ([lex.phases]), adjacent string-literals are concatenated and user-defined-string-literals are considered string-literals for that purpose.
During concatenation, ud-suffixes are removed and ignored and the concatenation process occurs as described in [lex.string].
At the end of phase 6, if a string-literal is the result of a concatenation involving at least one user-defined-string-literal, all the participating user-defined-string-literals shall have the same ud-suffix and that suffix is applied to the result of the concatenation.
[Example 3: 
int main() {
  L"A" "B" "C"_x; // OK, same as L"ABC"_x
  "P"_x "Q" "R"_y;// error: two different ud-suffixes
}
— end example]
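The well-formed line of Example 3 can be made checkable by supplying a definition for the suffix. The two _x operators below are hypothetical definitions added for this sketch (the example itself does not define them), and a C++14 or later compiler is assumed.

```cpp
#include <cstddef>

// Hypothetical literal operators that simply report the length they receive.
constexpr std::size_t operator""_x(const wchar_t*, std::size_t len) {
    return len;
}
constexpr std::size_t operator""_x(const char*, std::size_t len) {
    return len;
}

// The pieces are concatenated first; the suffix applies to the result.
static_assert(L"A" "B" "C"_x == 3, "same as L\"ABC\"_x");

// Several pieces may carry the suffix as long as it is the same one.
static_assert("P"_x "Q" "R"_x == 3, "same as \"PQR\"_x");
```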