This is how CPP behaves in all the cases which the C standard describes as implementation-defined. This term means that the implementation is free to do what it likes, but must document its choice and stick to it.
The mapping of physical source file multi-byte characters to the execution character set.
The input character set can be specified using the -finput-charset option, while the execution character set may be controlled using the -fexec-charset and -fwide-exec-charset options.
Identifier characters.
The C and C++ standards allow identifiers to be composed of ‘_’ and the alphanumeric characters. C++ also allows universal character names. C99 and later C standards permit both universal character names and implementation-defined characters. In both C and C++ modes, GCC accepts in identifiers exactly those extended characters that correspond to universal character names permitted by the chosen standard.
GCC allows the ‘$’ character in identifiers as an extension for most targets. This is true regardless of the -std= switch, since this extension cannot conflict with standards-conforming programs. When preprocessing assembler, however, dollars are not identifier characters by default.
Currently the targets that by default do not permit ‘$’ are AVR, IP2K, MMIX, MIPS Irix 3, ARM aout, and PowerPC targets for the AIX operating system.
You can override the default with -fdollars-in-identifiers or -fno-dollars-in-identifiers. See fdollars-in-identifiers.
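As a minimal sketch of the ‘$’ extension described above, the following function uses a dollar sign in its name. The identifier price$cents and its precondition are hypothetical, chosen only for illustration; the code assumes a target on which GCC permits ‘$’ by default and is not portable ISO C.

```c
#include <assert.h>

/* Hypothetical name: '$' in an identifier is a GCC extension, not ISO C. */
int price$cents(int whole_units)
{
    assert(whole_units >= 0);   /* illustrative precondition only */
    return whole_units * 100;
}
```

On one of the targets listed above as not permitting ‘$’ by default, the same source would presumably need -fdollars-in-identifiers to be accepted.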
Non-empty sequences of whitespace characters.
In textual output, each whitespace sequence is collapsed to a single space. For aesthetic reasons, the first token on each non-directive line of output is preceded with sufficient spaces that it appears in the same column as it did in the original source file.
The numeric value of character constants in preprocessor expressions.
The preprocessor and compiler interpret character constants in the same way; i.e. escape sequences such as ‘\a’ are given the values they would have on the target machine.
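The agreement between preprocessor and compiler can be checked with a small sketch like the one below. It evaluates the same comparison once in a ‘#if’ expression and once in ordinary code. The macro and function names are hypothetical, and the value 7 is only what ‘\a’ has on ASCII-based targets; the check itself does not depend on that value.

```c
#include <assert.h>

/* Evaluated by the preprocessor. */
#if '\a' == 7
#define PP_ALERT_IS_7 1
#else
#define PP_ALERT_IS_7 0
#endif

/* Evaluated by the compiler proper. */
int compiler_alert_is_7(void) { return '\a' == 7; }

/* Reports what the preprocessor decided above. */
int preprocessor_alert_is_7(void) { return PP_ALERT_IS_7; }

/* Both views are expected to agree, whatever the target character set. */
void check_char_constants_agree(void)
{
    assert(preprocessor_alert_is_7() == compiler_alert_is_7());
}
```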
The compiler evaluates a multi-character character constant a character at a time, shifting the previous value left by the number of bits per target character, and then or-ing in the bit-pattern of the new character truncated to the width of a target character. The final bit-pattern is given type int, and is therefore signed, regardless of whether single characters are signed or not. If there are more characters in the constant than would fit in the target int, the compiler issues a warning, and the excess leading characters are ignored.
For example, 'ab' for a target with an 8-bit char would be interpreted as ‘(int) ((unsigned char) 'a' * 256 + (unsigned char) 'b')’, and '\234a' as ‘(int) ((unsigned char) '\234' * 256 + (unsigned char) 'a')’.
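The shift-and-or rule can be written out as a short function. This is only a sketch: multichar_value is a hypothetical helper, not part of GCC, and it assumes that the host's char width (CHAR_BIT) matches the target's and that int is wider than char.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper, not part of GCC: folds n characters into an int
   following the shift-and-or rule described in the text. Assumes the
   host char width (CHAR_BIT) matches the target's. */
int multichar_value(const char *chars, size_t n)
{
    unsigned int acc = 0;

    assert(chars != NULL);
    for (size_t i = 0; i < n; i++) {
        /* Shift the previous value left by one character, then or in the
           new character truncated to the width of a char. Bits shifted
           past the top of unsigned int are discarded, so excess leading
           characters drop out. */
        acc = (acc << CHAR_BIT) | (unsigned char) chars[i];
    }
    /* The result has type int, hence signed. */
    return (int) acc;
}
```

With an 8-bit char and an ASCII execution character set, multichar_value("ab", 2) gives 97 * 256 + 98 = 24930, matching the expansion shown for 'ab'.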
Source file inclusion.
For a discussion on how the preprocessor locates header files, see Include Operation.
Interpretation of the filename resulting from a macro-expanded ‘#include’ directive.
See Computed Includes.
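For orientation, a macro-expanded ‘#include’ looks like the sketch below, where the header name comes from a macro rather than being written literally. CHECK_HEADER and require_positive are hypothetical names used only for this illustration.

```c
/* CHECK_HEADER is a hypothetical macro name chosen for this sketch. */
#define CHECK_HEADER <assert.h>
#include CHECK_HEADER   /* after expansion: #include <assert.h> */

int require_positive(int value)
{
    assert(value > 0);   /* assert is available via the computed include */
    return value;
}
```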
Treatment of a ‘#pragma’ directive that after macro-expansion results in a standard pragma.
No macro expansion occurs on any ‘#pragma’ directive line, so the question does not arise.
Note that GCC does not yet implement any of the standard pragmas.