Chapter 10
The C Preprocessor
The C preprocessor is a simple macro processor that conceptually processes the source text of a C program before the compiler proper reads the source program...
The preprocessor is controlled by special preprocessor command lines, which are lines of the source file beginning with the character #. Lines that do not contain preprocessor commands are called lines of the program text...
The preprocessor removes all preprocessor command lines from the source file and makes additional transformations on the source file as directed by the commands, such as expanding macro calls that occur within the source program text. The resulting preprocessed source text must then be a valid C program.
The syntax of preprocessor commands is completely independent of (though in some ways similar to) the syntax of the rest of the C language [HS95, page 39].
The C preprocessor performs a variety of text replacement operations on the source text before it is parsed by the C compiler. These operations include replacing symbolic names with constants, expanding macros, and including header files. A preprocessor directive, such as #define, has file scope, meaning that defined names are visible from their point of definition until the end of the source file in which they appear.
The operations of the preprocessor are not subject to the syntactical rules of the C language, and so should be used sparingly and with care. In particular, the text substitution of macro expansion changes the lexical structure of the source code and may produce unexpected behaviour if certain pitfalls are not avoided.
10.1 File Inclusion
One of the most common uses of the preprocessor is for the inclusion of files into the source text. These are usually header files, but may be any text file. The file include command is
#include
and it is followed by a file name enclosed in either <> or "". The result of this command after preprocessing is as if the command were replaced by the text of the specified file. Further details with regard to the #include directive are discussed in Section 5.6.
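As a minimal sketch of the two forms, the fragment below includes standard headers with angle brackets and shows the quoted form only inside a comment, because myconfig.h is a hypothetical file name that does not come from the text. The helper names count_chars and demo_include are likewise made up for illustration.

```c
#include <assert.h>
#include <string.h>   /* angle brackets: a header supplied with the compiler */

/* #include "myconfig.h"
   Quoted form, typically used for a project's own headers.
   (myconfig.h is a hypothetical file, so the line is left commented out.) */

/* strlen can be called here only because <string.h> was included above:
   after preprocessing, its declaration is part of this source file. */
size_t count_chars(const char *s)
{
    return strlen(s);
}

void demo_include(void)
{
    assert(count_chars("hello") == 5);
}
```

Calling demo_include() from main would exercise the included declaration; exactly where each form searches for its file is left to the compiler.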
10.2 Symbolic Constants
The preprocessor may be used to define symbolic constants via the #define command. The general form of a symbolic constant definition is as follows
#define name replacement text
whereby name becomes a placeholder for the character sequence replacement text. Given this command, the preprocessor will replace all subsequent occurrences of the token name in the source file by the sequence replacement text. Some example symbolic constants are
#define BUFFERSIZE 256
#define MIN_VALUE -32
#define PI 3.14159
A name created by #define can be undefined using the directive #undef. This name may then be used to represent a different sequence of replacement text.
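The effect of #undef can be sketched as follows. The function names first_size, second_size and demo_undef, and the second value 1024, are assumptions chosen only to make the point visible: each function sees whichever replacement text is in force at the place where it is written.

```c
#include <assert.h>

#define BUFFERSIZE 256

/* Written while BUFFERSIZE stands for 256. */
int first_size(void)
{
    return BUFFERSIZE;
}

#undef BUFFERSIZE         /* the name is no longer defined from here on */
#define BUFFERSIZE 1024   /* so it may be given new replacement text */

/* Written after the redefinition, so this one sees 1024. */
int second_size(void)
{
    return BUFFERSIZE;
}

void demo_undef(void)
{
    assert(first_size() == 256);
    assert(second_size() == 1024);
}
```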
Style Note. Symbolic constants are usually given uppercase names to differentiate them from variable and function names. The #define command is also used to define macros, and these names are also usually uppercase.
C provides several different mechanisms for defining names for numerical constants. The preprocessor directive #define can create names for constants of any type; the type qualifier const can define constant variables of any type; and the keyword enum can define constants of integer type. Each of these is better suited to certain situations than the others. The #define command is the most powerful, and can be used in any situation where the other two might be used, but it is also the most dangerous, as the preprocessor does not respect C syntax rules. Variables qualified by const are generally preferred but, as these variables are not considered compile-time constants, they have one significant limitation: a variable of type const int cannot be used to define the size of an ordinary fixed-size array (see footnote 1). (C99 accepts such a declaration inside a function, but only as a variable-length array.)
#define ARRAYSIZE 10
const int ArraySize = 10;
double array1[ARRAYSIZE]; /* Valid. */
double array2[ArraySize]; /* Invalid. */
An enumeration constant does not suffer this limitation and, between them, const and enum can satisfy all symbolic constant operations for which a #define might be used. As const and enum are part of the C language proper, and abide by its rules, they are to be preferred over #define in general. For example,
#define PI 3.14159
#define ARRAYSIZE 10
const double Pi = 3.14159; /* Preferred */
enum { ARRAYSIZE=10 }; /* Preferred */
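A small sketch of the preferred forms in use is given below. The array samples and the functions sample_count, circle_area and demo_constants are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { ARRAYSIZE = 10 };            /* integer constant, usable as an array size */
static const double Pi = 3.14159;   /* typed constant that obeys C scope rules */

double samples[ARRAYSIZE];          /* valid, even at file scope */

/* Number of elements in samples, computed by the compiler. */
size_t sample_count(void)
{
    return sizeof(samples) / sizeof(samples[0]);
}

double circle_area(double radius)
{
    return Pi * radius * radius;
}

void demo_constants(void)
{
    assert(sample_count() == 10);
    assert(circle_area(1.0) > 3.14 && circle_area(1.0) < 3.15);
}
```

Unlike a #define, neither name is touched by the preprocessor, so a debugger or compiler diagnostic can still refer to ARRAYSIZE and Pi by name.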
10.3 Macros
Symbolic constants defined by the #define command are a simple form of macro: a symbolic name that is expanded into an expression via text substitution. The C preprocessor provides a more sophisticated type of macro definition by allowing the macro name to be followed by a set of arguments enclosed in parentheses. For example,
#define MAX(x,y) ((x)>(y) ? (x) : (y))
Footnote 1: This is a historical anomaly and not a limitation of the ability of the compiler. The C++ language rectifies this oversight and, in C++, a const int may define an array size.
Although it looks like a function call, a macro behaves in a quite different manner. The preprocessor replaces the macro name with the defined replacement text, and substitutes the argument variables in the specified locations. Thus, MAX might be used as follows
int a=4, b= -7, c;
c = MAX(a,b);
and the preprocessor will expand the code to
int a=4, b= -7, c;
c = ((a)>(b) ? (a) : (b));
Note. In a macro definition, the parentheses immediately following the macro name must be directly adjacent to the name without whitespace. For example, the definition
#define MAX (x,y) ((x)>(y) ? (x) : (y))
is not equivalent to the previous macro definition, but will simply replace all occurrences of the name MAX with the text: (x,y) ((x)>(y) ? (x) : (y)).
Macros are typically used for one of two reasons. The first is speed. Macros can perform function-like operations without the overhead of a function call because the code is expanded inline. With modern fast machines, using macros for speed is less important than it used to be. The second use of macros is to implement a kind of generic function: a function-like expression that bypasses the C type constraints and can be passed parameters of any type. For example, the macro MAX will work correctly if a, b, and c are of type double, or any other arithmetic type, whereas an equivalent function
int max(int x, int y)
{
return x > y ? x : y;
}
will only accept integer parameters.
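The difference can be made concrete with a short sketch. The wrapper names larger_by_macro, larger_by_function and demo_generic are assumptions introduced for this example; MAX and max are the definitions from the text.

```c
#include <assert.h>

#define MAX(x,y) ((x)>(y) ? (x) : (y))

int max(int x, int y)
{
    return x > y ? x : y;
}

/* The macro is expanded in place, so the doubles are compared as doubles. */
double larger_by_macro(double a, double b)
{
    return MAX(a, b);
}

/* The arguments are converted to int on the way in, so the fractional
   parts are lost before the comparison takes place. */
double larger_by_function(double a, double b)
{
    return max(a, b);
}

void demo_generic(void)
{
    assert(larger_by_macro(2.25, 2.75) == 2.75);
    assert(larger_by_function(2.25, 2.75) == 2.0);
}
```

The function call still compiles, which is part of the hazard: the conversion from double to int happens silently.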
10.3.1 Macro Basics
Consider the following simple macros.
#define SQR(x) ((x)*(x))
#define SGN(x) (((x)<0) ? -1 : 1)
#define ABS(x) (((x)<0) ? -(x) : (x))
#define ISDIGIT(x) ((x) >= '0' && (x) <= '9')
#define NELEMS(array) (sizeof(array) / sizeof(array[0]))
SQR calculates the square of its argument, SGN calculates the sign of its argument (treating zero as positive), ABS converts its argument to an absolute value, ISDIGIT equals 1 if the argument value is between the character code for 0 and the character code for 9 and equals 0 otherwise, and NELEMS computes the number of elements in an array.
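The following sketch puts these macros to work. The functions count_digits, sum_of_squares and demo_basics, and the contents of table, are hypothetical and exist only to exercise the definitions above.

```c
#include <assert.h>
#include <stddef.h>

#define SQR(x) ((x)*(x))
#define SGN(x) (((x)<0) ? -1 : 1)
#define ABS(x) (((x)<0) ? -(x) : (x))
#define ISDIGIT(x) ((x) >= '0' && (x) <= '9')
#define NELEMS(array) (sizeof(array) / sizeof(array[0]))

/* Counts the digit characters in a string. */
int count_digits(const char *s)
{
    int n = 0;
    for (; *s != '\0'; ++s)
        if (ISDIGIT(*s))
            ++n;
    return n;
}

/* Sums the squares of a small fixed table: 9 + 16 + 1. */
int sum_of_squares(void)
{
    static const int table[] = { -3, 4, -1 };
    int total = 0;
    size_t i;
    for (i = 0; i < NELEMS(table); ++i)
        total += SQR(table[i]);
    return total;
}

void demo_basics(void)
{
    assert(count_digits("a1b22") == 3);
    assert(sum_of_squares() == 26);
    assert(SGN(-5) == -1 && SGN(0) == 1);
    assert(ABS(-5) == 5);
}
```

Note that NELEMS only gives the element count when it is applied to a true array, as it is here; applied to a pointer it would divide the pointer size by the element size.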
Macros should be used with care. The preprocessor is a powerful but rather blunt instrument, and it is easy to use macros incorrectly. Macros are subject to three main dangers. The first is that the passed arguments may have surprising precedence after macro expansion. For example, if SQR were defined as
#define SQR(x) x * x
then the following expression
int a = 7;
b = SQR(a+1);
will expand to
b = a+1 * a+1; /* b equals 7 + 1*7 + 1 = 15, not the expected 64 */
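The two definitions can be compared side by side in a short sketch. BAD_SQR is a name invented here for the unparenthesised version so that both can coexist in one file; the function names are also assumptions.

```c
#include <assert.h>

#define BAD_SQR(x) x * x        /* unparenthesised, as in the faulty definition */
#define SQR(x)     ((x)*(x))    /* fully parenthesised */

/* Expands to a+1 * a+1, so the multiplication binds first. */
int bad_square_of_next(int a)
{
    return BAD_SQR(a+1);
}

/* Expands to ((a+1)*(a+1)), which is what was intended. */
int square_of_next(int a)
{
    return SQR(a+1);
}

void demo_precedence(void)
{
    assert(bad_square_of_next(7) == 15);
    assert(square_of_next(7) == 64);
}
```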
For this reason, macro arguments should be heavily parenthesised as with the set of examples above. The second danger is that arguments with side effects may be evaluated multiple times after macro expansion. For example,
b = ABS(a++);
will expand to
b = (((a++)<0) ? -(a++) : (a++));
so that a is incremented twice, which is not the expected behaviour. To avoid this sort of problem, it is good practice never to use expressions with side effects (see footnote 2) as macro arguments. The final danger is that the ability to bypass the C type-checking system is a double-edged sword. It permits greater flexibility, but also prevents the compiler from catching some type-mismatch bugs.
In general, functions are to be preferred over macros as they are safer. However, with a little care, macros can be used without significant trouble, when required.
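The double evaluation, and the way a function avoids it, can be sketched as follows. The function abs_int and the three wrappers are hypothetical names used only for this comparison; ABS is the macro defined earlier.

```c
#include <assert.h>

#define ABS(x) (((x)<0) ? -(x) : (x))

/* Function version: its argument is evaluated exactly once. */
int abs_int(int x)
{
    return x < 0 ? -x : x;
}

/* Value of a after b = ABS(a++): the increment runs once in the
   test and once more in whichever branch is selected. */
int a_after_macro(int a)
{
    int b = ABS(a++);
    (void)b;
    return a;
}

/* Value of b itself: the branch reads a after the first increment. */
int b_from_macro(int a)
{
    return ABS(a++);
}

/* Value of a after b = abs_int(a++): a single increment. */
int a_after_function(int a)
{
    int b = abs_int(a++);
    (void)b;
    return a;
}

void demo_side_effects(void)
{
    assert(a_after_macro(5) == 7);
    assert(b_from_macro(5) == 6);
    assert(a_after_function(5) == 6);
}
```

Starting from 5, the macro leaves a at 7 and yields 6 rather than 5, while the function version behaves as a reader would expect.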
10.3.2 More Macros
There are many neat and ingenious macros to be found in existing source code, and there is much to be learned from other people's inventions. The following two examples are simple and clever.
#define CLAMP(val,low,high) ((val)<(low) ? (low) : (val) > (high) ? (high) : (val))
#define ROUND(val) ((val)>0 ? (int)((val)+0.5) : -(int)(0.5-(val)))
The first, CLAMP, uses two nested ?: expressions to bound a value val so that if it is less than low it becomes equal to low, and if it is greater than high it becomes equal to high; otherwise it remains unchanged. The second macro, ROUND, rounds a floating-point value to the nearest integer. It performs this operation using the truncation properties of casting a double to an int, but contains one clever subtlety. The truncation by casting to int is straightforward if the value is positive, but was machine dependent on some pre-standard compilers if the value is negative (see Section 2.7); standard C truncates toward zero in both cases. ROUND gets around this problem by subtracting the negative value from 0.5, thus making a positive value, and then negating the answer.
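A brief sketch of both macros in use follows. The wrappers clamp_percent and round_to_int, and the 0 to 100 range, are assumptions chosen for illustration.

```c
#include <assert.h>

#define CLAMP(val,low,high) ((val)<(low) ? (low) : (val) > (high) ? (high) : (val))
#define ROUND(val) ((val)>0 ? (int)((val)+0.5) : -(int)(0.5-(val)))

/* Bounds a value to the range 0..100. */
int clamp_percent(int value)
{
    return CLAMP(value, 0, 100);
}

/* Rounds to the nearest integer, with halves rounded away from zero. */
int round_to_int(double value)
{
    return ROUND(value);
}

void demo_more_macros(void)
{
    assert(clamp_percent(-5) == 0);
    assert(clamp_percent(150) == 100);
    assert(clamp_percent(42) == 42);
    assert(round_to_int(2.4) == 2);
    assert(round_to_int(2.5) == 3);
    assert(round_to_int(-2.4) == -2);
    assert(round_to_int(-2.5) == -3);
}
```

Both macros mention their argument more than once, so the earlier warning about side effects applies to them as well.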
Another clever macro trick is used to make macros behave more like functions. Consider the following macro that swaps two variables (using an additional temporary value).
#define SWAP(x,y,tmp) { tmp=x; x=y; y=tmp; }
This operation might be used as in the next example
int a=4, b=-1, temp;
SWAP(a, b, temp);
However, this macro will not behave in a function-like manner if used in an if-statement
if (a > b)
SWAP(a, b, temp); /* Won't compile */
else
a = b;
as this code will be expanded to incorrect C syntax.
if (a > b) { temp=a; a=b; b=temp; }
;
else a = b;
Footnote 2: Side effects is a term used to describe expressions where the values of some variables are changed as a by-product of the expression evaluation. For example, a++ and b *= n are expressions with side effects.
A solution to this problem is to wrap the body of the macro in a do-while statement, which will consume the offending semicolon.
#define SWAP(x,y,tmp) do { tmp=x; x=y; y=tmp; } while (0)
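With the do-while form, the if-else statement from the text compiles as intended. The sketch below wraps it in hypothetical helper functions (first_after and second_after) so that the result of each branch can be inspected.

```c
#include <assert.h>

#define SWAP(x,y,tmp) do { tmp=x; x=y; y=tmp; } while (0)

/* Runs the if-else from the text and returns the resulting a. */
int first_after(int a, int b)
{
    int temp;
    if (a > b)
        SWAP(a, b, temp);   /* expands to a single statement ending in ; */
    else
        a = b;
    return a;
}

/* Same statement, but returns the resulting b. */
int second_after(int a, int b)
{
    int temp;
    if (a > b)
        SWAP(a, b, temp);
    else
        a = b;
    return b;
}

void demo_do_while(void)
{
    assert(first_after(4, -1) == -1);   /* swapped */
    assert(second_after(4, -1) == 4);
    assert(first_after(1, 3) == 3);     /* else branch: a = b */
}
```

Because do { ... } while (0) followed by the caller's semicolon forms exactly one statement, the else still pairs with the if.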
An alternative solution is to wrap the macro in an if-else statement.
#define SWAP(x,y,tmp) if(1) { tmp=x; x=y; y=tmp; } else
A variant of SWAP does away with defining an explicit temporary variable by simply passing the variable type to the macro.
#define SWAP(x,y,type) do { type tmp=x; x=y; y=tmp; } while (0)
This might be used as
SWAP(a, b, double);
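As a slightly larger sketch, the type-passing variant can be used to reverse an array in place. The functions reverse and first_after_reverse are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <stddef.h>

#define SWAP(x,y,type) do { type tmp=x; x=y; y=tmp; } while (0)

/* Reverses n doubles in place by swapping from both ends. */
void reverse(double *values, size_t n)
{
    size_t i;
    for (i = 0; i < n / 2; ++i)
        SWAP(values[i], values[n - 1 - i], double);
}

/* Reverses {1.0, 2.0, 3.0} and reports the new first element. */
double first_after_reverse(void)
{
    double v[] = { 1.0, 2.0, 3.0 };
    reverse(v, 3);
    return v[0];
}

void demo_typed_swap(void)
{
    assert(first_after_reverse() == 3.0);
}
```

One caveat of this variant is that the macro declares its own variable called tmp, so it should not be given an argument that is itself named tmp.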
Finally, a very tricky bitwise technique allows us to perform the swap operation without any temporary variable at all. (However, this variant is only valid if x and y are integer variables of the same type, and it must not be applied to a variable and itself, since that sets the variable to zero.)
#define SWAP(x,y) do { x^=y; y^=x; x^=y; } while (0)
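The sketch below exercises the bitwise variant and also shows what happens when both arguments name the same object. The helper functions are hypothetical, and unsigned int is chosen here simply as a convenient integer type.

```c
#include <assert.h>

#define SWAP(x,y) do { x^=y; y^=x; x^=y; } while (0)

/* Returns the value that ends up in a after swapping a and b. */
unsigned swapped_first(unsigned a, unsigned b)
{
    SWAP(a, b);
    return a;
}

/* Swapping an object with itself: the first x^=y already clears it. */
unsigned self_swap(unsigned value)
{
    unsigned v[1];
    v[0] = value;
    SWAP(v[0], v[0]);
    return v[0];
}

void demo_xor_swap(void)
{
    assert(swapped_first(4u, 9u) == 9u);
    assert(self_swap(7u) == 0u);
}
```

The self-swap case can arise without being obvious, for example when two array indices happen to be equal, which is one reason to prefer the versions that use a temporary.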