The C preprocessor and Macros

We’ve already used the C preprocessor to include header files and write include guards. Here we will take a moment to look more closely at the C preprocessor and its directives.

The preprocessor runs on a file and prepares it for the compiler to act on. The preprocessor is (mostly) just a text substitution tool. Preprocessor directives issue commands to the preprocessor, and always start with a # symbol. The twelve standard directives listed below are split into a few categories.

Inclusion:

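  • #include <somefile> finds somefile and pastes its contents in place of the include statement. With angle brackets, the preprocessor searches the system include directories and any directories added on the compiler command line (e.g. with -I). With quotation marks, #include "somefile", most compilers look first in the directory of the including file and then fall back to the angle-bracket search path.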

Identifier definition:

  • #define SOMETHING defines the identifier SOMETHING. Additionally, providing a replacement as in #define SOMETHING replacement will replace every subsequent occurrence of the token SOMETHING in the file with replacement (text inside string literals and comments is left alone).

    • We will look at this more below.

  • #undef SOMETHING clears the definition of the identifier SOMETHING.

Conditional statements:

  • #if, #ifdef, and #ifndef all start conditional statements. #ifdef evaluates true if the given identifier has been defined, and #ifndef evaluates true if it has not. #if can evaluate more general integer constant expressions, such as comparisons.

  • #elif and #else create further conditional blocks that are used if the preceding conditions fail.

  • #endif terminates a conditional block, whether it was opened by one of the #if directives above or continued by the latest #elif or #else.

Other:

  • #error generates a compile time error.

  • #line resets the internal line counter, and optionally the internal representation of the filename (it will not change the name on disk).

  • #pragma issues special commands to the compiler.

The conditional directives operate by possibly removing bodies of text from the file before the compiler sees it. Any lines falling inside a conditional region where the statement evaluated to false will be ignored. This was precisely how we wrote include-guards before. You can find more information on this preprocessor reference page.
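As a minimal sketch of how the conditional directives remove text, consider the following. The names VERBOSE_LEVEL, LEVEL_NAME, and levelName are hypothetical and exist only for this illustration; in practice a switch like VERBOSE_LEVEL would more likely be set on the compiler command line (e.g. with -DVERBOSE_LEVEL=2).

```c
/* Hypothetical switch: use a default unless one was given on the command line */
#ifndef VERBOSE_LEVEL
#define VERBOSE_LEVEL 2
#endif

/* Exactly one of these three definitions survives preprocessing */
#if VERBOSE_LEVEL >= 2
#define LEVEL_NAME "debug"
#elif VERBOSE_LEVEL == 1
#define LEVEL_NAME "info"
#else
#define LEVEL_NAME "quiet"
#endif

/* Stop compilation with a clear message if something went wrong above */
#ifndef LEVEL_NAME
#error "LEVEL_NAME was not defined"
#endif

/* Returns the name that was selected at preprocessing time */
const char *levelName(void)
{
  return LEVEL_NAME;
}
```

With the default value of 2, the compiler only ever sees the "debug" definition; the other two branches are removed before compilation starts.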

Preprocessor Macros

In addition to defining an identifier, the #define directive can perform text replacement on all parts of a file following it. In this usage, the definition is called a Macro. Consider the small example:

/* File: simpleDefine.c
 * Author: Ian May
 * Purpose: Demonstrate the #define directive
 */

#include <stdlib.h>
#include <stdio.h>

#define THRESH 1.e-6

int main(int argc, char **argv)
{
  double val = argc>1 ? atof(argv[1]) : 1.;

  if (val < THRESH) {
    printf("Given value is smaller than threshold\n");
  } else {
    printf("Given value is larger than threshold\n");
  }

  return 0;
}

Save this code as simpleDefine.c. Then you can compile it with:

gcc simpleDefine.c -o simpleDefine.ex

Calling this program can give these outputs:

$ ./simpleDefine.ex 1.e-8
Given value is smaller than threshold
$ ./simpleDefine.ex 2.
Given value is larger than threshold

Note in the above example that THRESH is never declared in the body of the program. Instead, the preprocessor replaces it with the literal text 1.e-6 every time it is encountered. The replacement may contain more than a single token: everything after the identifier, up to the end of the line, is inserted.
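Because the replacement is purely textual, a multi-token replacement can interact with the surrounding code in surprising ways. The sketch below uses hypothetical macros SUM_BAD and SUM_GOOD (and hypothetical helper functions) to show why such replacements are usually wrapped in parentheses.

```c
/* Hypothetical macros for illustration only */
#define SUM_BAD 1 + 2      /* replacement text is pasted verbatim */
#define SUM_GOOD (1 + 2)   /* parentheses keep the value together */

/* Expands to: return 1 + 2 * 3;  which is 7, not 9 */
int tripleBad(void)
{
  return SUM_BAD * 3;
}

/* Expands to: return (1 + 2) * 3;  which is 9 */
int tripleGood(void)
{
  return SUM_GOOD * 3;
}
```

The preprocessor knows nothing about operator precedence, so it is up to the macro author to add the parentheses.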

Macros can also behave in a similar manner to functions. Consider using the ternary operator to define a max function:

/* File: functionMacro.c
 * Author: Ian May
 * Purpose: Demonstrate a function-like macro
 */

#include <stdlib.h>
#include <stdio.h>

/* Slightly dangerous macro */
#define MAX(a,b) ( ((a)>(b)) ? (a) : (b) )

int main()
{
  double p = 1.5324,q = 3.7895;
  printf("Max(%f,%f) = %f\n",p,q,MAX(p,q));

  /* Contrived, but shows the point */
  int s = 1,t = 2,ss = s++,tt = ++t;
  printf("Max(%d,%d) = %d\n",ss,tt,MAX(s++,++t));

  return 0;
}

Save this code as functionMacro.c. Then you can compile it with:

gcc functionMacro.c -o functionMacro.ex

Let’s observe a few items about this code:

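  • The parameters a and b are substituted with whatever arguments are passed when the macro is expanded; in that sense #define is a fairly capable textual replacement tool.

  • There may seem to be a lot of extra parentheses in the definition of the macro. These are there to allow the user to pass larger expressions into the macro without operator precedence changing the meaning.

    • Be careful though! When the macro is expanded, an argument expression may be evaluated more than once. Look at the second call in the example: MAX(s++,++t) expands to ( ((s++)>(++t)) ? (s++) : (++t) ), so ++t is executed twice and the printed maximum is 5 rather than 4.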
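To make the double evaluation concrete, here is a small sketch that compares the MAX macro with a function. The function maxInt and the two helper functions are hypothetical names introduced only for this comparison; both helpers start from s = 2 and t = 3, the values these variables hold at the second call in functionMacro.c.

```c
/* Same macro as in functionMacro.c */
#define MAX(a,b) ( ((a)>(b)) ? (a) : (b) )

/* Hypothetical function alternative: each argument is evaluated exactly once */
static inline int maxInt(int a, int b)
{
  return (a > b) ? a : b;
}

/* Returns the result of the macro call */
int maxViaMacro(void)
{
  int s = 2, t = 3;
  /* Expands to ( ((s++)>(++t)) ? (s++) : (++t) ), so ++t runs twice */
  return MAX(s++, ++t);
}

/* Returns the result of the function call */
int maxViaFunction(void)
{
  int s = 2, t = 3;
  /* s++ and ++t are each evaluated once, before the call */
  return maxInt(s++, ++t);
}
```

The macro version returns 5, while the function version returns 4, which is what a reader of the call site would most likely expect.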

Remark: When writing macros that may have side-effects, like the MAX macro above, you should put them only in source files and never in header files. This will help prevent hard to track down bugs in the future. Better yet, you should avoid macros like this when possible.

Remark: Note that all of the macro names used in the previous examples were written in all capital letters. This convention makes macros easy to spot, so that nobody reuses a macro name as an ordinary identifier by accident and then gets confused when the resulting error refers to code that does not seem to exist in the source.

Another fairly common example is to ensure a definition for \(\pi\), the existence of which can be inconsistent across platforms. It is common to see the snippet:

#ifndef M_PI
#define M_PI 3.141592653589793238
#endif

Of course, you could also evaluate \(acos(-1.)\) and store the value into an easily accessible double precision constant.

Predefined Macros

There are a number of predefined macros present in all translation units. Many predefined macros start and end with two underscores. (Does that seem familiar?) A couple of interesting ones are __FILE__ and __LINE__ which expand, as you may have guessed, to the filename and the line where they get processed. These can be quite useful for writing logging functions/macros when trying to profile or debug a program. There is also a special predefined variable (not a macro) in every function called __func__, which is often combined with these.
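As a rough sketch of how these pieces combine, the hypothetical macro WHERE_AM_I below writes the file name, line number, and enclosing function into a caller-supplied buffer. The macro and the function recordLocation are illustrative names, not part of any standard header.

```c
#include <stdio.h>

/* Hypothetical macro: writes "file:line (function)" into buf.
 * __FILE__ and __LINE__ are expanded by the preprocessor at the call site;
 * __func__ is an identifier that the compiler provides inside each function. */
#define WHERE_AM_I(buf, size) \
  snprintf((buf), (size), "%s:%d (%s)", __FILE__, __LINE__, __func__)

/* Hypothetical function that records its own location */
void recordLocation(char *buf, size_t size)
{
  WHERE_AM_I(buf, size);
}
```

Because the macro is expanded inside recordLocation, the text written into the buffer names that function and the line of the macro call, not the line where the macro was defined.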

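There are also predefined macros available by including some headers. For instance, the math.h header provides (among others) the macro HUGE_VAL, a positive double value that math functions return on overflow; on most platforms it is positive infinity. The largest finite double is instead given by DBL_MAX from float.h.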
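A small sketch, assuming an IEEE 754 platform (which covers most current systems), of how HUGE_VAL relates to the largest finite double, DBL_MAX from float.h. The function name hugeValIsBeyondDblMax is hypothetical.

```c
#include <float.h>
#include <math.h>

/* Returns 1 if HUGE_VAL is strictly larger than the largest finite double.
 * On IEEE 754 platforms HUGE_VAL is positive infinity, so this returns 1. */
int hugeValIsBeyondDblMax(void)
{
  return HUGE_VAL > DBL_MAX;
}
```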

The other predefined macros exist so that your program can inquire about the capabilities of a given platform, and are generally not too useful in scientific computing.

However, you should note that giving your own macros names that start with an underscore is bad practice. Identifiers beginning with two underscores, or with an underscore followed by a capital letter, are reserved, so such names may collide with either standard predefined macros or macros used internally by the compiler.

Multi-line Macros

Occasionally you may want to write a macro that spans multiple lines. One example could be to write a simplified CPU timing macro to help you profile a piece of code. Another example could be to write a logging macro as mentioned above. Consider this code snippet that models the first example (use #include <time.h> before this):

clock_t curTime, diff; /* Special integer type for storing times */

/* Start timer */
curTime = clock();
/* Do the thing you want to time */
SomeExpensiveFunction(data,moreData);
/* Stop timer and report (end time minus start time) */
diff = clock() - curTime;
int ms = 1000*diff/CLOCKS_PER_SEC; /* Constant imported from time.h */
printf("Timer ran for %d ms\n",ms);

Writing the three lines to finalize the timer can be a bit annoying. Let's wrap the start and stop behavior into a couple of macros. Consider this example:

/* File: timerMacro.c
 * Author: Ian May
 * Purpose: Demonstrate a function-like macro
 */

#include <stdlib.h>
#include <stdio.h>
#include <time.h>

/* Timer macro and supporting variables */
clock_t g_curTime, g_diffTime; /* Note that globals are needed, and the careful naming */
int g_msTime;

/* Start timer is pretty simple */
#define START_TIMER() g_curTime = clock()

/* End timer has to do more, note lack of semicolon on final line */
#define STOP_TIMER() g_diffTime = clock() - g_curTime;  \
  g_msTime = 1000*g_diffTime/CLOCKS_PER_SEC;            \
  printf("Timer took %d ms\n",g_msTime)

/* Define some long running function */
double sumSeries(int numTerms)
{
  double sum = 0;
  for (int n=1; n<=numTerms; n++) {
    double x = (double) n;
    sum += 1.0/(x*x);
  }
  return sum;
}


int main()
{

  START_TIMER();
  double sum = sumSeries(10000000);
  STOP_TIMER();
  printf("Sum = %e\n",sum);

  return 0;
}

Save this code as timerMacro.c. Then you can compile it with:

gcc timerMacro.c -o timerMacro.ex

Running this should report that the function took roughly 30 milliseconds, though the exact figure depends on your machine and compiler settings. Observe:

  • The current time, time difference, and millisecond count are all globally defined.

  • The final line of each macro is written without a semicolon, so that the call site supplies its own semicolon and looks like any other statement in the code.

  • The separate lines in the STOP_TIMER macro are joined by backslashes. A backslash at the very end of a line continues the macro onto the next line, so the preprocessor sees a single logical line; the line breaks are only there to help the readability of the code.

You should note that the above pattern for a multi-line macro suffers from a major drawback: it cannot safely be used as the body of an un-bracketed conditional statement, because only the first statement of the expansion would be controlled by the condition. A common way to fix this is to wrap multi-line macros in a trivial do-while loop. The above macro could be changed to:

#define STOP_TIMER() do { \
  g_diffTime = clock() - g_curTime;                     \
  g_msTime = 1000*g_diffTime/CLOCKS_PER_SEC;            \
  printf("Timer took %d ms\n",g_msTime);                \
} while(0)

Note that the final semi-colon is still missing. This pattern ensures that the macro will not break after un-bracketed conditionals or similar statements.
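The difference between the two patterns can be seen in a minimal sketch. The counters g_calls and g_total, the macros RECORD_BAD and RECORD_GOOD, and the two functions are all hypothetical names chosen to make the effect observable.

```c
/* Hypothetical counters, used only to make the effect observable */
int g_calls = 0, g_total = 0;

/* Unsafe: two statements, only the first is controlled by an if */
#define RECORD_BAD(x)  g_calls++; g_total += (x)

/* Safe: the do-while wrapper turns the body into a single statement */
#define RECORD_GOOD(x) do { g_calls++; g_total += (x); } while(0)

void recordIfPositiveBad(int x)
{
  if (x > 0)
    RECORD_BAD(x);   /* g_total += (x) runs even when x <= 0 */
}

void recordIfPositiveGood(int x)
{
  if (x > 0)
    RECORD_GOOD(x);  /* both statements are skipped when x <= 0 */
}
```

Calling recordIfPositiveBad(-5) leaves g_calls untouched but still changes g_total, whereas recordIfPositiveGood(-5) changes neither. If an else branch were added after RECORD_BAD(x), the code would not even compile, since the else would no longer be attached to the if.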

Exercise: Re-write the above sample code to use profiling inside the function. Add the ability to turn profiling on and off, and wrap all multi-line macros in the do-while loop pattern. Additionally, modify the START_TIMER macro to report what function, from what file, is being profiled.

A word on typedef

We’ve also seen the keyword typedef which seems to have similar functionality to #define. The key difference is that typedef is handled by the compiler, not the preprocessor. Additionally, as its name suggests, typedef only applies to the definition of types, while #define is much more general.

That said, if you are defining a type you should absolutely use typedef as it removes many sources of error and helps the compiler give you better error messages.
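One classic source of error can be sketched as follows. The names INT_PTR_MACRO and IntPtr, along with the two helper functions, are hypothetical and only serve to contrast the two approaches when declaring several variables at once.

```c
#include <stddef.h>

#define INT_PTR_MACRO int*   /* hypothetical: plain textual replacement */
typedef int *IntPtr;         /* hypothetical: a real type alias */

/* Returns the size of the second variable declared with the macro */
size_t secondSizeMacro(void)
{
  /* Expands to: int* a = NULL, b = 0;  so b is a plain int */
  INT_PTR_MACRO a = NULL, b = 0;
  (void) a;
  return sizeof b;
}

/* Returns the size of the second variable declared with the typedef */
size_t secondSizeTypedef(void)
{
  /* Both a and b have type int* */
  IntPtr a = NULL, b = NULL;
  (void) a;
  return sizeof b;
}
```

With the macro, only the first variable is a pointer, because the * belongs to the declarator of a and not to the type; the typedef applies to every variable in the declaration.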