Avoid C Preprocessor Abuse

C and C++ developers have made a common practice of using inline preprocessor directives to select pieces of code in a module to be compiled in or out, depending on which symbols are defined. This is especially common in cross-platform code, where it is often necessary to change the implementation of a module depending on the platform being targeted.

In the worst cases, this technique is abused to litter the source code with numerous pseudo-functions and pseudo-interfaces. The code casually mixes multiple, mutually exclusive implementations together. Here's an example from Qt 4.6.1, qsystemsemaphore_p.h:

#ifdef Q_OS_WIN
    HANDLE handle(QSystemSemaphore::AccessMode mode = QSystemSemaphore::Open);
    void setErrorString(const QString &function);
#elif defined(Q_OS_SYMBIAN)
    int handle(QSystemSemaphore::AccessMode mode = QSystemSemaphore::Open);
    void setErrorString(const QString &function,int err = 0);
#else
    key_t handle(QSystemSemaphore::AccessMode mode = QSystemSemaphore::Open);
    void setErrorString(const QString &function);
#endif

Using the preprocessor in an arbitrary manner is a sign of poor design. (In the example shown, a private interface is changing across platforms.) There are several issues with this technique:

  1. The compiler doesn't see preprocessor directives. In C and C++ they are not a proper language feature but a layer on top of the language: by the time the compiler actually receives the source, all such directives and symbols have been fully evaluated. This means the compiler can tell us little to nothing when a mistake, whether syntactic or semantic, is made with a preprocessor directive. Often the result is either a silent bug in the program or a compiler message that reads as gibberish when compared against the original source, as line numbers shift, symbols appear and disappear, and statements are replaced. Naturally, this is a debugging nightmare. (A short illustration follows this list.)

  2. It merges interface and implementation. The Qt code cited here is a good example of how that can go wrong. In this case, the actual function signatures change based upon which OS we build for. Most examples have more subtle flaws.

  3. Cluttering the source code of a module in random places with preprocessor #define, #if, #ifdef, and so forth makes it more difficult to read and understand. For a given use case, a module should have one clearly distinct implementation, not multiple overlapping implementations which readers are forced to separate in their minds.

  4. Expanding support for new cases or platforms requires sifting through most or all of the code base with a fine-tooth comb for locations that might need a new check or need new clauses added to existing checks.

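To make point 1 concrete, here is a small, self-contained sketch (the macro and type names are invented for this illustration). Because the platform symbol in the #ifdef is misspelled, the first branch is never compiled on any platform, so neither the misspelling nor the garbage line inside that branch is ever reported:

    /* Intended to test a (hypothetical) MYLIB_OS_WINDOWS symbol, but the
       name is misspelled below -- and nothing will ever warn us about it. */
    #ifdef MYLIB_OS_WINDOSW
        typedef void *mylib_handle;    /* never compiled on any platform...  */
        this line is not even valid C  /* ...so this error goes unreported   */
    #else
        typedef int mylib_handle;      /* every build quietly lands here     */
    #endif

    int main(void)
    {
        mylib_handle h = 0;            /* compiles cleanly everywhere        */
        (void)h;
        return 0;
    }

The compiler proper never sees the skipped branch; the mistake surfaces only later, as a bug or as error messages whose line numbers and symbols no longer match the source the programmer is reading.
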
In case there is still doubt about the merits, ask yourself this: have any successful programming languages created after the mid-1980s provided a preprocessor or analogous mechanism?

Very few modern programming languages have this feature, and those that have something similar use an integrated design, managed within the compiler or a low-level runtime rather than by a separate front-end tool. Significantly, this nullifies the problem of incomprehensible debugging, since the compiler can see and validate conditional directives much as it would any other language construct. Moreover, newer languages place constraints on conditional compilation that the C preprocessor does not, which encourages good design and code simplicity.

One doesn't need to switch languages to eliminate these design flaws, however. Instead, reorganize the design so that the conditional code sits behind a well-defined interface. In C, such an interface will be composed of one or more header files. Create an implementation of that interface for each usefully distinct configuration (for example, each platform); each of these will be an independent set of C source files. Finally, set up the build system to compile and link only the modules relevant to the current target.
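
Here is a minimal sketch of that structure. The file and function names (sem_handle.h, sem_handle_win32.c, sem_handle_posix.c, and the sem_handle_* functions) are invented for the example, loosely modeled on the Qt code quoted above; the point is that callers see one signature while each platform keeps its own source file:

    /* sem_handle.h -- the single interface every platform implements */
    #ifndef SEM_HANDLE_H
    #define SEM_HANDLE_H

    typedef struct sem_handle sem_handle;  /* opaque; layout differs per platform */

    /* One set of signatures, identical everywhere. The extra error code that
       only some platforms produce is folded into the common interface.        */
    sem_handle *sem_handle_open(const char *name);
    void        sem_handle_set_error(sem_handle *h, const char *function, int err);
    void        sem_handle_close(sem_handle *h);

    #endif /* SEM_HANDLE_H */

    /* sem_handle_win32.c -- compiled and linked only when targeting Windows;
       struct sem_handle wraps a HANDLE here, and no #ifdef is needed.        */
    #include "sem_handle.h"

    /* sem_handle_posix.c -- compiled and linked only when targeting POSIX;
       struct sem_handle wraps a key_t here, again with no #ifdef.            */
    #include "sem_handle.h"

The choice between sem_handle_win32.c and sem_handle_posix.c is made by the build system (a conditional source list in a Makefile, CMakeLists.txt, or similar), not by the source text, so each translation unit contains exactly one implementation and the compiler sees and checks everything it is given.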

There are other potential solutions. The most obvious is to eliminate the behavioral or functional differences which created the context-dependent code in the first place. This is not always sensible, but when it is, it is the best option, as it simplifies the code base. Another alternative is to find an existing library which solves the same essential issues, and outsource the problem to that library instead. When feasible, this eliminates much unnecessary work and can even reduce the overall complexity of the software ecosystem by solving the fundamental issues exactly once instead of many separate times.

A key piece of understanding here is that C was hacked together on the fly as a replacement for assembly, during a time when high-level software design was still in its infancy. The first reasonably complete description of the language, K&R, was not published until 1978, and it took another eleven years to reach the first formal specification, ANSI C89. Thus, even as computer science developed rapidly, the language's feature set remained largely fixed in the past. At this point it is unlikely that a proper system of modules and interfaces will ever be added to C, so it is incumbent on developers to use the tools they do have to imitate one.