The Win32 API
Is like a sharp poke in the eye.
Its macros obtrusive,
Definitions abusive,
And coupling? A grown man could cry!
Sounds like the solution is going to be very important and you should pay close attention? Well, before we ascertain the truth of that, it'd probably serve us all to look at the problem itself.
The problem is quite simple: when selecting between alternate definitions of functions, or types, by discriminating the presence or value of preprocessor symbols, the derived symbols are defined as macros, which pervade the entire compilation unit subsequent to the point of definition. Yikes! That sounds horrid, to be sure, but what exactly does it mean? Naturally, we'll illustrate with a code example.
Consider the following header file, AcmeThreadingStuff.h:
    /* AcmeThreadingStuff.h */
    ACMELIB_EXTERNC void TheFuncST(void); /* This does single-threaded stuff */
    #ifdef ACMELIB_MULTI_THREADING_SUPPORTED
    ACMELIB_EXTERNC void TheFuncMT(void); /* This does multi-threaded stuff */
    #endif /* ACMELIB_MULTI_THREADING_SUPPORTED */
    ...
This all looks okay so far. Ignoring the meaning of ACMELIB_EXTERNC for the moment, it's clear that there's a single-threaded version of TheFunc, accessible in all builds, and a multi-threaded version that's accessible when AcmeLib determines that multi-threading constructs are supported by the target environment. Let's look further into the file:
    /* AcmeThreadingStuff.h (continued) */
    ...
    #ifdef ACMELIB_MULTI_THREADING
    # define TheFunc TheFuncMT
    #else /* ? ACMELIB_MULTI_THREADING */
    # define TheFunc TheFuncST
    #endif /* ACMELIB_MULTI_THREADING */
These five lines are provided as a convenience to the user, and select the appropriate version of TheFunc based on whether the current build settings specify a single-threaded or a multi-threaded compilation. (Note: For brevity we're assuming that the symbol ACMELIB_MULTI_THREADING cannot be defined in the absence of ACMELIB_MULTI_THREADING_SUPPORTED. Real-world header files would have a #error in there somewhere to enforce this assumption.)
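The consistency check mentioned in the note might look something like the following sketch. The wording of the #error message and the helper acmelib_selected_variant() are our own assumptions, made up for illustration; they are not taken from any real AcmeLib header.

```cpp
#include <cassert>
#include <cstring>

// Assumed guard: refuse to compile when multi-threading is requested
// without being supported. The message text is illustrative only.
#if defined(ACMELIB_MULTI_THREADING) && !defined(ACMELIB_MULTI_THREADING_SUPPORTED)
# error ACMELIB_MULTI_THREADING requires ACMELIB_MULTI_THREADING_SUPPORTED
#endif

// Hypothetical helper: reports which function the TheFunc mapping would select.
inline char const* acmelib_selected_variant()
{
#ifdef ACMELIB_MULTI_THREADING
    return "TheFuncMT";
#else /* ? ACMELIB_MULTI_THREADING */
    return "TheFuncST";
#endif /* ACMELIB_MULTI_THREADING */
}

// Usage: with neither symbol defined, the single-threaded variant is selected.
inline void demo_guard()
{
    assert(0 == std::strcmp(acmelib_selected_variant(), "TheFuncST"));
}
```

Placing the check next to the mapping means a misconfigured build stops at the header, rather than at some distant unresolved symbol.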
Now the user of this library can write code without having to worry overtly about the threading model:
    #include "AcmeThreadingStuff.h"
    int main()
    {
      TheFunc();
      return 0;
    }
(Note: This is the real world, so please don't assume from this simple example that writing correct multi-threaded code is just about calling the right function. That's anything but the case, but we're talking about macros in this installment, so let's leave it at that for now, wink, wink.)
You may have noticed that we have not yet said whether the above code is compiled as C or as C++. As you're likely well aware, operating system and third-party libraries are generally packaged to provide C-APIs; for an exhaustive (exhausting?) discussion as to why, consult chapters 7 & 8 of Imperfect C++ [1]. One important reason is that C is the lingua franca of inter-language communication, since C modules can be directly linked to other languages including C++, D, Delphi, Visual Basic "Classic", assembler, Heron, and many others. C++ APIs can only be interfaced to C++ client code, and there are some serious restrictions even there [1]. Because C++ compilers mangle the names of functions in order to facilitate overloading, C functions visible in C++ compilation units must have the linkage specification extern "C" applied to them. Hence, ACMELIB_EXTERNC is defined to be extern "C" in C++ compilation, and as extern (or as nothing at all) in C compilation units.
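One plausible way to define ACMELIB_EXTERNC, following the description above, is sketched below. The stand-in implementation of TheFuncST() and the call counter are invented here so that the example links and can be exercised; a real library would supply the function in its own binary.

```cpp
#include <cassert>

// A plausible definition of the linkage macro, per the description in the text.
#ifdef __cplusplus
# define ACMELIB_EXTERNC extern "C"
#else /* ? __cplusplus */
# define ACMELIB_EXTERNC extern
#endif /* __cplusplus */

// Declaration as it appears in AcmeThreadingStuff.h
ACMELIB_EXTERNC void TheFuncST(void);

// Stand-in implementation (invented for this sketch). It inherits C linkage
// from the declaration above, so its name is not mangled.
static int g_st_calls = 0;
void TheFuncST(void) { ++g_st_calls; }

// Usage: the function is called like any other.
inline void demo_externc()
{
    int const before = g_st_calls;
    TheFuncST();
    assert(before + 1 == g_st_calls);
}
```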
So, back to the code: what's the problem? Well, TheFunc is a macro, which means it is seen, and put into effect, at all points in the compilation subsequent to its definition. Consider what happens when the program is enhanced to use some code from a C++ library from another vendor, BaRBSoft. BaRBSoft defines the interface to its library in BaRBSoftStuff.h, and provides the implementation in a static library:
    /* BaRBSoftStuff.h */
    #include <string>
    namespace BaRBSoft
    {
      int TheFunc(char const *regId, int *regIndex);
    } // namespace BaRBSoft
We might change our main() function as follows:
    #include "BaRBSoftStuff.h"
    #include "AcmeThreadingStuff.h"
    int main()
    {
      TheFunc();
      int regIndex;
      BaRBSoft::TheFunc("Billy Kriesel", &regIndex);
      return 0;
    }
Looks okay, does it not? Alas, this will not compile. The compiler will tell you that the namespace BaRBSoft does not contain a function called TheFuncST (or TheFuncMT, if you're building for multithreaded, i.e. if the symbol ACMELIB_MULTI_THREADING is defined). What gives?
When BaRBSoftStuff.h is included, the compiler sees the declaration of the namespace BaRBSoft and the function BaRBSoft::TheFunc(). That's the correct thing. Unfortunately, when AcmeThreadingStuff.h is subsequently included, it defines TheFunc to be TheFuncST (or TheFuncMT for multithreaded builds) for the remainder of the compilation unit. So where you see BaRBSoft::TheFunc() in the body of main(), the compiler actually sees BaRBSoft::TheFuncST(). Not happy, Bjarne! (You won't have to study much of Bjarne's writings to discover his antipathy to macros, as in [2, 3, 4]. Where the master leads, so shall we happy grasshoppers follow ...)
You might wonder whether this can be fixed by reversing the order of inclusion. Alas, that just shifts the problem.
    #include "AcmeThreadingStuff.h"
    #include "BaRBSoftStuff.h"
    int main()
    {
      TheFunc();
      int regIndex;
      BaRBSoft::TheFunc("Billy Kriesel", &regIndex);
      return 0;
    }
Now the compiler is perfectly happy, but the linker gets the hump. The reason is that the declaration of BaRBSoft::TheFunc() inside BaRBSoftStuff.h is translated by the preprocessor to BaRBSoft::TheFuncST(). The same thing happens, as before, in the body of main(), so the compiler sees both the declaration and the use of the same symbol. However, because BaRBSoft are jealous guarders of their intellectual property, and supplied only a static library, containing BaRBSoft::TheFunc(), against which to link, the linker fails to find BaRBSoft::TheFuncST().
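To make the renaming visible without provoking a link error, here is a minimal single-file sketch. The function bodies and the DEMO_STRINGIZE helper macros are our inventions for illustration; only the TheFunc mapping and the BaRBSoft::TheFunc signature come from the headers above.

```cpp
#include <cassert>
#include <cstring>

// Stand-ins for the AcmeLib functions; the empty bodies are invented for this sketch.
inline void TheFuncST() {}
inline void TheFuncMT() {}

// The mapping from AcmeThreadingStuff.h
#ifdef ACMELIB_MULTI_THREADING
# define TheFunc TheFuncMT
#else /* ? ACMELIB_MULTI_THREADING */
# define TheFunc TheFuncST
#endif /* ACMELIB_MULTI_THREADING */

// Helper macros (ours, not from either library): stringize after macro expansion.
#define DEMO_STRINGIZE_(x) #x
#define DEMO_STRINGIZE(x)  DEMO_STRINGIZE_(x)

namespace BaRBSoft
{
    // Spelled TheFunc here, but the preprocessor renames it before the
    // compiler proper ever sees it. The body is a stand-in.
    inline int TheFunc(char const* /* regId */, int* regIndex)
    {
        *regIndex = 42;
        return 0;
    }
} // namespace BaRBSoft

// Returns the token the compiler sees wherever we write TheFunc.
inline char const* name_seen_by_compiler()
{
    return DEMO_STRINGIZE(TheFunc);
}

// Usage, for a build without ACMELIB_MULTI_THREADING defined.
inline void demo_renaming()
{
    int regIndex = 0;
    assert(0 == std::strcmp(name_seen_by_compiler(), "TheFuncST"));
    assert(0 == BaRBSoft::TheFuncST("Billy Kriesel", &regIndex));
    assert(42 == regIndex);
}
```

The function in the namespace really is callable as BaRBSoft::TheFuncST(), which is exactly the name a prebuilt BaRBSoft library would not contain.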
So, whichever way you cut it, the #define of TheFunc() in AcmeThreadingStuff.h has trampled over our code, and broken it.
(For further reading on this issue, or many other important ones, we think it's worth pointing you to the latest in Herb Sutter's excellent Exceptional C++ series, Exceptional C++ Style [5]. Item 31 explains the problem.)
Several years ago, Matthew worked for a software company writing cross-platform software for network administration and statistical gathering. The software used its own messaging system, and one of the methods in the messaging API was called GetMessage(). It all worked tickety-boo. Then they had to port their nice working system to Windows.
I'm sure you can guess the rest. Lots of compiler / linker problems complaining that SuperDuperNetworkMgr::GetMessageA() could not be found. No doubt many of you are groaning in recognition of the problem, and have experienced first hand the Windows headers' #definition of GetMessage to either GetMessageA or GetMessageW, among myriad similar. Needless to say, this didn't endear the development team's Tandem/UNIX-heads to Windows.
They weren't in a position to sit back and pontificate on the abstract problem. A solution had to be found, and fast. The choices in this case were all unpleasant:
For reasons of both speed and "purity of soul", Option 1 was ruled out. Option 3 was the one selected, but the team subsequently "evolved" to Option 2.
You might think that, ugly as it is, this problem is at least discoverable at compile/link time. For the networking product at that stage of its development, that was so, and any of the three options above would yield "correctness", once compile and link stages were complete and error free. But consider what happens if you're using dynamic libraries, and are loading functions explicitly by name, via dlopen()/dlsym() (UNIX) or LoadLibrary()/GetProcAddress() (Windows). Just because the preprocessor will merrily change your GetMessage() to GetMessageA() does not mean it will also examine your string literals and do the same thing. Hence, you can have lurking problems in a code-base that was thoroughly tested and working on another operating environment, and such lurkers can be extremely hard to find. That is the case for any of the three options. (The only times such problems become easy to find are when you're doing a demonstration for your boss the day before he does your salary review, or when you've shipped the product to a client that has placed exacting downtime fines on your company. :-)
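The mismatch between renamed identifiers and untouched string literals can be simulated without any real dynamic library. In the sketch below, the export table, find_export(), and the helper macros are all hypothetical stand-ins for what dlsym() or GetProcAddress() would do against a real module; only the GetMessage to GetMessageA mapping mirrors the Windows-style definition discussed above.

```cpp
#include <cassert>
#include <cstring>

// Mimics the style of mapping found in the Windows headers.
#define GetMessage GetMessageA

// Helper macros (ours): stringize after macro expansion.
#define DEMO_STRINGIZE_(x) #x
#define DEMO_STRINGIZE(x)  DEMO_STRINGIZE_(x)

namespace SuperDuperNetworkMgr
{
    // Spelled GetMessage, compiled as GetMessageA. The body is a stand-in.
    inline int GetMessage() { return 7; }
} // namespace SuperDuperNetworkMgr

typedef int (*message_fn)();

struct ExportEntry
{
    char const* name;
    message_fn  fn;
};

// Simulated export table: names are recorded as the compiler saw them.
#define DEMO_EXPORT(f) { DEMO_STRINGIZE(f), &SuperDuperNetworkMgr::f }
static ExportEntry const g_exports[] = { DEMO_EXPORT(GetMessage) };

// Stand-in for dlsym()/GetProcAddress(): look a function up by name.
inline message_fn find_export(char const* name)
{
    for (ExportEntry const& e : g_exports)
    {
        if (0 == std::strcmp(e.name, name))
        {
            return e.fn;
        }
    }
    return nullptr;
}

// Usage: the string literal is not rewritten, so the obvious lookup fails.
inline void demo_lookup()
{
    assert(nullptr == find_export("GetMessage"));
    assert(nullptr != find_export("GetMessageA"));
}
```

The lookup by the name the programmer wrote comes back empty, and nothing complains until run time.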
Clearly this problem is composed of two aspects, which combine to give the killer effect. There's the need to map one name to another, and also the potential wider (than intended) name correspondence on which the mapping may act. In principle, if either of these can be obviated, the problem goes away.
In C, the macro-preprocessor is all we have, and there's no alternative for providing the name mapping, so good authors of C libraries attempt to address the second aspect, the name correspondence. This is usually addressed by prefixing the names with an appropriately unique symbol, to give "safe(r) macros". For example, Matthew's recls library [6] (implemented in C++, but presenting a C-API) uses the prefix Recls_, as in Recls_CalcDirectorySize(). While not a theoretical guarantee, this technique usually suffices in practice.
One of the basic tenets of C++, as espoused by Bjarne Stroustrup himself [7], is that the preprocessor should be, at worst, relegated to the bench, and only brought onto the pitch when facing a particularly feisty opponent. Maybe we can follow that intent a little in this case?
Many years ago, Matthew used his one-good-idea-per-year quota and applied some common sense to the problem. As many of you will know, C++ compilers are required to define the preprocessor symbol __cplusplus when processing a C++ compilation unit; in other words, when compiling a C++ source file. We can leverage this just as readily as we can the presence of UNICODE, or ACMELIB_MULTI_THREADING, or any other symbol, in order to know when we're in C or in C++. Remember, in C we must accept the status quo and merrily trample away. However, in C++ we have a better alternative to macros, however unique we've attempted to make them: namespaces and inline functions.
(Note: C99 defines the inline keyword for C code, and other compilers have proprietary extensions to do the same thing, so it's possible to take the C++ approach for C, as long as your compiler supports it.)
Let's look at how this might work in practice, by rewriting our AcmeThreadingStuff.h header:
    /* AcmeThreadingStuff.h */
    ACMELIB_EXTERNC void TheFuncST(void); /* This does single-threaded stuff */
    #ifdef ACMELIB_MULTI_THREADING_SUPPORTED
    ACMELIB_EXTERNC void TheFuncMT(void); /* This does multi-threaded stuff */
    #endif /* ACMELIB_MULTI_THREADING_SUPPORTED */
    #ifdef __cplusplus
    # ifdef ACMELIB_MULTI_THREADING
    inline void TheFunc() { TheFuncMT(); }
    # else /* ? ACMELIB_MULTI_THREADING */
    inline void TheFunc() { TheFuncST(); }
    # endif /* ACMELIB_MULTI_THREADING */
    #else /* ? __cplusplus */
    # ifdef ACMELIB_MULTI_THREADING
    #  define TheFunc TheFuncMT
    # else /* ? ACMELIB_MULTI_THREADING */
    #  define TheFunc TheFuncST
    # endif /* ACMELIB_MULTI_THREADING */
    #endif /* __cplusplus */
Now, in C++ compilation, there is no TheFunc preprocessor symbol definition; there is only the bona fide function TheFunc(). This means that TheFunc() no longer trespasses over other namespaces. In our mixed (AcmeLib + BaRBSoft) example, the symbol TheFunc from the BaRBSoft namespace is now thoroughly unaffected by the definition of the AcmeLib version in the global namespace.
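As a quick check of that claim, the following single-file sketch puts the inline form of TheFunc() next to a function of the same name in the BaRBSoft namespace. The function bodies and the call counters are invented for illustration; in real use the functions would come from the two vendors' libraries.

```cpp
#include <cassert>

// Stand-ins for the AcmeLib worker functions; counters invented for this sketch.
static int g_st_calls = 0;
static int g_mt_calls = 0;
void TheFuncST() { ++g_st_calls; }
void TheFuncMT() { ++g_mt_calls; }

// The C++ branch of the rewritten header: a real function, not a macro.
#ifdef ACMELIB_MULTI_THREADING
inline void TheFunc() { TheFuncMT(); }
#else /* ? ACMELIB_MULTI_THREADING */
inline void TheFunc() { TheFuncST(); }
#endif /* ACMELIB_MULTI_THREADING */

namespace BaRBSoft
{
    // Keeps its own name, because no macro is around to rewrite it.
    // The body is a stand-in.
    inline int TheFunc(char const* regId, int* regIndex)
    {
        *regIndex = (nullptr != regId) ? 1 : -1;
        return 0;
    }
} // namespace BaRBSoft

// Usage: both functions coexist, whatever the order of declaration.
inline void demo_coexistence()
{
    int regIndex = 0;
    TheFunc();
    assert(0 == BaRBSoft::TheFunc("Billy Kriesel", &regIndex));
    assert(1 == regIndex);
}
```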
Indeed, a future evolution of the BaRBSoft library might result in a similarly conditionally defined nature to its TheFunc, perhaps according to the ambient character encoding, as follows:
    namespace BaRBSoft
    {
    #ifdef BARBSOFT_UNICODE
      typedef wchar_t char_type;
    #else /* ? BARBSOFT_UNICODE */
      typedef char char_type;
    #endif /* BARBSOFT_UNICODE */
      int TheFuncA(char const *regId, int *regIndex);
      int TheFuncW(wchar_t const *regId, int *regIndex);
      inline int TheFunc(char_type const *regId, int *regIndex)
      {
    #ifdef BARBSOFT_UNICODE
        return TheFuncW(regId, regIndex);
    #else /* ? BARBSOFT_UNICODE */
        return TheFuncA(regId, regIndex);
    #endif /* BARBSOFT_UNICODE */
      }
    } // namespace BaRBSoft
Because we've used inline functions, rather than macros, the name mapping in the BaRBSoft namespace does not leak out and pollute any other namespace, including the global namespace within which AcmeLib's TheFunc is defined. Now we can kiss goodbye to compile errors, and missing symbols.
?:
) play just for extra confusion, as in the following macro/function:
    int ProcessOddNums(int n);
    int ProcessEvenNums(int n);
    #ifdef __cplusplus
    inline int ProcessNums(int n)
    {
      return (n % 2) ? ProcessOddNums(n) : ProcessEvenNums(n);
    }
    #else /* ? __cplusplus */
    # define ProcessNums(n) (((n) % 2) ? ProcessOddNums(n) : ProcessEvenNums(n))
    #endif /* __cplusplus */
By defining the C++ form as an inline, we not only avoid stepping on any other ProcessNums symbol, we can also step into the ProcessNums() function, and avoid the often irritating/confusing mental disconnect one experiences when thinking one is about to step into one function, and ends up in another, leading to time-wasting existentialist digressions (how did I get here?). It also significantly eases the business of fine-grained placement of breakpoints, and gives us a stack frame in which to play around and look at function parameters before diving down into the worker functions. This fact alone has proven its worth many times over in our use of the technique.
TheFunc in C and C++. As you can imagine, such bugs are extremely hard to find.
The alternative is to make a single point of definition, based on "safe(r) macros", and then define the user-facing macros/functions in terms of those macros, hence:
    /* Single point of definition, using a prefixed "safe(r)" macro */
    #ifdef ACMELIB_MULTI_THREADING
    # define ACMELIB_TheFunc TheFuncMT
    #else /* ? ACMELIB_MULTI_THREADING */
    # define ACMELIB_TheFunc TheFuncST
    #endif /* ACMELIB_MULTI_THREADING */
    /* The actual mappings for C / C++ */
    #ifdef __cplusplus
    inline void TheFunc() { ACMELIB_TheFunc(); }
    #else /* ? __cplusplus */
    # define TheFunc ACMELIB_TheFunc
    #endif /* __cplusplus */
There's a modest increase in code size and effort, but it's a manifest gain in robustness, and a final cherry on the maintainability cake.
The cost is a slight increase in admittedly arcane-looking code, but such things are readily amenable to being auto-generated by script. You can also choose to define them in a separate file that is #include'd into your main, handwritten, header file, so you keep your main header nice and neat and comprehensible to users of your code.
Using this technique, you can avoid a host of troubles for yourself and, more importantly, for the users of your code. Now, if only we can get the large software companies to play ball...
Thank you for reading,
Bjorn Karlsson and Matthew Wilson
http://www.bigboyandrunningbear.com/
recls is one of the exemplar libraries for Matthew's Positive Integration column for C/C++ Users Journal. The column is concerned with integrating different languages with C and C++, and uses different libraries to highlight the issues involved.
Bjorn Karlsson is proud to be a C++ designer, programmer, teacher, preacher, and student. He has finally learned enough about C++ to realize how little he knows. When not reading or writing articles, books, or code, he has the privilege to be a part of the Boost community, and a member of The C++ Source Advisory Board. His book, Beyond The C++ Standard Library: An Introduction to Boost, will be published by Addison-Wesley in 2005. He appreciates it when people send him interesting emails at bjorn.karlsson@readsoft.com.
Matthew Wilson is a software development consultant for Synesis Software, and creator of the STLSoft libraries. He is the author of Imperfect C++ (Addison-Wesley, 2004), and is currently working on his next two books, one of which is not about C++. Matthew can be contacted via http://imperfectcplusplus.com/.