Saturday, December 12, 2009

In Praise of do while (false)

By now I would have thought that everyone knew the joys of the C, C++, and Java language construct do while (false). You can find articles written about it on the web from as far back as 1994, which might as well be Neolithic cave drawings. Yet I continue to have questions about do while (false) (or do while (0) in C if you haven't defined false) from code inspectors who should know better. (You know who you are.)

There is nothing magic about do while (false). It does exactly what you think it does, which is to say, very little. In fact, it does so little, your typical optimizing compiler won’t generate any code for it. Yet, it is a really handy tool to keep in your toolbox, right next to your duct tape and vice grips.

Common Exit Flow of Control

All C++ functions have a common entry point. It is frequently desirable for functions to have a common exit point as well. There are all sorts of reasons for this. The most pragmatic is that a common entry and exit point makes it easy to add debugging statements that log the arguments passed into the function and the results it generates. If the flow of control needs to leave prematurely, it can do so without skipping the logging statement at the common exit, simply by executing a break.

#include <cstdio>

int function1(int argument1) {
    int rc = 0;
    std::printf("%s[%d]: function1(%d)\n",
        __FILE__, __LINE__, argument1);
    do {
        // bogus, giveup, and error stand in for whatever
        // failure tests the real code makes.
        // Some really complicated code.
        if (bogus) {
            rc = -1;
            break;
        }
        // Some more really complicated code.
        if (giveup) {
            rc = -2;
            break;
        }
        // Yet more really complicated code.
        if (error) {
            rc = -3;
            break;
        }
    } while (false);
    std::printf("%s[%d]: function1=%d\n",
        __FILE__, __LINE__, rc);
    return rc;
}

The inner logic uses the break statement to drop out the bottom of the do while (false). No need for labels and goto statements. No need for maintaining and checking flags. And if the inner logic completes and the flow of control finds itself at the while (false), it simply drops through. No harm, no foul, no iteration.

You can imagine replacing the logging statement with a close() system call, a free() call, a delete operator, or anything else you must be sure to do to clean up after yourself. I routinely use this construct to eliminate any possibility of resource leaks in code that uses temporarily allocated resources like file descriptors, sockets, and dynamically acquired memory. I also routinely use it to fix resource leaks in legacy code, which in my experience are surprisingly common, although wonderful tools like valgrind used during unit testing have helped a lot in that regard.
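Here is a minimal, runnable sketch of that cleanup idea. The function count_digits and its error codes are hypothetical, invented only for illustration; the point is that the temporary heap buffer is released in exactly one place, below the while (false), no matter which step breaks out.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical example: count the decimal digits in `text`, working on a
// temporary heap copy. Returns the count, or a negative error code.
int count_digits(const char * text) {
    int rc = 0;
    char * copy = nullptr;  // the temporarily allocated resource
    std::printf("count_digits(\"%s\")\n", text != nullptr ? text : "(null)");
    do {
        if (text == nullptr) {
            rc = -1;  // bad argument: nothing has been allocated yet
            break;
        }
        std::size_t length = std::strlen(text);
        copy = static_cast<char *>(std::malloc(length + 1));
        if (copy == nullptr) {
            rc = -2;  // allocation failed
            break;
        }
        std::memcpy(copy, text, length + 1);
        for (std::size_t index = 0; index < length; ++index) {
            if (copy[index] >= '0' && copy[index] <= '9') {
                ++rc;
            }
        }
    } while (false);
    // Common exit: std::free(nullptr) is a no-op, so one call covers every path.
    std::free(copy);
    std::printf("count_digits=%d\n", rc);
    return rc;
}
```

Because freeing a null pointer does nothing, the common exit does not need to know how far the function got before it broke out.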

Adherents of other languages, both more modern and more ancient, will recognize this control structure as what they may know as a do-end. It would be great if C++, C, and Java had something similar, perhaps a way to use break to exit out the bottom of any compound statement (that is, a block of statements enclosed in { curly braces }). Alas, a break can only occur in the context of a switch or loop construct. So in order to use break, we must provide the compiler with a loop construct, albeit one that never actually loops.

This pattern is applicable to any block of logic, not just functions. I frequently use it when I am writing a long sequence of data transformations or function calls, all of which must succeed for the result to be useful. If any step in the sequence fails, it does a break to the end of the block. Refactoring fans will be pleased that the pattern can be used to refactor spaghetti code into something more readable. The Design By Contract crowd will like the fact that code written with a common exit can establish preconditions above the do and postconditions below the while (false). Formal Verification folks will like the idea of establishing invariants (assertions that remain true during execution) before and after the do while (false). And although I find the idea of proving any non-trivial piece of code correct pretty laughable, I find the concept of invariants very powerful when reasoning about program correctness, and do while (false) helps me greatly in that regard.
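A sketch of a short all-or-nothing sequence, with preconditions asserted above the do and a postcondition asserted below the while (false). The function parse_and_add and its three steps are hypothetical; only the shape of the control flow is the point.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical pipeline: parse text of the form "a,b" and store a + b in *sum.
// Every step must succeed; the first failure breaks to the common exit.
bool parse_and_add(const char * text, long * sum) {
    // Preconditions, established above the do.
    assert(text != nullptr);
    assert(sum != nullptr);
    bool ok = false;
    *sum = 0;
    do {
        char * end = nullptr;
        // Step 1: a leading number followed by a comma.
        long first = std::strtol(text, &end, 10);
        if (end == text || *end != ',') {
            break;
        }
        // Step 2: a second number that ends the string.
        const char * rest = end + 1;
        long second = std::strtol(rest, &end, 10);
        if (end == rest || *end != '\0') {
            break;
        }
        // Step 3: keep both operands small enough that the sum cannot overflow.
        if (first < -1000000L || first > 1000000L ||
            second < -1000000L || second > 1000000L) {
            break;
        }
        *sum = first + second;
        ok = true;
    } while (false);
    // Postcondition, established below the while (false):
    // on failure, *sum is left at zero.
    assert(ok || *sum == 0);
    return ok;
}
```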

A break exits only the innermost enclosing loop or switch statement, so this use of break is not universally applicable. If you execute it from inside another looping control structure, including another do while (false), or from inside a switch statement, it is not going to drop to the bottom of the outer do while (false).
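The following sketch, built around a hypothetical search function, shows the trap: the break inside the for loop leaves only the for loop, so a second test and break are needed to reach the common exit. The counter g_later_steps is also hypothetical and exists only so the behavior can be observed.

```cpp
#include <cstddef>

// Hypothetical counter standing in for work that should run only on "not found".
static int g_later_steps = 0;

// Hypothetical search: return the index of the first negative value, or -1.
int first_negative_index(const int * values, std::size_t count) {
    int rc = -1;
    do {
        if (values == nullptr) {
            break;  // this break leaves the do while (false)
        }
        for (std::size_t index = 0; index < count; ++index) {
            if (values[index] < 0) {
                rc = static_cast<int>(index);
                break;  // this break leaves only the for loop
            }
        }
        if (rc >= 0) {
            break;  // a second break is needed to reach the common exit
        }
        ++g_later_steps;  // later steps, skipped when a negative value was found
    } while (false);
    return rc;
}
```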

In those situations, if you are using an ancient language like C, for which I have a lot of affection (the same affection I might have had for Latin, had I studied Latin in high school instead of goofing off in the computer lab), you can accomplish the same thing with a goto. In fact, this is one of the few applications in which I find the use of goto acceptable. The maintainers of the Linux kernel use this pattern routinely, and so do I, since my clients frequently call upon me to hack the 2.4 and 2.6 kernels to support their newest bleeding-edge hardware platforms.
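For comparison, here is a sketch of the goto-based cleanup idiom, written in C++ to match the other examples (the same shape works in C). The function staged_sum is hypothetical. Each label releases what was acquired before it, in reverse order, and the success path simply falls through all of them. One C++ wrinkle: a goto may not jump past a declaration with an initializer into that variable's scope, so the pointers are declared at the top.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical function: sum the bytes of `data` after staging them through
// two temporary heap buffers, using goto-based cleanup labels.
long staged_sum(const unsigned char * data, std::size_t size) {
    long rc = 0;
    unsigned char * stage1 = nullptr;  // declared up front so no goto
    unsigned char * stage2 = nullptr;  // bypasses an initialization

    if (data == nullptr || size == 0) {
        rc = -1;
        goto out;  // nothing acquired yet
    }
    stage1 = static_cast<unsigned char *>(std::malloc(size));
    if (stage1 == nullptr) {
        rc = -2;
        goto out;  // still nothing to release
    }
    stage2 = static_cast<unsigned char *>(std::malloc(size));
    if (stage2 == nullptr) {
        rc = -3;
        goto free_stage1;  // release only what was acquired
    }
    std::memcpy(stage1, data, size);
    std::memcpy(stage2, stage1, size);
    for (std::size_t index = 0; index < size; ++index) {
        rc += stage2[index];
    }
    std::free(stage2);
free_stage1:
    std::free(stage1);
out:
    return rc;
}
```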

But if you are using Java or any other C-like language that doesn't have a goto, or if, like me, you find the use of goto a slippery slope, or perhaps it is too reminiscent of the thousands of lines of FORTRAN IV that you wrote decades ago, the memory of which you are desperately trying to suppress, then this is a useful technique.

The use of do while (false) to implement a common exit flow of control is merely good practice. There is another context in which it is absolutely necessary.

Compound Statements and Preprocessor Macros

My name is Chip, and I use the C preprocessor when writing in C++. As much as the C++ purists like inline functions (and truth be told so do I), there are situations in which they just don’t cut it. I have used preprocessor macros to do fun things like computing the largest signed two’s complement binary number of any basic data type. I’ve tried to write an inline function to do that, and I would be pleased to see the results of anyone who did so successfully without using the preprocessor. (I’ve done it with a templated function, but then it could not be used in C.) The C preprocessor is a powerful form of code reuse known as code generation, and like all powers, with it comes responsibility. It must be used only for good and never for evil.
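The original macro isn't shown, so what follows is only one possible way to do it, offered as a hypothetical sketch. MAXIMUMOF builds 2^(n-1) - 1 for an n-bit type without shifting a 1 into the sign bit, which C and older C++ standards treat as undefined behavior. It assumes the type has no padding bits.

```cpp
#include <climits>

// Hypothetical macro: the largest value of a signed two's complement type
// with n = sizeof(_TYPE_) * CHAR_BIT bits (assuming no padding bits).
// It computes 2^(n-2) - 1, shifts that left once, and adds 1, giving
// 2^(n-1) - 1 without ever shifting a 1 into the sign bit.
#define MAXIMUMOF(_TYPE_) \
    ((_TYPE_)(((((_TYPE_)1 << (sizeof(_TYPE_) * CHAR_BIT - 2)) - 1) << 1) + 1))

// Because it is a constant expression, it also works at compile time.
static_assert(MAXIMUMOF(int) == INT_MAX, "MAXIMUMOF(int) should equal INT_MAX");
```

Unlike a templated function, a macro like this can be used unchanged from C code.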

So given that I’m going to use the C preprocessor whether the C++ crowd likes it or not, consider the following code snippet.

#define TRANSFORM(_A_, _B_) \
function1(_A_); \
function2(_B_)

Now consider its use in this context.

if (transformable)
TRANSFORM(x, y);

It’s not going to do the right thing, is it? The preprocessor will expand it thusly.

if (transformable)
function1(x);
function2(y);

This is clearly not what the user of the macro intended. You might be able to make up a lot of excuses for writing macros like the one above, but regardless, you have done something to surprise anyone that uses it. You have designed an abstraction that does not conform to the behavior any competent programmer would expect. You can argue that your coding standard requires { curly braces } around even single statements in if else blocks. This is not going to be helpful to your fellow developer who has to port ten thousand lines of third-party code, code which follows its own coding standard, and wants to use your macro to make their job easier.
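To watch the failure happen at run time rather than by inspection, here is a minimal sketch in which hypothetical stand-ins for function1 and function2 simply count their calls. With transformable false, function1 is skipped but function2 still runs.

```cpp
// Hypothetical stand-ins for function1 and function2 that just count calls.
static int calls1 = 0;
static int calls2 = 0;
void function1(int) { ++calls1; }
void function2(int) { ++calls2; }

// The naive two-statement macro.
#define TRANSFORM(_A_, _B_) \
    function1(_A_); \
    function2(_B_)

void maybe_transform(bool transformable, int x, int y) {
    if (transformable)
        TRANSFORM(x, y);  // only function1(x) is governed by the if
}
```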

The logical thing is to place the function calls in a compound statement instead.

#define TRANSFORM(_A_, _B_) \
{ \
function1(_A_); \
function2(_B_); \
}

Then our code snippet will expand into something like this.

if (transformable)
{
function1(x);
function2(y);
};

Looks better at first glance, doesn’t it? Now both functions are part of the conditional. Yes, there is a dangling semicolon at the end which is functionally a null statement. But so far, so good.

So try this.

if (transformable)
TRANSFORM(x, y);
else
TRANSFORM(q, r);

Now our snippet expands to something like this.

if (transformable)
{
function1(x);
function2(y);
};
else
{
function1(q);
function2(r);
};

This will not compile. The semicolon trailing the first invocation of the TRANSFORM macro is a null statement, separate from the compound statement preceding it. It becomes a statement between the if clause and the else clause, so the if statement ends at the closing brace. Using a semicolon after the macro invocation in the expected way leaves the else clause dangling, with no if to attach to.

The fact that this code does not compile is the good news. The programmer using your macro will merely think that you are incompetent, and will never use your macro, nor probably any code that you write, ever again.

A much worse case would be if the resulting code compiled, but did the wrong thing. I have tried very hard to find a code snippet which compiles but does the wrong thing. I have been unsuccessful. I'm not saying that such a code snippet does not exist, merely that I am not smart enough to find it. If such a snippet exists, then the programmer using your macro will think that you are incompetent while they sit with a baseball bat in the bushes next to your house waiting for you to come home. If we were truly judged by a jury of our peers, it would be completely justifiable homicide.

A common approach to fixing this is to use the macro without a semicolon at the end.

if (transformable)
TRANSFORM(x, y)
else
TRANSFORM(q, r)

This is an unsatisfying solution. You are requiring the user to write code in an unexpected and surprising way. Worse, the requirement to omit the semicolon is merely an artifact of having to use a compound statement. If a thousand years from now the definition of your macro changes so that it is not a compound statement, then you must churn every single use of it to add the semicolon. Or you have to put the semicolon in the macro definition itself, which may cause all sorts of wackiness to ensue. Wouldn't it be better to just make the macro work like any other C++ statement?

Like Lassie, do while (false) comes to our rescue. What is it, girl? The barn is on fire? Timmy fell into the well? We write our macro thusly.

#define TRANSFORM(_A_, _B_) \
do { \
function1(_A_); \
function2(_B_); \
} while (false)

The preprocessor now expands our macro into a single C++ statement that must be properly terminated by a semicolon. Hence

if (transformable)
TRANSFORM(x, y);
else
TRANSFORM(q, r);

becomes

if (transformable)
do {
function1(x);
function2(y);
} while (false);
else
do {
function1(q);
function2(r);
} while (false);

The semicolon, added by the user of the macro, is now a required part of the syntax, not a dangling null statement.

All of the snippets I have shown not only compile, but do the expected thing when executed. The do while (false) control structure serves as a compound statement that is both syntactically and semantically well behaved.
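A runnable version of the fixed macro, again with hypothetical stand-ins for function1 and function2, this time recording their most recent arguments, so you can check that each branch of the if/else calls both functions with the right values.

```cpp
// Hypothetical stand-ins that remember their most recent arguments.
static int last1 = 0;
static int last2 = 0;
static int calls = 0;
void function1(int a) { last1 = a; ++calls; }
void function2(int b) { last2 = b; }

// The do while (false) version: a single statement that needs a semicolon.
#define TRANSFORM(_A_, _B_) \
    do { \
        function1(_A_); \
        function2(_B_); \
    } while (false)

void choose(bool transformable, int x, int y, int q, int r) {
    if (transformable)
        TRANSFORM(x, y);
    else
        TRANSFORM(q, r);
}
```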

One context in which do while (false) does not work is when you are using the preprocessor to generate code that declares variables.

#define ALLOCATE(_A_, _B_) \
do { \
int _A_; \
int _B_; \
} while (false)

The variables will be allocated on the stack and then immediately deallocated when the do while (false) construct terminates. This is fine if the variables are used only inside the do while (false). It is not so useful if they are being declared for use outside of that scope. A simple compound statement has the same flaw.
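If the variables really are meant for use after the macro, one option, offered here only as a hypothetical sketch, is to leave the declarations unwrapped and let the caller's semicolon terminate the last one. The price is that this macro is back to being two statements, so it should appear only where a declaration can stand on its own line, never as the unbraced body of an if.

```cpp
// Hypothetical variant of ALLOCATE: the declarations are deliberately left
// unwrapped so they remain in the caller's scope. The caller's semicolon
// terminates the second declaration.
#define ALLOCATE(_A_, _B_) \
    int _A_; \
    int _B_

int use_allocate(int seed) {
    ALLOCATE(left, right);  // expands to: int left; int right;
    left = seed;
    right = seed * 2;
    return left + right;
}
```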

The Little Control Structure That Could

I hope I have given you a new appreciation of do while (false), the control structure that does so much, while generating so little.

2 comments:

Architecture Inspector said...

So, Chip, just:

#define BEGIN do {
#define END } while(false);

and we've made a small step (forward) to a better language, Pascal.

Chip Overclock said...

My old friend and former colleague Professor David Hemmendinger reminds me why I don't teach Java:

"I saw that you had another bit on do-while (false),
in which you said, 'It would be great if C++, C, and Java had something similar, perhaps a way to use break to exit out the bottom of any compound statement (that is, a block of statements enclosed in { curly braces } ).' Java does have this in its labeled break; if a block has a label L, then break L; anywhere within it will exit the block to the first statement following it."

Thank you, David. It's always good to hear from you.