Friday, February 04, 2011

Can't We All Just Get Along, C++ and C?

Several times in my career I've found myself doing product development in large legacy code bases. By large I mean on the order of many millions of lines of code, representing software shared by many products across a broad product line. Successful product lines inevitably evolve over time to use different hardware targets, operating system platforms, and even programming languages.

One of the most common evolutionary paths I see is the transition from C to C++. C++ is a complex language; some say too complex. To paraphrase C++ inventor Bjarne Stroustrup: "In C++ it's harder to shoot yourself in the foot, but when you do, you blow off your whole leg." But with some care and discipline it can be effectively used even in embedded systems, which can exploit the advanced performance and code generation capabilities that C++ brings to the table, while retaining the ability of C to work close to bare metal.

This evolution results in code bases that are a mixture of C and C++. Making them play well together is one of the paths to success in such an evolution. Using C code from C++ is business as usual; C++ developers routinely do this. Using C++ code from C is rarer, but it is not only possible, it can be a big win, allowing you to incrementally refactor C code into C++ without having to do a forklift upgrade on your entire code base. Here's how I do it.

(There are examples of everything I write about here in my Desperado library that implements design patterns I have found useful in fifteen years of developing embedded systems in C++.)

First, the basics that you may already know.

C and C++ have different linkage conventions, that is, different mechanisms through which the object code produced by each compiler makes function calls, and with which the linker recognizes function names. C for the most part only understands C linkage conventions. C++ however can be directed to use linkage conventions different from the default C++ conventions. It does this by extending the extern keyword to take an optional argument: the language name. This linkage specification can be applied to a single C++ function prototype or function implementation, or to a block of them. When applied to a prototype, it tells the C++ compiler what linkage convention to use to call the function. When applied to an implementation, it tells the C++ compiler what linkage convention to use to be called. This mechanism can be used to allow C++ programs to call C functions, or C programs to call C++ functions.

Here are some examples that you might see in a C++ header file.

extern "C" size_t dump_bytes(Dump * that, const void * data, size_t length);

extern "C" {
void * heap_malloc(Heap * that, size_t size);
void heap_free(Heap * that, void * ptr);
}

This allows these C++ functions to be called by either C or C++. The C standard header files, for example stdio.h, routinely use this mechanism to make the C functions they declare available to C++ code. What's less well known is that your C++ compiler is allowed to provide linkages for other languages as well. You can even specify "C++" as the language.
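Here is a minimal sketch of the single-declaration form, the block form, and the "C++" language name side by side. All of the function names are hypothetical and exist only for illustration; none of them come from Desperado.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Single declaration with C linkage, so a C translation unit could call it.
extern "C" std::size_t count_bytes(const char * text);

// Block form: every declaration inside the braces gets C linkage.
extern "C" {
    int add_ints(int a, int b);
    int negate_int(int a);
}

// "C++" is also a valid language name; it is simply the default linkage.
extern "C++" int twice(int a);

// Definitions. The linkage given in the earlier declaration carries over,
// so repeating extern "C" here is optional (negate_int repeats it anyway).
std::size_t count_bytes(const char * text) { return std::strlen(text); }
int add_ints(int a, int b) { return a + b; }
extern "C" int negate_int(int a) { return -a; }
int twice(int a) { return 2 * a; }

// Exercises all four functions from C++.
void linkage_demo() {
    assert(count_bytes("abc") == 3);
    assert(add_ints(2, 3) == 5);
    assert(negate_int(4) == -4);
    assert(twice(21) == 42);
}
```

From the C++ side the calls look identical regardless of linkage; the difference only shows up in the symbol names the linker sees.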

There are some limitations. C doesn't understand this new argument to extern; it's part of C++. You can't declare a C++ class or instance method to have C linkage, because C has no understanding of C++ classes, so you can only use this on freestanding C++ functions. C also has no understanding of C++ namespaces: a namespace is not part of a C linkage name, so two functions with C linkage and the same name in different namespaces are the same function as far as the linker is concerned, and it is simplest to keep such functions in the global namespace. If you specify "C" as the linkage in a header that C code will also include, you have to conform to C conventions; for example, you have to specify an empty parameter list as (void) instead of (), just as you do with C. Finally, when you mix C and C++, your main program must be C++, otherwise the C++ runtime system will not be properly set up. (Your first clue that you botched this will be that your C++ static variables will mysteriously not be initialized.) We'll get past many of these issues in just a bit.
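As a rough sketch of how these limitations play out, the snippet below hides a namespaced class and a C++ static object behind one freestanding function with C linkage. The names demo, Counter, and counter_next are assumptions made up for this illustration.

```cpp
#include <cassert>

namespace demo {

// Neither this class nor its member function can be given C linkage.
class Counter {
public:
    Counter() : count_(0) {}
    int next() { return ++count_; }
private:
    int count_;
};

}

// A C++ static object. It is set up as part of C++ static initialization,
// which is why the main program needs to be C++.
static demo::Counter counter;

// A freestanding function in the global namespace with C linkage.
// The parameter list is written (void) so the same prototype also reads
// correctly in a header compiled by a C compiler.
extern "C" int counter_next(void) {
    return counter.next();
}

// Each call returns one more than the previous call.
void counter_demo() {
    int first = counter_next();
    int second = counter_next();
    assert(second == first + 1);
}
```

The C code only ever sees counter_next(); the namespace, the class, and the static object stay entirely on the C++ side.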

A header file can tell whether it is being compiled by a C or C++ compiler. The C++ compiler will automatically define the preprocessor symbol __cplusplus. (This isn't just part of the GNU C++ compiler, it is part of the ISO C++ standard.) So here is a code snippet illustrating a common pattern for a header file declaring C++ functions that is to be included in either C++ or C.

#if defined(__cplusplus)
extern "C" {
#endif

extern void * heap_malloc(Heap * that, size_t size);
extern void heap_free(Heap * that, void * ptr);

#if defined(__cplusplus)
}
#endif

A C++ program including this header file will understand that the functions have C linkage. A C program doing the same will just see the prototypes.

Here is another approach: Desperado has a cxxcapi.h (for C++/C Application Programming Interface) header file that defines preprocessor macros that simplify interoperability between C and C++. It has constructs like this.

#if defined(__cplusplus)
#define CXXCAPI extern "C"
#else
#define CXXCAPI extern
#endif

This allows you to do stuff like this in a header file destined to be included from either C or C++.

#include "cxxcapi.h"

CXXCAPI size_t dump_bytes(Dump * that, const void * data, size_t length);
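The same idea extends to the block form. The sketch below uses a pair of hypothetical macros, DEMO_CAPI_BEGIN and DEMO_CAPI_END, that I made up for illustration; they are not taken from cxxcapi.h. The functions are equally hypothetical.

```cpp
#include <cassert>
#include <stddef.h>

// Hypothetical macros: open and close an extern "C" block under C++,
// and expand to nothing under C.
#if defined(__cplusplus)
#define DEMO_CAPI_BEGIN extern "C" {
#define DEMO_CAPI_END }
#else
#define DEMO_CAPI_BEGIN
#define DEMO_CAPI_END
#endif

// What a header shared by C and C++ might contain.
DEMO_CAPI_BEGIN
size_t buffer_capacity(void);
int buffer_is_empty(size_t used);
DEMO_CAPI_END

// What the C++ implementation file might contain.
size_t buffer_capacity(void) { return 256; }
int buffer_is_empty(size_t used) { return used == 0; }

// Exercises both functions from C++.
void capi_demo() {
    assert(buffer_capacity() == 256);
    assert(buffer_is_empty(0) == 1);
    assert(buffer_is_empty(10) == 0);
}
```

The per-declaration macro keeps each prototype self-describing, while the block pair saves typing when a header declares many functions; which reads better is a matter of taste.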

Okay, that's the basics. Now here's the part that may make your head explode.

All this is well and good. But what you really want is a way to use C++ objects in a C program, not just C++ functions. You can.

In the examples above the types Dump and Heap are actually C++ classes, and the variable named that represents a pointer to such an object in the C code, just as the built-in variable this does in a C++ program. Here's the funny thing about C that makes this more than just crazy talk. In C (and for that matter, in C++) you can declare a pointer to a type without actually defining the type. This is a form of forward reference: you are telling the compiler "I'll tell you what this type is later, just hang in there". As long as you never dereference the pointer in the C code or do something that requires the compiler to know something about the type, that is, you never write *that or that->field or sizeof(Heap), you never have to define the type. The compiler is totally okay with this. It is so trusting.

typedef struct Heap Heap;
Heap * that;

This C code snippet declares but does not define a new structure named Heap, and a new type also named Heap for this structure. Then it declares that to be a pointer to an object of type Heap without saying exactly what a Heap is. You can pass the variable that around in your C code just as you please. The structure named Heap doesn't have to be defined unless you plan on dereferencing a pointer to it. It's an opaque type, but one that is type safe; its use takes advantage of all the C and C++ compile time type checking.
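To make the opaque type idea concrete, here is a small sketch in which the client code is compiled before the structure is ever defined. Gadget and its functions are hypothetical names invented for this example.

```cpp
#include <cassert>

// Declared but not defined: an incomplete (opaque) type.
// The typedef is the C idiom; it is also accepted by C++.
typedef struct Gadget Gadget;

// Functions that only pass pointers to the opaque type around.
Gadget * gadget_new(int value);
int gadget_value(const Gadget * that);
void gadget_delete(Gadget * that);

// Client code. At this point Gadget is still incomplete, so writing
// *that, that->value, or sizeof(Gadget) here would not compile.
int client(void) {
    Gadget * that = gadget_new(42);
    int value = gadget_value(that);
    gadget_delete(that);
    return value;
}

// The definition, which would normally live in a separate source file.
struct Gadget {
    int value;
};

Gadget * gadget_new(int value) {
    Gadget * that = new Gadget;
    that->value = value;
    return that;
}

int gadget_value(const Gadget * that) { return that->value; }

void gadget_delete(Gadget * that) { delete that; }

// The client gets back the value it stored without ever seeing the layout.
void opaque_demo() {
    assert(client() == 42);
}
```

Passing a pointer of some other type where a Gadget * is expected is still a compile-time error, which is what keeps the opaque type type safe.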

So, let's put all this together in a code snippet that would be part of a header file that could be included from either C or C++. It describes a C++ class that can be used from either language.

#include <stddef.h>

#if defined(__cplusplus)

class Heap {
public:
    Heap();
    virtual ~Heap();
    virtual void * malloc(size_t size);
    virtual void free(void * ptr);
};

extern "C" Heap * heap_new(void);
extern "C" void heap_delete(Heap * that);
extern "C" void * heap_malloc(Heap * that, size_t size);
extern "C" void heap_free(Heap * that, void * ptr);

#else

typedef struct Heap Heap;

extern Heap * heap_new(void);
extern void heap_delete(Heap * that);
extern void * heap_malloc(Heap * that, size_t size);
extern void heap_free(Heap * that, void * ptr);

#endif

This one snippet declares two different interfaces to the same Heap object: a C++ class-based interface, and a C function-based interface. Note that the C function-based interface can actually be used from C++. (This is especially useful if you are using a C++-based unit testing framework like Google Test, of which I am quite the fan.) But generally, your C++ code will use the C++ interface, and your C code will use the C interface. It is all in one header file, so it can be easily maintained in parallel.

I won't bore you with what the C++ class actually does; your imagination can fill in the details. The C functions could be implemented in the same C++ source file as the C++ class implementation. (In Desperado, I actually keep them separate so statically linked embedded C++ applications don't need to carry the extra baggage of the thin C implementation layer that they don't need and won't use.) We don't need any __cplusplus magic since it's all C++ code.

#include "Heap.h"

extern "C" Heap * heap_new(void) {
    return new Heap;
}

extern "C" void heap_delete(Heap * that) {
    delete that;
}

extern "C" void * heap_malloc(Heap * that, size_t size) {
    return that->malloc(size);
}

extern "C" void heap_free(Heap * that, void * ptr) {
    that->free(ptr);
}

Note that since this is a C++ source file, its Heap refers to the actual C++ Heap class. This is totally transparent to the C code that may carry around the address in its own pointer variable of type Heap *. This C to C++ adaptation layer can take advantage of virtual functions and namespaces. It is full-fledged C++ code.

So here's a snippet of a C application, compiled with the C compiler, that uses the C++ class.

#include "Heap.h"

Heap * heap;
void * data;

heap = heap_new();
data = heap_malloc(heap, 1024);
heap_free(heap, data);
heap_delete(heap);

This is especially cool when using virtual functions. If you have another C++ class that derives from Heap, say TraceableHeap, but its C interface is the same, you can just substitute your traceableheap_new() for heap_new() in your C code and rely on the existing Heap C interface to do all the heavy lifting.
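Here is one way the TraceableHeap idea might look, collapsed into a single C++ file so it can be read in one piece. This is only a sketch: the inline Heap implementation built on the standard malloc and free, the allocation counters, and the body of traceableheap_new() are my assumptions for illustration, not the Desperado implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Stand-in base class (assumed implementation on top of the C library).
class Heap {
public:
    Heap() {}
    virtual ~Heap() {}
    virtual void * malloc(std::size_t size) { return std::malloc(size); }
    virtual void free(void * ptr) { std::free(ptr); }
};

// Hypothetical derived class that counts calls before delegating.
class TraceableHeap : public Heap {
public:
    TraceableHeap() : allocations(0), releases(0) {}
    virtual void * malloc(std::size_t size) { ++allocations; return Heap::malloc(size); }
    virtual void free(void * ptr) { ++releases; Heap::free(ptr); }
    int allocations;
    int releases;
};

// Only the constructor function is new; it returns a plain Heap pointer.
extern "C" Heap * traceableheap_new(void) { return new TraceableHeap; }

// The existing Heap C interface, unchanged.
extern "C" void heap_delete(Heap * that) { delete that; }
extern "C" void * heap_malloc(Heap * that, std::size_t size) { return that->malloc(size); }
extern "C" void heap_free(Heap * that, void * ptr) { that->free(ptr); }

// Uses the C interface, then peeks at the counter from C++ to show that
// the virtual call really landed in TraceableHeap.
int traced_allocations_demo() {
    Heap * heap = traceableheap_new();
    void * data = heap_malloc(heap, 1024);
    heap_free(heap, data);
    int count = static_cast<TraceableHeap *>(heap)->allocations;
    heap_delete(heap);
    return count;
}
```

The C caller never learns that it is holding a TraceableHeap; the virtual destructor in Heap is what makes heap_delete() safe for the derived object.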

With very little effort and minimal ongoing additional maintenance cost, you can use this technique to gradually introduce C++ into C code bases, and take advantage of not just C functions in C++ programs, but C++ classes in C programs.

(A big Thank You to Marshall Cline and the C++ FAQ from which I developed this technique many years ago.)

Update (2011-02-21)

You might be wondering if, instead of using an opaque type, you can define a complete C structure definition whose fields shadow those of the C++ class definition. That would allow you to access the C++ fields from C. The answer is: maybe; but it's really dangerous. That's because whether it works or not depends on how your C++ compiler implements objects.

For example, if you have virtual functions, the compiler will insert an invisible field known as the virtual pointer somewhere in your object. The virtual pointer points to a table of function pointers that point to the virtual functions for the class. C won't know squat about this. Some compilers place this pointer at the beginning of the object; others at the end. The ISO standard makes this an implementation issue.

Even if it worked, C would have no respect for the protected and private qualifiers in C++. Generally speaking, I recommend against it, if for no other reason than code portability.

The fact that your C++ compiler may place invisible fields in your object as part of its implementation is the same reason why you cannot safely apply memory operations from the C standard library, like memset and memcpy, to C++ objects. Doing so writes over these invisible fields; specifically, it writes over the virtual pointer. The result is (on a good day) your program crashes, or (on a bad day) it now behaves really weirdly because you've just caused the virtual pointer in a derived class object to now point to the virtual table of a base class object. Wackiness ensues.
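If your compiler supports C++11, you can ask it which types are safe to treat as raw bytes instead of guessing. The sketch below assumes the standard type traits header; PlainRecord, PolymorphicRecord, and checked_copy are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// A C-style structure: no invisible fields.
struct PlainRecord {
    int id;
    double value;
};

// A class with a virtual function: the compiler adds hidden state.
class PolymorphicRecord {
public:
    virtual ~PolymorphicRecord() {}
    int id;
};

// Checked at compile time, so a mistake never reaches the target.
static_assert(std::is_trivially_copyable<PlainRecord>::value,
              "PlainRecord may be copied with memcpy");
static_assert(std::is_standard_layout<PlainRecord>::value,
              "PlainRecord has a C-compatible layout");
static_assert(!std::is_trivially_copyable<PolymorphicRecord>::value,
              "PolymorphicRecord must not be copied with memcpy");
static_assert(!std::is_standard_layout<PolymorphicRecord>::value,
              "PolymorphicRecord cannot be shadowed by a C struct");

// A memcpy wrapper that refuses to compile for unsafe types.
template <typename T>
void checked_copy(T & to, const T & from) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only safe for trivially copyable types");
    std::memcpy(&to, &from, sizeof(T));
}

// Copies a plain record byte for byte and returns the copied id.
int copy_demo() {
    PlainRecord from = { 7, 2.5 };
    PlainRecord to = { 0, 0.0 };
    checked_copy(to, from);
    assert(to.value == 2.5);
    return to.id;
}
```

Calling checked_copy() on two PolymorphicRecord objects fails at compile time, which is a much friendlier way to find out than the wackiness described above.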
