Thursday, May 31, 2012


When I follow up on a past project about which I've written, I always face the quandary of whether to update the original article or write a new one. Neither is a perfect solution. The former means folks who have bookmarked that article for future reference, or folks finding the article for the first time via a search engine, will see the new material. The latter means folks using an RSS feed will see the update. I tend to choose the former, updating the article, and if the update is separated from the original by a long time, I date stamp the new material so as to not confuse the elderly (like me). But to accommodate the folks who depend on their RSS reader, here are some articles I've recently updated.

The C++ Pointer to Member Operators: Based on feedback from readers, I added a short paragraph at the end to explain why one would want to use these operators. These operators, which really don't use pointers to an object but instead offsets into a class, are better than the most likely alternative.

Small Town Big City: I added a section at the end on running Amigo, my FreeRTOS-based interrupt-driven multitasking platform written in C++, on an Arduino Mega ADK with an Ethernet shield and on a Freetronics EtherTen Uno-clone. Yes, I got a multitasking system running on the ATmega328p-based EtherTen with its severe resource constraints. It wasn't pretty. I don't recommend it. Co-routines or state machines are likely to be a better solution; they are less scalable, and probably more expensive in the long run to maintain, but scalability isn't likely to be an issue for the tiny ATmega328p with its scant two kilobytes of SRAM.

Sunshine On My Arduino Makes Me Happy: I've documented several additional iterations on my solar powered Arduino Uno. The latest version uses a solar charge regulator and a sealed 12V gel cell battery in addition to the solar panel. I'm skeptical that my small battery and solar panel (I definitely went the inexpensive route) are sufficient to keep this system running indefinitely, particularly when the solar panel is just sitting in a south-facing window. But so far it has run for the past nineteen hours, since about 14:00 MDT yesterday. And Colorado has more sunny days than any other state in the Union. Sorry, California.

Tuesday, May 29, 2012

The C++ Pointer to Member Operators

Yesterday was the Memorial Day holiday in the U.S. and it turned out to be an unusually good day for me.

2001 Triumph Bonneville (Port)

I started out by taking an early morning ride on my Triumph Bonneville. I stopped for breakfast at one of my favorite coffee shops in Boulder, Colorado, Caffe Sole, where I got a chapter read in Dixit and Nalebuff's popular introduction to basic game theory, Thinking Strategically [Norton, 1993]. I had lunch with Mrs. Overclock at a deli we'd never eaten at before while on our way to a hardware store that I'd never been to before (but she had), Harbor Freight, where I discovered an economically priced solar charge regulator for one of my projects. We finished the afternoon and evening at a cookout with some good friends of ours where I may have indulged in a few fine brewed adult beverages.

Somewhere amongst all those activities I discovered that I have not in fact memorized Stroustrup's epic reference book The C++ Programming Language [Addison-Wesley, 1997]. A big Thank You to Nishant in the C++ Professionals group for that little tidbit regarding the .* and ->* operators in C++. I was so surprised by this revelation that I had to write a little test program to convince myself how they worked. Here is the complete source code, annotated with comments. The remarkable part is in the second half of the program. I used both references and pointers, and both fields and methods, just to demonstrate to myself how all of it worked.

#include <cstdio>

class Foo {
public:
    int field;
    void method() { printf("Foo@%p::method(): field=%d\n", this, field); }
};

int main()
{
    // We declare an object named bar of type Foo.

    Foo bar;

    // We declare a reference to bar named barr
    // and a pointer to bar named barp.

    Foo & barr = bar;
    Foo * barp = &bar;

    // We use the usual mechanisms to set a field
    // of bar to 1 and to invoke a method of bar.

    barr.field = 1;
    barp->method();

    // We declare a pointer to the field and a pointer
    // to the method in _any_ object of type Foo. Note
    // that this has absolutely _no_ mention of any
    // specific object of type Foo, like for example
    // bar. Perhaps the implementation is storing the
    // offsets to the field and method in the variables.

    int Foo::*fieldp = &Foo::field;
    void (Foo::*methodp)() = &Foo::method;

    // We use the .* and ->* operators to set a field
    // of bar to 2 and to invoke a method of bar.

    barr.*fieldp = 2;
    (barp->*methodp)();

    return 0;
}

And when you compile this little marvel for i686 using GNU C++ 4.4.3 and run it on Ubuntu GNU/Linux 2.6.32-40, here is what you get.

coverclock@silver:misc$ g++ -o pointer pointer.cpp
coverclock@silver:misc$ ./pointer
Foo@0xbff1841c::method(): field=1
Foo@0xbff1841c::method(): field=2

Stroustrup calls this Pointers to Data Members (section C.12, p. 853 in my hardbound Special Edition), although as he describes and as you can see above, it applies equally to fields or methods.

Why use this mechanism? Because an alternative is a generic pointer to any variable of the same type or any function with the same signature anywhere in the code base. These pointer-to-member operators are more type-safe in that they can apply only to a field of the specified type or method of the specified signature in an object of the specified class; it's a far more restrictive paradigm.

Friday, May 18, 2012

Peeking Under the Hood

Back in the day, if you asked a computer programmer (because that is what we were called way back then) what language they programmed in you'd get answers like FORTRAN, COBOL, and for the proud few maybe assembler. Today, when you ask that question, it's anybody's guess what the answer might be, but there's a good chance it would be a domain specific language I've never heard of.

Both then and now, if you asked a random software developer how did their language perform execution synchronization among multiple threads of control (or tasks, or processes, or execution contexts, or whatever damn thing their problem domain wants to call it), the typical answer would be "I dunno" or "isn't that the operating system's job?" or "what's a task?".

But most developers at least understand the basic problem in the abstract. The classic pedagogical example is your checking account. If two people are each independently trying to cash checks for $100 that you wrote when your account contains just $150, somehow the action of reading your balance to check for sufficient funds and then modifying your balance to withdraw the money has to be indivisible. Otherwise person A could check for sufficient funds, then person B could check, then person A gets their money, then person B, even though your account doesn't actually have enough cash to cover both checks.

This wasn't a problem when there was only one bank, in one building, with one ledger book. The ledger was the final arbiter of how much money you had. But as soon as your bank had more than one teller, then more than one building, then maybe not even in the same city, suddenly the instantaneous balance of your checking account was a little more ambiguous than either your bank or your creditors were comfortable with. Even with modern communications, speed of light being what it is.

This classic pedagogical example is actually pretty crappy, because in real life it happens all the time. Either the bank overdraws your account and then sends some leg breakers to your house, or, in the case of electronic transfers, the transfer is held pending until the bank's central computer can figure out whether or not you're a deadbeat writing cold checks. In the former case the bank tries to fix the problem after the fact. In the latter case they're deferring the problem until they've had some time to ponder it. (This is also why you can order something online but get email later saying "Oops, sorry, it's actually out of stock.") But in principle anyway, the example points out the issue in a context that is pretty easily understood.

So when developers understand this problem in the abstract (and most do), and if they've done some development in a language like Java that natively supports multithreading, their answer regarding execution synchronization might be "use the synchronized keyword". If they have used a POSIX thread library, it might be "use a mutex" (where "mutex" is used by those in the know as shorthand for mutual exclusion). If they know what SQL stands for, they might say "use a lock" or "a transaction". If they are really really old (but not as old as me) they might say "use a semaphore" or even "a monitor".

These are all software mechanisms that prevent more than one thread of control from executing a function or reading and modifying a variable simultaneously, or (as those in the know like to say) concurrently. In our canonical pedagogical example, each teller checking your bank balance was a separate thread of control operating concurrently.

But those answers just beg the question. When asked "how does the synchronized keyword (or semaphore, etc.) work?" typically we're back to "I dunno" or "it's the operating system's problem". If you keep asking "How?", peeling back layers and layers of software, down through the application, through the programming language, the library, the operating system, eventually you will discover that there is some mechanism down near the level of bare metal, often a single machine instruction, that actually does the heavy lifting of execution synchronization, ensuring that two threads of control can't read and modify the same location in memory by each reading it and then each modifying it, just like the checking account. All those other software synchronization mechanisms are ultimately built on top of this one low level hardware synchronization mechanism.

It was not always thus. I remember stories in the 1970s, when I first started having to worry about this stuff, about how the IBM mainframe development folks who were looking at the very earliest multiprocessor designs figured out they needed what generically became known as a test and set instruction. Test and set was a machine instruction that simultaneously remembered what the value of a memory location was and changed its value, and it did those two operations indivisibly, or (by those in the know) atomically. It was the machine instruction equivalent of the checking account example, checking the balance and deducting if funds were sufficient, except it was operating on just a byte or word of memory. This machine instruction was used to ensure that, many layers of software above, code that was in fact checking your bank balance and deducting funds did so safely by allowing only one thread of control to do so at a time.

Eventually on the IBM mainframe the test and set instruction was replaced with the compare and swap instruction because it solved a larger class of execution synchronization problems. But to this day, decades later, machine instructions or their equivalents, on widely different processor architectures, used for implementing mutual exclusion operators like semaphores, are referred to as test and set, compare and swap, or something similar.

Compare and Exchange

(Above: Intel, "Intel 64 and IA-32 Architectures Software Developer's Manual", Volume 1, 325462-043US, May 2012, pages 7-6 through 7-7)

Test and Set, Compare and Swap

(Above: Freescale Semiconductor, "Programming Environments Manual for 32-bit Implementations of the PowerPC Architecture", Revision 3, MPCFPE32B, September 2005, page D-3)

Swap, Swap Byte

(Above: ARM, "ARM Architecture Reference Manual", ARM v7-A and ARM V7-R Edition, ID120611, 2011, page A8-432)

Recent versions of the GNU C compiler even have built-in functions, such as __sync_lock_test_and_set(), that attempt to generate code for the underlying processor using whatever that hardware's native mechanism is at the machine code level, in a kind of homage to this decades old invention, now in a portable open source form.

Sometimes there is an otherwise innocuous machine instruction that just happens to be perfectly useful as test and set. For example, the DEC PDP-11 minicomputer had a decrement or dec instruction which atomically altered memory and set the condition code bits in the global hardware status register. You could set a memory location representing a lock to a value larger than zero to represent "unlocked", the value zero to represent "just locked", and a negative number to mean "already locked by someone else". You would use the dec instruction to decrement a word in memory and afterwards check the condition code by using a branch on condition instruction; if the result was zero, that meant the value had been positive before your decrement, hence you now own the resource that it represents; if the result was negative, someone had already locked it. You unlocked the memory location by using the increment or inc instruction. The value of the word in memory could be interpreted as a count of the number of resources available, so values larger than one could be used to implement a counting semaphore instead of just a mutex semaphore, representing, say, a pool of buffers. I have no idea if the PDP-11 hardware architects intended for inc and dec to be used this way, but it seems likely.

Sometimes there is no hardware execution synchronization mechanism at all. The IBM folks really only needed the test and set instruction when they started looking at multiprocessor systems. Before then, the only concurrency that could occur within a uniprocessor system was with an interrupt service routine, or ISR. A hardware signal interrupts the normal flow of instruction execution and control is automatically passed to a special function, or ISR, that handles whatever the signal represents, like "a character is available because someone typed it on the keyboard". It is essentially a case of the hardware calling a subroutine. It is how input and output (and lots of other asynchronous and real-time stuff) is handled in most modern hardware architectures. This could result in the same kinds of concurrency issues if both the ISR and the code it interrupted read and modified the same variable in memory. In the uniprocessor case, it was merely necessary to temporarily disable interrupts when the software knew it was in a section of code in which it was critical that a sequence of operations, like reading and modifying a particular variable, be done atomically. Such a segment of code is called (by those in the know again) a critical section.

This technique persists today in, for example on the tiny Atmel megaAVR microcontrollers I've been noodling with recently. Since they are uniprocessors, it is enough to disable interrupts going into a critical section using the cli machine instruction that clears the system-wide interrupt enable bit (the I-bit) in the system status register (SREG), and to reenable interrupts when exiting the critical section by using the sei machine instruction that sets the system-wide interrupt enable bit.

But that's not typically what megaAVR code does. If you have nested critical sections (easily done when you're calling subroutines from inside a critical section and they too have their own critical sections), you don't want to reenable interrupts when you exit the inner critical section because they weren't enabled when you first entered the inner critical section. You want to return the interrupt state to whatever it was before you entered your critical section, understanding that it might have already been disabled by whoever called you. So most megaAVR code (including mine) implements critical sections in C like this:

unsigned char temp = SREG;
cli();

/* Critical section code goes here. */

SREG = temp;

Entering the critical section, we read and store SREG (which is memory mapped, meaning hardware allows us to access it like a memory location; other processor architectures have a special machine instruction just to do this) containing the I-bit in a temporary variable, and then we disable interrupts by clearing the I-bit in SREG using cli(). Exiting the critical section, we don't enable interrupts by setting the I-bit using sei(). We instead restore SREG using the saved copy in our temporary variable, returning the I-bit to whatever its prior value was. It will have been a one (1) if the caller was not already in a critical section, zero (0) if it was.

So, are we good?

No, not really. You may have noticed that there's actually an issue here. If an ISR were to interrupt this code snippet after the SREG has been read, but before cli() disables interrupts -- which is entirely possible, because by definition we haven't disabled interrupts yet -- the ISR could change the value of SREG (there are after all seven other bits in SREG that mean other things). When we restore the value of SREG as we exit our critical section, we're restoring it to its value before we read it, not the value to which the ISR modified it after we read it. This is known as a race condition: two separate threads of control coming into conflict over the value of a shared variable because their use of it was not properly synchronized. Ironically, though, this race condition occurs in our synchronization mechanism, the very mechanism intended to prevent race conditions from occurring. (It's called a race condition because the outcome is timing dependent and hence non-deterministic: it depends on which thread of control wins the data race to change the variable.)

How does the megaAVR solve this problem? I don't think it does. In practice, this typically isn't an issue. I've never had a reason for an ISR to modify the value of SREG. Although I should point out that every ISR implicitly modifies SREG: the megaAVR hardware automatically clears the I-flag when entering an ISR, and the return from interrupt (reti) machine instruction automatically sets the I-flag when the ISR exits. This is so an ISR cannot itself be interrupted. Unlike other architectures (like my beloved PDP-11) this is not done by saving and restoring SREG on the stack, but by actually altering the I-bit in the SREG. This is possible because the ISR could not have been entered in the first place had the I-bit not already been set. But it still behooves the developer not to otherwise alter SREG inside the ISR lest wackiness ensue.

When you've spent enough time writing code close to bare metal like I have, you too will begin to look with skepticism at even the lowest level of operations of whatever processor you are using, asking yourself "how does this really work under the hood?" You will find yourself spending hours peering at eight hundred page processor reference manuals. I call that a good thing. Your mileage may vary.

Tuesday, May 15, 2012

Brain Versus Heart

Which is the more important, your brain or your heart? It's a stupid question, isn't it? You can't live without either one.

Which is more important, action or state? Broadly speaking, programming languages perform actions using executable code, and maintain state using variables. So which is more important? It's an equally stupid question, isn't it? You can't write software without either one.

Which is why OO languages like C++ and Java combine action and state into a single lexical component, the class, with some syntax and rules that force you, and the developers that maintain the code after you, to address both in tandem.

Can you do something similar in non-OO languages like C? You bet. If you implement the exact same functionality, the run-time overhead will be exactly the same in either language. But the developer-time overhead will be much greater in C because the developer is saddled with all the work that C++ will do automatically. Maybe that's okay.

Or maybe it's not. 

Reiterating what I wrote about recently, about two thirds of the life-cycle cost of a large code base is in its maintenance. Even if you could eliminate entirely the necessity of requirements, initial development, debugging, and testing, you would reduce your software development costs by only about a third. You'd still be saddled with corrective maintenance (fixing bugs), perfective maintenance (improving code, often for run-time performance or ease of developer-time maintenance), and adaptive maintenance (changing code to adapt to something that changed outside of the system, for example a new part because the prior one was discontinued by its manufacturer, or a new API for a system with which your code interfaces).

Old school organizations, like Bell Labs, and new fangled companies, like Google, have figured out that one way to reduce the long term costs of code bases is OO design and implementation. And part of that cost reduction is using a language that automates much of the work. That's why both organizations, and lots of others, leverage the hell out of C++.

That still doesn't make it easy. You can write crappy code in any language. It might even be easier to write crappy code in C++ than it is in C. So using C++ doesn't excuse the developer from knowing what they're doing. If anything, it increases the developer's responsibility not to be a dickhead. But in the long run, if developers know what they are doing, costs are reduced.

So it's about money. Hint: it's always about money.

Again as I've said before, products that have a short life span, either by design or because they failed in the market place, won't face this issue. Only long term successful products will find themselves deep in the maintenance phase of their life cycle.

Some organizations may find that they fail in the market place because their maintenance costs doom them. Their processes were not sustainable or scalable; they made short term cost reductions that inflated their long term costs with technical debt. And just like too much financial debt, it buried them.

Back in the 1970s, when I was getting paid for writing FORTRAN if you can believe it (because I am so old I am actually a brain in a jar connected up to wires and tubes), there was this new fangled idea called "structured programming". A lot of guys thought it would never catch on: the code was just too inefficient. But I liked it. Because, all on my own, through laborious trial and error, I had already discovered that writing something like structured code was the only way I could write code that I and others could understand, debug, and maintain. Languages like Pascal and C that supported structured programming were great because they automated what I was already doing by hand.

In the 1980s, when I was getting paid for writing this new language called C, there was this new fangled idea called "object orientation". A lot of guys thought it would never catch on: the code was too inefficient. But I liked it. Because, all on my own, through yet more painful learning experiences, I had already discovered that if I wrote header files that defined a structure and well-named functions that operated on that structure, it was the only way I could write code that I and others could understand, debug, and maintain. OO languages like C++, and later Java, were great because they automated what I was already doing by hand.

Which is why I'm a little surprised, here in the twenty-first century, that using OO and languages that support it for embedded development continues to be controversial.

I spent a good part of the late 1970s and early 1980s as a systems programmer writing code in assembler, for heaven's sake, for both IBM mainframes and PDP-11 minicomputers. Microcomputers made this problem even worse. I remember a colleague of mine once tiredly asking rhetorically "How many assembly languages do I have to learn before I win?" C was the higher-level structured assembly language we had all been waiting for, without knowing exactly what it would look like when we found it. I have no desire to go back to those bad old days.

The same emotion applies to my application of OO and C++ and other tools and techniques to embedded systems. I describe myself not as a software developer, but as a product developer, because the latter title describes better the full range of skills I bring to the table, the broadness of the work that I do in hardware-firmware-software debugging and integration, the range of talented technical folks with which I like to surround myself, and the issues that I worry about on a daily basis. For me to be successful as a product developer, I have to ship a product. And leveraging C++ in the embedded domain helps me do that.

Friday, May 04, 2012

The Code Also Rises

Can you write like Ernest Hemingway? Why not? He hardly used any big words. His sentences and paragraphs were short. His novel The Old Man and the Sea, published in 1952 and which I read nearly forty years ago, isn't even a hundred pages long. It was just about an old guy who went fishing. Why can't you write like Ernest Hemingway? How hard could it be?

Pretty damned hard, as it turns out. It's not about the act of writing words on a page. It's about knowing what words to write. Which is why Hemingway was awarded the Nobel Prize in Literature in 1954.

How hard can it be to write reliable, efficient, maintainable software? Take a look at the source code. It hardly has any big words. Each line isn't that long. A typical function takes up less than a screen. Some programs aren't even a hundred screens long. Why can't everyone write software? How hard could it be?

Pretty damned hard. It's not about the act of typing source code into an editor. It's about knowing what source code to type. And just like Hemingway, who was known for his spare, tight prose, knowing what source code not to type. A good software developer is no more a fungible commodity than a Nobel Prize winning author.

How hard can it be to manage? Take a look at a manager. He spends most of his time reading email, or sitting in meetings. Sometimes he wanders around and chats with people. He looks at spreadsheets and documents. He goes to lunch. How hard could it be?

I think we both know where this is going.