A few years ago I was in a fun discussion with one of our junior firmware engineers about the different constructs in C and how these constructs mapped to underlying machine code, and performance characteristics. During the conversation came up that he didn't know the switch
, case
,break
construct works in C, and how it makes Duff's device possible. I'll briefly describe it here.
The basic C switch block looks something like:
switch (ValueExpression)
{
default:
do something;
break;
case 23: // Fall thru
case 27:
do something
break;
}
A naive C compiler splits this into two parts. The switch
and case
statements are turned into an if then else
tree. The body - all the bits that do something ' are placed into a second code block after the tree. The two are linked by goto
and labels. And, finally, the break
's are turned into goto
out of the block:
if (ValueExpression == 23)
goto L_23;
else if (ValueExpression == 27)
goto L_27;
else
goto L_default;
goto L_done;
L_default:
do something;
goto L_done;
L_23: // Fall thru
L_27:
do something;
goto L_done;
L_done:
The if then else
tree can be made a lot faster, with better compilers. If the values are densely grouped, a compiler could use the value as an index into a jump table. It could use a binary search of the values, either by careful arrangement of the tree or using a table and a special operation to scan the table. Java (atleast the last VM spec I read) preferred that approach. Or a hash based look up could be use. Of course, these techniques could (and probably should) be used in combination.
In the end, I don't know how many compilers do these. If a compiler optimizes at all, it is more likely to be able to optimize the switch case. Recognizing patterns in an if then else
tree are possible, but difficult enough that it seems a lot of compilers put it off.
As a historical side note, I heard that switch
was added to provide computed goto behaviour, without computed goto.
For those that aren't familiar Duff's device is a technique that is used to performloop unrolling_. Loop unrolling is used when you have a loop with an inner block that is (usually) so small that the loop's increment, test and branch consume a lot (ie more than say 10%) of the time. The idea is just repeat the inner block a few times, and divide the loop counter. Of course you need to add a few extra cycles of this inner block to make sure you perform its actions the proper amount of time.
That is, it would take a loop like:
for (int I = 0; I < Max; I++)
{
some action
}
Duff's device puts the loop _inside of the switch
statement, using the switch
to balance out the number of times the inner block is called:
switch(Max % 4)
{
for (int I = 0; I < Max/4 - 1; I++)
{
case 0:
some action
case 3:
some action
case 2:
some action
case 1:
some action
}
}
This would in turn get translated into:
if ((Max%4) == 0)
goto L_0;
else if ((Max%4) == 1)
goto L_1;
else if ((Max%4) == 2)
goto L_2;
else if ((Max%4) == 3)
goto L_3;
else
goto L_done;
for (int I = 0; I < Max/4 - 1; I++)
{
L_0:
some action
L_3:
some action
L_2:
some action
L_1:
some action
}
L_done:
As with the discussion of the switch
the compiler's optimizer could automatically insert a Duff's device for loop unrolling - if it knew that it should do this. Indeed many optimizers do. Just something to look at when reading the assembly output, when you're wondering why a small loop became a largish chunk of code.