Protecting calling sequences

Randall Maas 6/8/2010 12:18:42 AM

Where we are? I have been describing part of an emulator that acts as a bug sieve. It helps automatically identify when it is likely there is a bug manifesting itself. In the previous entries I described that the emulation checks that we are following rules for using a peripheral device, that parameters to a call are in a valid range, that variables are within their constraints, and that the output values are in a valid range. This time I am going to explain how the emulation checks that the program is making acceptable sequences of procedure calls.

Call sequences are Regular Expressions

The heart is checking that the software is using acceptable calling patterns. In a few papers, Thomas Ball introduced (or at least explained) that almost all meaningful calling sequences can be expressed as regular expressions. (Thomas Ball, "Programs Follow Paths" 1999, MSR-TR-99-01).

Let's start with an example that Ball uses in 2004 presentation. The convention is that a lock must always be unlocked before you can lock it again, and it can be locked only once per lock. This is has the regular expression:

(Lock Unlock)*

Bear in that this only specifies what calls have to be made, and in what order. It doesn't specify what the values are to a call. (Those could be bogus, and the calling rules here wouldn't notice.)

In his presentation, he provides the following code, to ask the audience: Is the locking correct?

do {
    Lock(); 
    nPacketsOld = nPackets;
    if (request){
          request = request->Next;
          Unlock();
          nPackets++;
    }
} while (nPackets != nPacketsOld); 
Unlock();

He suggests the following simplification to help understand the code:

do {
    Lock(); 
 
    if(*){ 
          Unlock();
    }
} while (*); 
Unlock();

Static analysis of code frequently employs this type of simplification; it retains the items under question, the control flow structures (marking their selector with a strike *), but removes all other data flow elements (variables, expressions, etc.), declarations, and the like.

However, I do not have a static analyzer that performs this on the embedded code I have. But I wanted to come up with something to implement this check. I decide a reasonable method is to have these calling pattern checks performed dynamically, while the program is running.

A regular expression engine

Let me give a real example for an API that is more complex than the simple lock pairing given earlier:

 (I2C_Init ( I2C_Start I2C_Send+ (I2C_RepeatedStart I2C_Send? I2C_Recv*)? I2C_Stop )* )+

This is an API for an I2C module. Quite complicated, no? I wanted a regular expression tool that was simple enough for me modify, but could still match the regex operations above, plus a few other common ones (such as alternation).

I started with Russell Cox's DFA ("Regular Expression Matching can be fast and simple," January 2007). It was not selected for any particular reason other than it works, and supported the required regex operations.

I modified this code to allow lots of regex recognizer's to be running simultaneously. And I modified it to automatically wrap all regular expressions in a "( )*" operation, to automatically reset then for the next round API calls after the first is completed. I created a module to read in a set of regular expressions, and create one recognizer per expression.

The next step is updating the states of recognizers as calls are made. When the expression recognizers are being created, I make a table that maps each specified procedure name to the regex recognizer that concerns it. (This also means that a procedure can be specified in only one regular expression). In the shim, before a call, the name of the procedure being called is mapped to the recognizer (if any), and passes the recognizer is passed the token. If the recognizer says that the string of calls is not recognized, then we have an error, and the debugger is triggered.

Eh, what about multiple instances?

What about things like locks, files, objects, et cetera where there can be more than one, and the calls made with respect to these don't interact? In those, case the sequences of:

open (read |write)* close

are not meaningful, because they'll trigger the debugger too often.

I did create a simple solution, at least in the framework. For these items, the regex recognizers are duplicated when they are created. Then there is a mapping of the instance id (whatever that is), and the procedure name to their own recognizers. As I said, I put this into the framework, but I haven't really used it. Most microcontroller code doesn't have many instance-based selections, so it did not come up that much. (But, since this is the norm in software, I do expect to come up eventually.)

Eventually and Modal statements

Where the boundary conditions allowed a simple check against the past, regex's allow a specification of what must happen in the future.

At first glance, the regex appear to be just rules that must be followed, not productions of the future. The way to look at this is once you have done a lock it says that unlock must eventually be called. (In model checking tools, there is strange beast called "linear temporal logic" that has an operation called "eventually" to describe what you can expect to happen in the future.)

Normally the regex checks let sleeping dogs lie, as it were. They all are waiting for the next procedure call, and don't complain (on their own) if it doesn't come. To fix this I created a procedure that checks that all FSMs are in a starting state and not left hanging. It is called at exit time, but it can be called at any other time when we're sure that all the other calls should have been made by (we reach the "eventually" point).

Since the typical main() in a micro controller is an infinite loop, one possibility is to insert this check at the bottom of the loop. I found that this is not an appropriate place in general (too often tasks are partially complete, progressing with each cycle thru the loop), so I have not made it automatic.

Where to place this check remains an open problem.

Other esoterics

C is a string generation specification

Nerds may have noted that I have basically claimed that source code is a specification for generating strings of calls. And that we want to check those strings. Can we show a string generation specification is always matched by a regular expression or other grammar? For software this is called "temporal safety."

Basically, it is easy for only very simple languages. It is very, very hard when we're talking about general languages like C.

Countable and Nestable locks

Things like (Lock Unlock)* do not work with countable locks. To do so we would need something more than regex for some interfaces. Full context-free class would work, but isn't necessary. There are perhaps even less common than object instances in microcontrollers. They are common in embedded systems and other software to be sure. But the issue just hasn't proven to be that important.

Next time

Next I'll discuss how some of this can be used to create (automated) unit tests.