Boundary Conditions

Randall Maas 6/6/2010 11:37:21 PM

In a previous entry I outlined the beginnings of how the emulator monitors for errors. In this entry and the next, I will discuss how to I specify the rules of proper code execution, and how the specification is changed into mechanical checks. This time I will discuss how to check boundary conditions, with an approach that is very similar to design by contract.

In the abstract, this method works by dividing the software up to a number of regions, and then listing various stylized (that is, simplistic) rules on how things should be in each region. The three kinds of regions are global variables, structures (comprised of fields), and procedures (comprised of parameter and return values).

File format

I will quickly describe the file format for the boundary condition. This is a good way to see what is going on. The file has four sections:

Constants. This section is for convenience. It is simply a list of text names and their associated literal (numerical) value. The intent is to ease maintenance on the rest of the file, and to add some clarity to the other rules.
Structure Field Ranges. This allows specifying the rules about values that each field may have.
Global Variable Ranges. This allows specifying rules about values that each global variable may have.
Procedure Parameter Ranges. This allows specify rules about the procedures parameter values and return values.

Let's look at the section defining Procedure Parameter Ranges in more detail. (The other sections are similar). Below is an example:

procedureFoo(param1,param2)
   0 != param1
   return > param1

There are three elements to each body of procedure rules. There is the procedure name, which followed by a list of its parameters. You can use any name you want for each parameter. (The reason for specifying them here is that a procedures declarations and definition often have different names for parameters. None can be counted as definitive.) The parameters do not have their type specified here.

Finally, this declaration is followed by the actual conditions, an indented list of comparisons. This layout is meant to be similar to the "such that" clauses in linear programming (aka constrained optimization). In addition, the conditions are intended to catch out of range conditions.

The comparison can be one of > >= == != <= or <. In the general case, the expressions for either side of the comparison are arithmetic expressions that may involve variables, literals (i.e., numbers), named constants or the return value (identified as return). The variables that can be referred to are the listed parameters and global variables.

There general case is, well, general, and should be used sparingly. Instead, there is a special, preferred style for comparison expressions. The expressions on the left hand of the comparison contains return, a parameter name, or a global variable name. The expressions on the right can contain literal values and named constants.

There are two reasons for this preference. First, rules in this form are exported into PC-Lint rules (i.e., the "semantic" conditions), which lets PC-Lint do interesting forms of analysis and help developers catch their bugs earlier. The second reason is that it is used to create a suite of tests, a process I will discuss in the future.

Note: The rules for the globals and structures are listed in a similar fashion, without parameters.

Note: Actually I lied. There is another kind of clause, other than these comparisons. More on this kind of clause and its semantics in a moment.

Similarities to Design By Contract.

It probably is not a great surprise that these conditions are going to be mapped into the equivalent of C's "asserts," or elements of "Design By Contract".

The boundary conditions can be seen as pre-conditions, post-conditions, and invariants:

An expression that refers only to the parameter values and globals, but does not use the "prev" clause, is a pre-condition.
Any expression that refers to the return value, or uses the "Prev" clause is a post-condition. It may refer to parameter values, as well as global variables.
An expression that refers only to global variables is an invariant or a post condition.

(Inner conditions might be crudely done in the code with "asserts.")

This gives a pretty good indication how the rules will be consumed the code transformer and where it will place them as checks. After all of the conditions have been read, and (lightly) analyzed, they are grouped. The "invariants" emitted as part of the Sys_check() procedure, which is called at the end of the shim routines. The pre- and post-conditions are gathered on a per procedure basis, and placed within the shim routine, before and after the call to the target routine.

As anyone who has used pre- and post-condition checking knows, these can be pretty useful. But there is a twist that allows much more interesting post-conditions.

Modal Logic clauses

I mentioned earlier that there is another type of clause other than simple comparisons. It is different because it allows a pseudo-function (called Prev()) and && conjunctions of comparison expressions. It is easiest to explain these with a couple of useful examples.

Let's say that I'm working with a circular buffer. The following constraints will catch when put too much into a buffer, causing it to overflow:

Checking for buffer overrun

   if Prev(Head) != Prev(Tail) &&  Head  == Tail && Tail != Prev(Tail)  error ("Buffer overflow")

There was stuff in the buffer, we added something (but didn't remove anything) and now the buffer is empty. Ooops.

And we can check for taking too much out of the buffer, causing it to underflow:

Checking for buffer underrun

   if Prev(Head) == Prev(Tail) && Head  != Tail &&  Head  !=  Prev(Head) error ("Buffer underflow")

The Prev() provides the value of Head before the routine modified it, while Head is the value after the routine modified. In this case, the buffer was empty, we removed something (!), and now the buffer appears to have contents. This is two bugs since the buffer probably holds out of date junk.

In the past, all of the rules specified only one condition. If the condition is violated, it triggers the debugger. But these requires more than one condition, and if all of them do happen (rather than be violated), an error is triggered. That is the outright reverse of the earlier boundary conditions.

However, I found it easier to specify these kinds of checks in this way. These cases are patterns of specific bad things to look for, and it needs to be as simple as possible to express them. Nerds may recognize these boundary conditions in a different form. Take all of the boundary conditions listed earlier, and negate them. Add these modal checks onto the list. That is, make a big if check like:

   if (!Rule1 || !Rule2 || ...)   error
   else if (modal rule1) error1
   else if (modal rule2) error2
...

What we are looking at is conjunctive-normal form, aka Horn clauses, to "prove" the existence of an Error.

Saving state as the basis for `Prev()`

How the Prev() is an interesting operation. It lets us concern ourselves with stepwise changes in values, and how a procedure should or should not change a value. Let's look at how it implemented.

The short version is that the state of all the global variables is saved at the start of the shim, before the target routine is called. (The parameter values are saved as well, but they have no change with time here). Prev() is simply SavedState->NamedVariabe and is used after the target routine has been called.

This only leaves the question of how the state is saved.

As part of the code translation, the translator scans thru and finds all of the global variables, as well as the named registers. Then it generates a struct, which declares each. This structures holds the microcontroller register, the target C run-time's variables, and the global variables from the embedded software, such. Then the translator generates a procedure that allocates such a struct and copies each of the global variables to it, one at a time.

Notes

1. What this does not do. This does not any provision for substantial structural checks. This would be something like traversing data structures for consistency, and the like. Conceivably a person could add some to the framework, and they could be called at each return point. Such a test would scan program structures in ram, check structures within EEPROM, external memory devices. I think this is would be quite valuable, but it is not trivial to express.

2. The techniques outlined here are far from full "Design By Contract." We are not specify what the algorithm should do, what you can expect about it, what you can say about the parameters meaning, the meaning of the results, and so forth.

3. Nor do I consider this sufficiently similar to Dijkstra's Guarded statements. Mainly because the Guarded Command Language allowed much finer granularity on what may have a condition. Any statement could have a condition on it, and specify what should happen if the condition was not met.

Next time

Next time I will discuss how calling sequences are enforced.