Monitoring the code execution

Randall Maas 6/4/2010 12:59:15 PM

Now that I have outlined how the emulation works, I can outline how I structured the emulation to monitor for proper execution, to catch bugs and other software errors.

Notice that I listed the warnings from Lint and the compiler as the first pass of checking the code. A properly configured Lint is an excellent tool for catching a whole slew of bugs, especially improper use of other modules. This can be important when the modules are written by different fellows, some possibly quite unfamiliar with the rest of the system. Similarly, the host compiler's warnings should not be dismissed. Often the host compiler will produce different warnings (or suggestions) than the embedded system's compiler does. And the Visual C compilers are including more and more features for detecting buffer overflows, range misuse, unused variables, and other execution risks. Each of these is an immediate chance to squash latent bugs.

I also prefer to use the host's ability to detect bugs. For instance, the Visual C suite has usage checks on memory, and therefore on variables. It will trigger the debugger if a memory location (including a variable) is read before it has been initialized. It can detect buffer overruns, writes through NULL pointers, double free() and use after free() (though you shouldn't be using dynamic memory allocation in embedded systems), and a variety of numerical overflows (although these checks often have to be enabled).

I also suggest using the host's profiling features.

The Shim

Before we look at the rest of the checks, we need to take a moment to look at the shim. The shim is an important point for checking the behaviour and performance of each procedure. It is the key part of where the action occurs. I outlined the use of these procedures, which wrap and call the real procedure, in an earlier entry, but I only mentioned that I had more reasons for using them than I gave there.

In short, it lets me track the entry to and exit from each procedure without having to modify that procedure. Modifying those procedures is tricky, partly due to the variability of the language and partly due to the issues with deep parsing of it. The shim approach, by contrast, is simple. There is a third reason I didn't want to modify those procedures: it would turn them into gobbledygook, and I wanted the procedures to remain faithful to their source for when I have to debug the code and port the changes back to the real source. (For instance, I would have to handle how a procedure can have multiple points of return, which would plop all sorts of crud in at every potential return point.)

Let's look at the full template (more or less) of the shims:

ret_type Y_procedure(params)
{
   void* S;
   ret_type Ret;
   const char* PrevProcedure = Proc_Current;
   TraceCall   ("ProcedureName", NULL);
   FSMs_update ("ProcedureName", NULL);

   ... check the parameters (pre-conditions)
   if (parameter checks fail)
     TraceBad("ProcedureName", "Error: parametername value out of range");

   S = SysState_save();
   StackDepth++;
   if (StackDepth > MaxStackDepth)
     MaxStackDepth = StackDepth;

   Ret = Z_procedure(params);
   StackDepth--;

   ... check the post-conditions
   if (return value checks fail)
     TraceBad("ProcedureName", "Error: return value out of range");

   SysState_check(S);
   SysState_free (S);
   TraceReturn   ("ProcedureName");
   Proc_Current = PrevProcedure;
   return Ret;
}

Wow, that is a lot of stuff. What is it doing?

Performance tracking: maximum stack depth, procedure-to-procedure call transition counts, the number of times each procedure is called, and the approximate duration of a call (note that timing information gathered this way has wider error bars than it would on the real hardware).
Tracing: logging which procedures get called, when, and in what sequence.
Boundary condition checking: parameter and return value checking, plus processor / system variable checking.
Sequence checking: the order of calls, and when a procedure can be called.
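
To make the template concrete, here is a minimal sketch of one shim, cut down to the tracing, the stack depth count, and the parameter and return value checks. Everything specific in it is an assumption made up for the illustration: the procedure ScaleReading, its value ranges, the BadCount counter, and the simplified one-argument TraceCall. The FSM and system state calls from the template are left out.

```c
#include <stdio.h>

/* Hypothetical bookkeeping shared by all of the shims. */
static const char *Proc_Current = "main";
static int StackDepth    = 0;
static int MaxStackDepth = 0;
static int BadCount      = 0;   /* number of failed checks so far */

/* Log the call, indented by the current depth, and note who is running. */
static void TraceCall(const char *Name)
{
    printf("%*scall %s (from %s)\n", StackDepth * 2, "", Name, Proc_Current);
    Proc_Current = Name;
}

static void TraceReturn(const char *Name)
{
    printf("%*sreturn %s\n", StackDepth * 2, "", Name);
}

static void TraceBad(const char *Name, const char *Msg)
{
    BadCount++;
    fprintf(stderr, "%s: %s\n", Name, Msg);
}

/* Z_: the real procedure, untouched. Here it is a made-up routine that
   scales a 10-bit ADC reading (0..1023) to a percentage (0..100). */
static int Z_ScaleReading(int Raw)
{
    return (int)(((long)Raw * 100) / 1023);
}

/* Y_: the shim that the rest of the emulated code calls instead. */
static int Y_ScaleReading(int Raw)
{
    int Ret;
    const char *PrevProcedure = Proc_Current;
    TraceCall("ScaleReading");

    /* Pre-condition: the parameter must be a valid 10-bit reading. */
    if (Raw < 0 || Raw > 1023)
        TraceBad("ScaleReading", "Error: Raw value out of range");

    StackDepth++;
    if (StackDepth > MaxStackDepth)
        MaxStackDepth = StackDepth;

    Ret = Z_ScaleReading(Raw);

    StackDepth--;

    /* Post-condition: the result must be a percentage. */
    if (Ret < 0 || Ret > 100)
        TraceBad("ScaleReading", "Error: return value out of range");

    TraceReturn("ScaleReading");
    Proc_Current = PrevProcedure;
    return Ret;
}
```

Calling Y_ScaleReading(1023) passes both checks, while a call such as Y_ScaleReading(2000) is logged twice, once for the parameter and once for the return value, and Z_ScaleReading itself never had to change.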

That's a lot. I'll begin discussing them here and will continue next time.

Profiling and tracing

Determining the maximum call stack depth. This is the single most useful figure. Microcontrollers have a limited call stack. (I should explain that on microcontrollers and some other architectures the stack of return addresses is typically separate from the stack of local data. This is very good.) Since the call stack is limited, it is important to know what the worst-case usage would be so that we know it will not overflow. In effect, the worst case is the worst-case interrupt service routine (ISR) stack usage plus the worst-case regular usage (ignoring the ISR). The depth counter in the shim helps approximate those figures.
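
One way to get the two figures separately is to keep a depth counter per context and add the two maxima afterwards. The sketch below is only an illustration under assumptions: the context flag, Emulate_interrupt, and the three example procedures are hypothetical names, and it assumes interrupts do not nest. The sum is built from the deepest paths actually observed, so it remains an approximation and not a proven bound.

```c
#include <stdio.h>

/* Hypothetical: which context the emulated code is running in. */
enum { CTX_MAIN = 0, CTX_ISR = 1, CTX_COUNT = 2 };

static int Depth[CTX_COUNT];      /* current call depth per context  */
static int MaxDepth[CTX_COUNT];   /* deepest depth seen per context  */
static int Context = CTX_MAIN;

/* Called by each shim just before it calls the real procedure. */
static void Depth_enter(void)
{
    if (++Depth[Context] > MaxDepth[Context])
        MaxDepth[Context] = Depth[Context];
}

/* Called by each shim just after the real procedure returns. */
static void Depth_leave(void)
{
    Depth[Context]--;
}

/* Emulate an interrupt: switch to the ISR context, run the handler,
   then switch back. Assumes that interrupts do not nest. */
static void Emulate_interrupt(void (*Handler)(void))
{
    int Prev = Context;
    Context = CTX_ISR;
    Handler();
    Context = Prev;
}

/* Worst case = deepest regular path + deepest ISR path, even if the
   interrupt never happened to fire while the regular code was deepest. */
static int WorstCaseDepth(void)
{
    return MaxDepth[CTX_MAIN] + MaxDepth[CTX_ISR];
}

/* Stand-ins for shims; only the depth bookkeeping is shown. */
static void Y_Leaf(void)     { Depth_enter(); Depth_leave(); }
static void Y_Middle(void)   { Depth_enter(); Y_Leaf(); Depth_leave(); }
static void Y_TimerIsr(void) { Depth_enter(); Y_Leaf(); Depth_leave(); }
```

Running Y_Middle() and then Emulate_interrupt(Y_TimerIsr) records a depth of two in each context, so WorstCaseDepth() reports four even though the interrupt fired while the regular code was idle. Counting the contexts separately is what makes that possible; a single shared counter would only have seen two.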

I intended to use the trace information, with the other performance measurements, to develop more powerful profiling features. This turned out to be something I have not needed. I will outline this in a later post, as a possible future task.
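
As a rough idea of the raw material such profiling could start from, the sketch below counts how often each procedure is called and how often each caller-to-callee transition occurs. It is not the emulator's actual code: Profile_call, Profile_return, Profile_report, and the fixed-size tables are hypothetical, and the procedure names are assumed to be string literals that outlive the run.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PROCS 32   /* hypothetical limit on tracked procedures */

static const char *ProcNames[MAX_PROCS];
static int         NumProcs = 0;
static unsigned    CallCount[MAX_PROCS];
static unsigned    Transition[MAX_PROCS][MAX_PROCS]; /* [caller][callee] */
static int         CurrentId = -1;                   /* -1: none / unknown */

/* Look up a procedure by name, adding it on first use.
   Returns -1 if the table is full. */
static int Proc_id(const char *Name)
{
    int i;
    for (i = 0; i < NumProcs; i++)
        if (strcmp(ProcNames[i], Name) == 0)
            return i;
    if (NumProcs >= MAX_PROCS)
        return -1;
    ProcNames[NumProcs] = Name;
    return NumProcs++;
}

/* Called on entry to a shim. Returns the caller's id so that the shim
   can hand it back to Profile_return when it exits. */
static int Profile_call(const char *Name)
{
    int Prev = CurrentId;
    int Id   = Proc_id(Name);
    if (Id >= 0) {
        CallCount[Id]++;
        if (Prev >= 0)
            Transition[Prev][Id]++;
    }
    CurrentId = Id;
    return Prev;
}

/* Called on exit from a shim: the caller becomes current again. */
static void Profile_return(int Prev)
{
    CurrentId = Prev;
}

/* Dump the call counts, then every transition that occurred. */
static void Profile_report(FILE *Out)
{
    int From, To;
    for (To = 0; To < NumProcs; To++)
        fprintf(Out, "%s: called %u times\n", ProcNames[To], CallCount[To]);
    for (From = 0; From < NumProcs; From++)
        for (To = 0; To < NumProcs; To++)
            if (Transition[From][To] != 0)
                fprintf(Out, "%s -> %s: %u\n",
                        ProcNames[From], ProcNames[To], Transition[From][To]);
}
```

In a shim, Profile_call would sit where TraceCall does and Profile_return where Proc_Current is restored, so the counts come for free with the tracing.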

Next time

The checking of the boundary conditions (including call parameters) and the call sequence checking each deserve an entry of their own.