Writing an emulator

Randall Maas 6/1/2010 12:35:27 PM

I mentioned in the previous entry that I would describe how I wrote an emulation of an embedded system (and in the future, I intend to explain how I wrote simulations). I wanted to write the emulation for a number of reasons: to explore language semantics, to explore parsing, to explore ideas in testing and debugging, and to allow development of embedded systems when I didn't have all of the hardware ready.

The basic idea of this emulator is to take the source code intended for an embedded system and run it in a conventional desktop environment like Visual C/C++ (in Visual Studio). To get it to run there, a tool is needed to make some practical modifications. Typically this emulation will not substitute for the embedded system and, running on a PC, it will not be able to simulate all of the microcontroller, compiler, or peripheral quirks. But it does allow building and testing individual pieces.

This breaks the overall approach down into the following parts:

  1. Transform code
  2. Patch tables to analysis shims
  3. Provide support code to emulate the microcontroller semantics
  4. Provide support code to emulate devices

What the transformed code would look like

For a few reasons I decided to tweak the transformed code a bit. The basic idea is that my tool would rename procedures:

From:

extern int16_t foo(...);
int16_t foo(...)
{
}

To:

extern int16_t (*foo)(...);
int16_t Z_foo(...)
{
}

Notice that a prefix was added to the procedure name. (The Z_ was easy to use, short, and I did not have a better naming idea). Whenever a procedure calls foo it is actually calling something else. That something else is a shim procedure that does something useful.

This is done in an extra module that links the foo variable to the shim procedure:

int16_t (*foo)(...) = Y_foo;
...

int16_t Y_foo(...)
{
   int16_t Ret;
   TraceCall("foo", NULL);
   Ret = Z_foo(...);
   TraceReturn("foo");
   return Ret;
}

The shim code traces the call, calls the "real" routine, and traces the end of the routine. The call tracing could be a simple printf log, or something more complex. I was interested in printing out a sequence diagram, perhaps like in UML, to see what led up to a bug.
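
A minimal, compilable sketch of this arrangement might look like the following. The single int16_t parameter, the body of Z_foo, the second argument type of TraceCall, and the indented printf log with its trace_depth and trace_calls counters are all assumptions made for illustration; they stand in for the elided "..." and are not the actual shim code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical trace state: nesting depth and a call counter. */
static int trace_depth = 0;
static int trace_calls = 0;

/* Logs entry to a procedure, indented by the current call depth. */
static void TraceCall(const char *name, const void *args)
{
    (void)args;  /* argument capture is left out of this sketch */
    printf("%*s-> %s\n", trace_depth * 2, "", name);
    trace_depth++;
    trace_calls++;
}

/* Logs the return from a procedure. */
static void TraceReturn(const char *name)
{
    trace_depth--;
    printf("%*s<- %s\n", trace_depth * 2, "", name);
}

/* The "real" routine, as renamed by the translation tool. */
int16_t Z_foo(int16_t x)
{
    return (int16_t)(x + 1);  /* placeholder body */
}

/* The shim: trace the call, run the real routine, trace the return. */
int16_t Y_foo(int16_t x)
{
    int16_t Ret;
    TraceCall("foo", NULL);
    Ret = Z_foo(x);
    TraceReturn("foo");
    return Ret;
}

/* Any code that writes foo(...) now goes through this pointer. */
int16_t (*foo)(int16_t) = Y_foo;
```

Because foo is a variable, a test could also point it at a different procedure at run time without touching the callers.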

Actually, the shim code does not look quite like this. The real shim code also has checks for errors, which I will describe in the future. There are a few more reasons for doing translation with this kind of shim. I will explore those in the future...

Transforming the code

Let's look a bit more at the process of transforming the code. I restricted the emulation to an easy type of translation. If it was not easy, I might as well do something else. For a lot of reasons, C is a family of dialects, each having its own variation of grammar and semantic differences. Fortunately this is most often limited to keywords, with only slight semantic differences. Most often.

I relied on a few simplifying assumptions:

  1. I assumed that I would be working with C code that compiles in a normal usage. Basically, code that compiles with the compiler for the embedded processor.
  2. I also assumed that there is a reasonable way to translate the code.

In my case, these are not high bars. First, I wrote the tool to work on source code I knew; it has many regularities. (Code written to good coding standards should have a lot of regularity too.) Second, I often use lint to help catch syntactic and semantic errors with C. (I recommend PC-Lint. Strongly.)

The simplest translation approach is to use regex string substitution. Look for some sort of pattern, and then change just that, leaving the rest intact. My translation tool primarily uses this approach.
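
As a rough sketch of that kind of substitution, the function below adds the Z_ prefix to a procedure name on a line that begins with a return type, a name, and an open parenthesis. The language the translation tool was written in is not stated here; this example assumes POSIX <regex.h> (not part of Visual C), and the function name rename_definition and the pattern are hypothetical. It leans on the regularity of the source: indented lines, such as calls inside a body, are left alone.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Rewrites "type name(" at the start of a line to "type Z_name(".
 * Returns 1 if the line was changed, 0 if it was copied unchanged,
 * and -1 on error (pattern failed to compile or out is too small). */
int rename_definition(const char *line, char *out, size_t out_size)
{
    /* Group 1: one type word plus spacing. Group 2: the procedure name. */
    static const char *pattern =
        "^([A-Za-z_][A-Za-z0-9_]*[ \t]+)([A-Za-z_][A-Za-z0-9_]*)[ \t]*\\(";
    regex_t re;
    regmatch_t m[3];
    int n;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, line, 3, m, 0) != 0) {
        /* No match: leave the line intact. */
        regfree(&re);
        n = snprintf(out, out_size, "%s", line);
        return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
    }
    regfree(&re);

    /* Copy the text before the name, then the prefix, then the rest. */
    n = snprintf(out, out_size, "%.*s%s%s",
                 (int)m[2].rm_so, line, "Z_", line + m[2].rm_so);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 1;
}
```

A pattern this simple only recognizes single-word return types, and it would also rename a prototype, so a real tool needs more patterns or some help from parsing.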

A harder approach is parsing the code. It is not necessary to create an abstract syntax tree of the code. Parsing is only needed to find out information, or to act as a form of string matching and substitution, albeit one more complex than regex. I needed parsing to find the names and types of variables and procedures (and their parameter lists). These are used to generate the shims.
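
To illustrate how the parsed names and types could feed shim generation, here is a small sketch that prints the shim source for one procedure. The struct proc_info record and the emit_shim function are hypothetical names, and the record is assumed to be filled in by the parsing step. Procedures that return void would need a separate template, which is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record produced by the parsing step. */
struct proc_info {
    const char *ret_type;  /* return type, e.g. "int16_t" */
    const char *name;      /* original procedure name, e.g. "foo" */
    const char *params;    /* parameter list as declared, e.g. "int16_t x" */
    const char *args;      /* argument names to forward, e.g. "x" */
};

/* Writes the shim source for one procedure into out.
 * Returns the number of characters written, or -1 if out is too small. */
int emit_shim(const struct proc_info *p, char *out, size_t out_size)
{
    int n = snprintf(out, out_size,
        "%s Y_%s(%s)\n"
        "{\n"
        "   %s Ret;\n"
        "   TraceCall(\"%s\", NULL);\n"
        "   Ret = Z_%s(%s);\n"
        "   TraceReturn(\"%s\");\n"
        "   return Ret;\n"
        "}\n"
        "%s (*%s)(%s) = Y_%s;\n",
        p->ret_type, p->name, p->params,   /* shim signature */
        p->ret_type,                       /* type of Ret */
        p->name,                           /* TraceCall label */
        p->name, p->args,                  /* call to the real routine */
        p->name,                           /* TraceReturn label */
        p->ret_type, p->name, p->params, p->name);  /* pointer definition */
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}
```

The pointer definition is emitted after the shim body so that Y_foo is already declared when it is used as an initializer.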

Then the resulting code has to be generated. This includes renaming the procedures, as well as variable and register names. In many cases, the tool had to remove non-portable compiler-specific and platform-specific constructs. In a few cases, it had to introduce constructs to preserve the semantics of the target compiler or platform.

A look at some of the specific constructs that had to be papered over

Let's look at some of the specific constructs that had to be tweaked, changed, or papered over. In the case of the compiler, I had to remove language structures that are useful on the target system:

What about the platform specifics?

I use configuration files to tell the translation tool how to change certain types, what procedures to use for certain registers, and what procedure it should use instead of procedure XYZ. (The latter are for procedures that cannot be translated automatically, or at least well enough to make it worth it).
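
One way to picture such a configuration, once loaded, is a table of substitutions that the translator consults for each name it meets. The sketch below is only an assumption about the shape of the data: the configuration file format is not described here, and every entry (uint8_reg_t, PORTB, DelayMs and their replacements) is a made-up example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One configuration entry: replace `from` with `to` in translated code. */
struct subst {
    const char *from;
    const char *to;
};

/* Hypothetical entries; a real tool would load these from a file. */
static const struct subst config[] = {
    { "uint8_reg_t", "uint8_t"     },  /* change a type */
    { "PORTB",       "Emu_PortB"   },  /* register handled by emulation code */
    { "DelayMs",     "Emu_DelayMs" },  /* hand-written stand-in for a procedure */
};

/* Returns the configured replacement for name, or name itself if
 * the configuration has no entry for it. */
const char *lookup_subst(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof config / sizeof config[0]; i++) {
        if (strcmp(config[i].from, name) == 0)
            return config[i].to;
    }
    return name;
}
```

Keeping these choices in data rather than in the tool means a new target or peripheral only needs new entries.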

The caveats

As serviceable as this approach is, it is a good idea to take note of its limitations. The emulation is far from complete:

To be honest, the translation process is rickety. In some cases, it was motivation to improve the translator. In other cases, I considered it a good sign that the original source code should be written differently, perhaps mandating a style guide change as well.

Next time

Next time I will explain the portions of the emulation framework. And much later I'll explain how this can be used for testing.