Writing an emulator

Randall Maas 6/1/2010 12:35:27 PM

I mentioned in the previous entry that I would describe how I wrote an emulation of an embedded system (and in the future, I intend to explain how I wrote simulations). I wanted to write the emulator for a number of reasons: to explore language semantics, parsing, and ideas in testing and debugging, and to allow development of embedded systems when I didn't have all of the hardware ready.

The basic idea of this emulator is to take the source code intended for an embedded system and run it in a conventional desktop environment like Visual C/C++ (in Visual Studio). To get it to run there, a tool is needed to make some practical modifications. Typically this emulation will not substitute for the embedded system and, running on a PC, it will not be able to simulate all of the microcontroller, compiler, or peripheral quirks. But it does allow building and testing individual pieces.

This breaks the overall approach down into the following parts:

Transform code
Patch tables to analysis shims
Provide support code to emulate the microcontroller semantics
Provide support code to emulate devices

What the transformed code would look like

For a few reasons I decided to tweak the transformed code a bit. The basic idea is that my tool would rename procedures:

from	to
`extern int16_t foo(...); int16_t foo(...) { }`	`extern int16_t (*foo)(...); int16_t Z_foo(...) { }`

Notice that a prefix was added to the procedure name. (The Z_ was easy to use, short, and I did not have a better naming idea). Whenever a procedure calls foo it is actually calling something else. That something else is a shim procedure that does something useful.

This is done in an extra module that links the foo variable to the shim procedure:

int16_t (*foo)(...) = Y_foo;
...

int16_t Y_foo(...)
{
   int16_t Ret;
   TraceCall("foo", NULL);
   Ret = Z_foo(...);
   TraceReturn("foo");
   return Ret;
}

The shim code traces the call, calls the "real" routine, and traces the end of the routine. The call tracing could be a simple printf log, or something more complex. I was interested in printing out a sequence diagram, perhaps like in UML, to see what led up to a bug.
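As a rough, compilable sketch of the pattern above, here is one way the pieces could fit together. The concrete signature `int16_t foo(int16_t)`, the stand-in body of `Z_foo`, and the indented in-memory log are all assumptions made for illustration; the `TraceCall` and `TraceReturn` shown here are simplified stand-ins, not the tracing routines from my framework.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical trace log. A real tool might print instead, or emit
   something that can be turned into a sequence diagram. */
static char TraceLog[256];
static int  TraceDepth = 0;

static void TraceAppend(const char* marker, const char* name)
{
    size_t used = strlen(TraceLog);
    /* Indent by call depth so that nesting is visible in the log. */
    snprintf(TraceLog + used, sizeof TraceLog - used, "%*s%s %s\n",
             TraceDepth * 2, "", marker, name);
}

void TraceCall(const char* name, const void* args)
{
    (void) args;               /* dumping arguments is left out of this sketch */
    TraceAppend(">", name);
    TraceDepth++;
}

void TraceReturn(const char* name)
{
    assert(TraceDepth > 0);    /* a return without a matching call is a bug */
    TraceDepth--;
    TraceAppend("<", name);
}

/* The original procedure, renamed by the translation tool. */
int16_t Z_foo(int16_t x)
{
    return (int16_t)(x + 1);   /* stand-in body, assumed for the example */
}

/* The shim: trace the call, run the real routine, trace the return. */
int16_t Y_foo(int16_t x)
{
    int16_t Ret;
    TraceCall("foo", NULL);
    Ret = Z_foo(x);
    TraceReturn("foo");
    return Ret;
}

/* Callers that write foo(x) now go through this pointer. */
int16_t (*foo)(int16_t) = Y_foo;
```

Because `foo` is a pointer, a test can also point it at a different procedure at run time without touching the callers.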

Actually, the shim code does not look exactly like this. It also has checks for errors, which I will describe in the future. There are a few more reasons for doing the translation with this kind of shim; I will explore those in the future as well.

Transforming the code

Let's look a bit more at the process of transforming the code. I restricted the simulation to an easy type of translation; if it was not easy, I might as well do something else. For a lot of reasons, C is a family of dialects, each with its own variations in grammar and semantics. Fortunately the differences are most often limited to keywords, with only slight semantic differences. Most often.

I relied on a few simplifying assumptions:

I assumed that I would be working with C code that compiles in normal use; basically, code that compiles with the compiler for the embedded processor.
And I assumed that there is a reasonable way to translate the code.

In my case, these are not high bars. First, I wrote the tool to work on source code I knew; it has many regularities. (Code written to good coding standards should have a lot of regularity too.) Second, I often use lint to help catch syntactic and semantic errors with C. (I recommend PC-Lint. Strongly.)

The simplest translation approach is to use regex string substitution. Look for some sort of pattern, and then change just that, leaving the rest intact. My translation tool primarily uses this approach.
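To make the idea concrete, here is a small sketch of regex substitution written in C with the POSIX `<regex.h>` interface. This is not my translation tool. The helper name `RegexReplaceAll` is made up for the example, and POSIX regex is an assumption that does not hold for Visual C, so treat it as a picture of the technique rather than something to drop into a Visual Studio project.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Replace every match of the extended regex `pattern` in `in` with `repl`,
   writing the result to `out`. Returns the number of replacements made,
   or -1 if the pattern is invalid or `out` is too small. */
int RegexReplaceAll(const char* pattern, const char* repl,
                    const char* in, char* out, size_t outSize)
{
    regex_t    re;
    regmatch_t m;
    size_t     used = 0, replLen = strlen(repl), rest;
    int        count = 0, flags = 0;

    assert(outSize > 0);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) return -1;

    while (*in != '\0' && regexec(&re, in, 1, &m, flags) == 0) {
        size_t before = (size_t) m.rm_so;
        if (m.rm_eo == m.rm_so) break;        /* empty match: stop rather than loop forever */
        if (used + before + replLen >= outSize) { regfree(&re); return -1; }
        memcpy(out + used, in, before);       /* text before the match is left intact */
        used += before;
        memcpy(out + used, repl, replLen);    /* the substitution itself */
        used += replLen;
        in += m.rm_eo;                        /* continue after the match */
        flags = REG_NOTBOL;                   /* we are no longer at the start of the text */
        count++;
    }

    rest = strlen(in);                        /* copy whatever follows the last match */
    if (used + rest >= outSize) { regfree(&re); return -1; }
    memcpy(out + used, in, rest + 1);
    regfree(&re);
    return count;
}
```

For example, assuming the target compiler had an address qualifier written as `__at(0x...)` (an assumption for illustration), calling the helper with the pattern `"__at\\(0x[0-9A-Fa-f]+\\) *"` and an empty replacement would turn `uint8_t __at(0x20) Flags;` into `uint8_t Flags;`.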

A harder approach is parsing the code. It is not necessary to create an abstract syntax tree of the code; parsing is only needed to find out information, or to act as a form of string matching and substitution, albeit more complex than regex. I needed parsing to find the names and types of variables and procedures (and their parameter lists). These are used to generate the shims.

Then the resulting code has to be generated. This includes renaming the procedures, as well as the variable and register names. In many cases, it had to remove non-portable compiler- and platform-specific constructs. In a few cases, it had to introduce constructs to preserve the semantics of the target compiler or platform.

A look at some of the specific constructs that had to be papered over

Let's look at some of the specific constructs that had to be tweaked, changed, or papered over. In the case of the compiler, I had to remove language structures that are useful on the target system but that the desktop compiler does not accept:

With some compilers you can use the form 0b00..00 to write out base-2 numbers. I have this translated into hex format that other compilers support.
Microsoft compilers don't support designated initializers.
Some compilers let you declare a variable with a width of 1 bit; this has to get bumped to a byte.
Some compilers let you specify attributes about a procedure or variable, such as its address, storage class, or alignment. This has to be stripped off.
Some embedded compilers have anonymous unions, and this has to be cleaned up.
Pragmas.
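The first item in the list, rewriting `0b...` literals as hex, is simple enough to sketch with a hand-written scanner. The function name `BinaryLiteralsToHex` is hypothetical. The sketch also assumes the literal fits in an `unsigned long`, and it makes no attempt to skip string literals, comments, or integer suffixes.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

static int IsIdentChar(char c)
{
    return isalnum((unsigned char) c) || c == '_';
}

/* Copy `in` to `out`, rewriting each 0b... literal as a hex literal.
   Returns 0 on success, or -1 if `out` is too small. */
int BinaryLiteralsToHex(const char* in, char* out, size_t outSize)
{
    size_t      used = 0;
    const char* p = in;

    assert(outSize > 0);
    while (*p != '\0') {
        /* Only treat 0b as a literal when it is not the tail of an identifier. */
        int startsToken = (p == in) || !IsIdentChar(p[-1]);
        if (startsToken && p[0] == '0' && (p[1] == 'b' || p[1] == 'B')
            && (p[2] == '0' || p[2] == '1')) {
            unsigned long value = 0;
            const char*   q = p + 2;
            int           digits, n;

            while (*q == '0' || *q == '1')
                value = (value << 1) | (unsigned long)(*q++ - '0');

            /* Keep the written width: 4 binary digits per hex digit, rounded up. */
            digits = (int)((q - (p + 2) + 3) / 4);
            n = snprintf(out + used, outSize - used, "0x%0*lX", digits, value);
            if (n < 0 || (size_t) n >= outSize - used) return -1;
            used += (size_t) n;
            p = q;
        } else {
            if (used + 1 >= outSize) return -1;
            out[used++] = *p++;
        }
    }
    out[used] = '\0';
    return 0;
}
```

With this sketch, a line such as `PORT = 0b00000101 | 0b1;` would come out as `PORT = 0x05 | 0x1;`, while an identifier that merely contains `0b`, such as `x0b1`, is left alone.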

What about the platform specifics?

Compilers may have integral types that cannot be properly simulated, like 24-bit integers. These integers will have overflow behaviour different than what is on the target platform. This can be handled by either replacing the type directly or defining it in the emulation framework. (For instance, if I'm using int24_t, which Visual C does not have, I typedef int24_t as int32_t.)
Emulate the registers as an interface (there are many of these on a microcontroller). On microcontrollers, it is common for a register to be used to configure or communicate with a peripheral. I change left-hand-side expressions of Reg1 = ... into Reg1_set(...), and right-hand-side expressions of = Reg1 + ... into Reg1_get() + ...
Clean up the C run time, platform and other header files.
With some platform behaviour, it is easier to replace one of the higher-level API procedures instead of emulating the registers. More on this in a minute.
Waiting for interrupts. I usually skipped this, but in some cases, I emulated this through the register-to-procedure mapping above. (I know of other people who use threads, but this is pretty dodgy.)
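Two of the items above, the 24-bit type and the register interface, can be sketched in a few lines. Everything specific here is assumed for illustration: the `Wrap24` helper, the idea that the target wraps on 24-bit overflow, and a made-up device in which writing a start bit to `Reg1` makes the device finish immediately and raise a done bit. Only the `Reg1_set` / `Reg1_get` naming and the `int24_t` typedef come from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Visual C has no int24_t, so it is carried in a wider type. */
typedef int32_t int24_t;

/* Hypothetical helper: wrap a value to 24-bit two's complement, for code
   that depends on the target wrapping at 24 bits (an assumption). */
int24_t Wrap24(int32_t v)
{
    uint32_t u = (uint32_t) v & 0xFFFFFFu;
    int32_t  r = (u & 0x800000u) ? (int32_t) u - 0x1000000 : (int32_t) u;
    assert(r >= -0x800000 && r <= 0x7FFFFF);
    return r;
}

/* Made-up bit assignments for the emulated register. */
#define REG1_START 0x01u
#define REG1_DONE  0x80u

static uint8_t  Reg1_value;       /* backing store for the emulated register */
static unsigned Reg1_startCount;  /* lets a test see that the device was started */

/* Reads of Reg1 in the target code become calls to this. */
uint8_t Reg1_get(void)
{
    return Reg1_value;
}

/* Writes to Reg1 become calls to this, so the device model can react. */
void Reg1_set(uint8_t value)
{
    Reg1_value = value;
    if (value & REG1_START) {
        /* The pretend device completes at once: clear start, raise done. */
        Reg1_startCount++;
        Reg1_value = (uint8_t)((Reg1_value & ~REG1_START) | REG1_DONE);
    }
}

/* Target code as the translator might emit it.
   Original source line:  Reg1 = Reg1 | 0x01;  */
void StartDevice(void)
{
    Reg1_set((uint8_t)(Reg1_get() | REG1_START));
}
```

The point of routing every access through a procedure is that the emulated peripheral gets a chance to react to each write, which a plain variable named `Reg1` could not do.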

I use configuration files to tell the translation tool how to change certain types, what procedures to use for certain registers, and what procedure it should use instead of procedure XYZ. (The latter are for procedures that cannot be translated automatically, or at least well enough to make it worth it).

The caveats

As serviceable as this approach is, it is a good idea to take note of its limitations. The emulation is far from complete:

Performance figures are not representative. Microcontrollers run slower, and seldom pipeline code like a desktop PC.
The program and the variables will not have the same memory layout.
There is no concept of program memory that can be reprogrammed or patched.
The stack memory limits are very different.

To be honest, the translation process is rickety. In some cases, it was motivation to improve the translator. In other cases, I considered it a good sign that the original source code should be written differently, perhaps mandating a style guide change as well.

Next time

Next time I will explain the portions of the emulation framework. And much later I'll explain how this can be used for testing.