Performance

Randall Maas 6/29/2010 11:32:05 AM

In this entry, I would like to describe a profiling algorithm I considered, but never completed. (The performance never mattered enough to examine the profile results in depth.) I believe that this approach can offer a different kind of insight.

It is intended to answer the following questions:

  1. What is the typical and worst-case time for a procedure, task, or lock? Typically, the more frequently a task is run, the faster it must run in order for the system to be correct.
  2. Which core procedures contribute most to slow performance?

Tracing Subsystem

The performance profiling starts with execution time approximation. This profiling information is tracked through the trace subsystem. As I mentioned earlier, the trace subsystem is intended to track all sorts of things, like locking and unlocking, I2C bus events, timer events, and so on. It is used to produce (UML-like) sequence diagrams.

The trace system is also intended for use as part of performance profiling. When the trace system is called (at the entry and exit of a procedure), it will also send information to the profiling subsystem.
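
To make that hand-off concrete, here is a minimal sketch of what the entry/exit hooks might look like in C. The names (trace_enter, trace_exit, profile_record, clock_ticks) are hypothetical placeholders, not the actual trace module's API, and the fixed-depth call stack is an assumption for illustration.

    /* Sketch: entry/exit trace hooks that time each call and forward the
       result to the profiler.  All names here are placeholders. */
    #include <stdint.h>

    #define MAX_DEPTH 32

    static uint16_t call_stack[MAX_DEPTH];   /* procedure ids, innermost last */
    static uint32_t time_stack[MAX_DEPTH];   /* entry timestamps */
    static int      depth;

    extern uint32_t clock_ticks(void);                          /* platform timer */
    extern void profile_record(uint16_t caller, uint16_t callee,
                               uint32_t duration);               /* profiler input */

    void trace_enter(uint16_t proc_id)
    {
        if (depth < MAX_DEPTH) {
            call_stack[depth] = proc_id;
            time_stack[depth] = clock_ticks();
        }
        depth++;
    }

    void trace_exit(uint16_t proc_id)
    {
        depth--;
        if (depth < 0 || depth >= MAX_DEPTH)
            return;                            /* mismatched or overflowed call */
        uint32_t duration = clock_ticks() - time_stack[depth];
        uint16_t caller   = (depth > 0) ? call_stack[depth - 1] : 0;
        profile_record(caller, proc_id, duration);
    }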

From there, the information (and its analysis) is divided into two levels of profiling, one simple and one harder:

  1. Static level: how many times the procedure was called and the average duration of the call.
  2. Conditional level: how often A calls B (or A locks B) and the typical amount of time for that call (sketched below).
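
As a rough illustration of these two levels, the per-procedure and per-pair records might look something like the following. The structure and field names are my assumptions, not the actual design.

    /* Hypothetical record layouts for the two profiling levels. */
    #include <stdint.h>

    /* Static level: one record per procedure. */
    typedef struct {
        uint32_t num_calls;     /* how many times the procedure was called */
        uint32_t total_ticks;   /* summed duration; average = total / num_calls */
    } ProcStats;

    /* Conditional level: one record per (caller, callee) pair. */
    typedef struct {
        uint16_t caller;        /* procedure A */
        uint16_t callee;        /* procedure (or lock) B */
        uint32_t num_calls;     /* how often A called (or locked) B */
        uint32_t total_ticks;   /* summed duration of those calls */
    } CallEdgeStats;

Keeping only a count and a running sum keeps the per-call cost to an increment and an add; averages are computed later, during analysis.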

To support these (especially the latter), the trace module sends the calling procedure, the procedure (or lock) being called, and the duration of the call to the profiling analysis subsystem.

Note: the profiler could also track other statistics about the call duration, such as the variance and the minimum and maximum times.
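
If those extra statistics were tracked, one incremental way to do it might look like this sketch. The field names and the sum-of-squares form of the variance are my assumptions, chosen so the per-call work stays small.

    /* Sketch: running duration statistics (count, min, max, mean, variance). */
    #include <stdint.h>

    typedef struct {
        uint32_t num_calls;
        uint32_t min_ticks;
        uint32_t max_ticks;
        uint64_t sum_ticks;       /* for the mean */
        uint64_t sum_sq_ticks;    /* for the variance */
    } DurationStats;

    static void duration_update(DurationStats *s, uint32_t ticks)
    {
        if (s->num_calls == 0 || ticks < s->min_ticks) s->min_ticks = ticks;
        if (ticks > s->max_ticks)                      s->max_ticks = ticks;
        s->num_calls++;
        s->sum_ticks    += ticks;
        s->sum_sq_ticks += (uint64_t)ticks * ticks;
    }

    /* mean     = sum_ticks / num_calls
       variance = sum_sq_ticks / num_calls - mean * mean */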

Call Transition Probability

How can this information be used to identify the most important routines? Let's look at how these details can help focus attention on the most important procedures, locks, devices, and so on. The tracked information is gathered into a big matrix. The matrix is sparse (a procedure calls maybe eight others), so it is internally represented as a hash table.
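
For illustration, a hash-table representation of that sparse matrix might look roughly like this. The table size, hash function, and names are placeholders (and the sketch assumes the table never fills); profile_record ties back to the trace-hook sketch above.

    /* Sketch: sparse call matrix as a hash table keyed by (caller, callee),
       using simple open addressing. */
    #include <stdint.h>

    #define TABLE_SIZE 256        /* power of two, assumed large enough */

    typedef struct {
        uint16_t caller, callee;  /* key: edge A -> B ((0,0) means empty) */
        uint32_t num_calls;
        uint32_t total_ticks;
    } Edge;

    static Edge edges[TABLE_SIZE];

    static Edge *edge_lookup(uint16_t caller, uint16_t callee)
    {
        uint32_t h = (uint32_t)(caller * 31u + callee) & (TABLE_SIZE - 1);
        for (;;) {
            Edge *e = &edges[h];
            if ((e->caller == caller && e->callee == callee) ||
                (e->caller == 0 && e->callee == 0)) {
                e->caller = caller;     /* claim the slot if it was empty */
                e->callee = callee;
                return e;
            }
            h = (h + 1) & (TABLE_SIZE - 1);   /* linear probe */
        }
    }

    void profile_record(uint16_t caller, uint16_t callee, uint32_t duration)
    {
        Edge *e = edge_lookup(caller, callee);
        e->num_calls++;
        e->total_ticks += duration;
    }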

The matrix is used to compute the transition probability from one procedure to the next and how long each call takes. For numerical reasons, the zero-call entries are changed to an almost-zero probability, called epsilon. The transition probabilities are then used to compute the standing (steady-state) probability of each procedure; that is, the probability that the procedure will be involved in a call (either doing work itself or calling another). This forms an eigenvector: the probability of each procedure being involved in a call.
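
A sketch of that analysis step (run offline, not on the target) might look like the following. The procedure count N, the counts matrix, the epsilon value, and the fixed iteration count are simplifications I am assuming for illustration; the eigenvector is found by plain power iteration.

    /* Sketch: normalize call counts into transition probabilities, substitute
       epsilon for zero entries, and find the standing probability vector. */
    #define N        8            /* number of procedures (small example) */
    #define EPSILON  1e-6
    #define ITERS    100

    void standing_probability(const double counts[N][N], double p[N])
    {
        double P[N][N];

        /* Row-normalize the counts into transition probabilities,
           replacing zero entries with epsilon so no transition is impossible. */
        for (int i = 0; i < N; i++) {
            double row = 0.0;
            for (int j = 0; j < N; j++) row += counts[i][j];
            for (int j = 0; j < N; j++)
                P[i][j] = (row > 0.0 && counts[i][j] > 0.0)
                          ? counts[i][j] / row : EPSILON;
        }

        /* Power iteration: p <- P^T p converges to the dominant eigenvector,
           i.e. each procedure's probability of being involved in a call. */
        for (int i = 0; i < N; i++) p[i] = 1.0 / N;
        for (int it = 0; it < ITERS; it++) {
            double next[N] = {0.0}, sum = 0.0;
            for (int j = 0; j < N; j++)
                for (int i = 0; i < N; i++)
                    next[j] += p[i] * P[i][j];
            for (int j = 0; j < N; j++) sum += next[j];
            for (int j = 0; j < N; j++) p[j] = next[j] / sum;  /* renormalize */
        }
    }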

The list of procedures is then sorted from the highest rank to the lowest, and we work from the top. Alternatively, we could sort based on the probability that a procedure will be called, combined with its duration (typical and worst case).
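
One possible ordering for that report is to sort by the product of the standing probability and the average duration, highest first. The RankedProc type and comparator below are illustrative only.

    /* Sketch: rank procedures by probability-of-involvement times average cost. */
    #include <stdlib.h>

    typedef struct {
        const char *name;
        double      probability;   /* from the eigenvector above */
        double      avg_ticks;     /* from the static-level statistics */
    } RankedProc;

    static int by_rank_desc(const void *a, const void *b)
    {
        const RankedProc *x = a, *y = b;
        double rx = x->probability * x->avg_ticks;
        double ry = y->probability * y->avg_ticks;
        return (ry > rx) - (ry < rx);   /* descending */
    }

    /* usage: qsort(procs, count, sizeof procs[0], by_rank_desc); */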

This information is then reported, usually as a table, with supporting detail and the key elements flagged.

The intent of all this matrix/eigenvector stuff is to find a procedure that doesn't directly do a lot of slow work but is the one that dispatches lots of work. That procedure is still important, even though little time is spent directly in it.