Testing by Levels

Most of my career has been in the Electronic Design Automation (EDA) industry: developing, testing, and optimizing compute-intensive software for Integrated Circuit (IC) design. EDA software is characterized by complex data structures, complex heuristics in optimization, and large data sets that often do not fit into main memory in their entirety.

Software testing is never easy, and in my experience proper testing of a program requires about as much time as writing the original code. Good programmers partition their code to reduce unnecessary coupling (dependencies) between modules, with the result that programs are split into architectural levels. Each new (higher) level depends upon (calls) one or more of the lower levels.

One of the principles of modular programming is information hiding - the details of lower-level code are hidden from upper-level code. This can be implemented in several ways: application programming interfaces (APIs) that limit access to lower-level routines or data, opaque data structures (handles) that hold information for lower-level routines, and private class members (e.g. in C++). The goal is to limit the number of distinct states in the lower level to those anticipated by its developers.

A Plane Geometry Processing Example

Many geometric processing algorithms, such as those used to fill polygons in graphics displays or find design rule violations in an IC design, use plane sweep (also known as scan line) techniques. A very complex two-dimensional problem is turned into a series of less-complex one-dimensional problems. Data is processed at a fixed X or Y location, examining objects in a dynamic data structure known as a scan line, which holds all objects crossing that coordinate value.

After processing completes at the current X or Y value, that value is advanced (e.g. by one pixel) and processing of the objects in the scan line restarts.

Objects may be inserted into a scan line, removed from it, modified within it, or exchanged within it. The operations to perform are chosen based on state variables that are continuously updated as processing proceeds through a scan-line pass.

Pseudocode for a plane sweep often looks like this:

for (each unique lower Y value) {
    for (each edge in each design layer starting at that Y value) {
        find insertion location for edge
        insert and process edge
    }
}

Hidden in these few words are additional support functions beyond scan line administration itself: reading edges from files, ordering them for the input queue, and determining which processing steps to apply to each edge in a scan line. By the time all of the support routines are written, there can be ten thousand or more lines of code.

Most of the support code doesn't do anything visible to the user - it simply ensures that the application has data to process. It often doesn't have much conditional logic (particularly compared to the processing step), so there is a temptation to write all of it before testing any of it. This can lead to disaster, especially in optimization code: bugs might not cause a crash, only incorrect results. Those results then impede algorithmic tuning - changes will have seemingly random effects.

Lessons Partly Learned

I learned this lesson years ago, and I attempted to mitigate the problem in an advanced interconnect circuit extractor I was developing. I wrote test code at every level of operation, then verified proper function before moving on. However, the language I was using at the time (MAINSAIL) encouraged private helper functions, and I could not figure out a good way to keep the test code publicly visible while preventing its misuse. All public code in every module was available for use at any time.

Thus, I deleted the test-specific code as I went. As is typical, the test code for a lower level of code looked a lot like the level immediately above. This let me leverage my software testing to speed up development. When I got to the highest level of code, I tested the system by computing capacitance values manually and comparing them to the final results from the extractor. They matched, so the extractor was released for initial testing.

A bug report came back almost immediately. This was a lateral capacitance extractor, looking horizontally for adjacent wires instead of examining only overlaps. It was natural that the values computed would not match the earlier extractor - more data was considered, and some of the formulas differed. I had spot-checked a few values against the earlier extractor, and they matched expectations, but that was not enough.

An experienced design engineer, knowing that he could not hand check the results in his large design, had dumped all of the capacitance results from the old extractor into a text file. He then set the lateral capacitance coefficients to zero to minimize the differences and dumped the new capacitance results into a second file. Finally, he sorted both sets by name, merged them into a single spreadsheet, ordered the paired results by increasing capacitance value, and plotted the data on a graph.

For the same node, the two extractors should have obtained about the same results when lateral capacitance was excluded. There were representational differences between the extractors, so perfect correlation was not possible, but the results should have been close - or at least had a distinctive scale factor due to the differing methods used to compute overlap capacitance.

They were not. Instead of a solid diagonal line, the old-vs.-new comparison looked like a shotgun blast. Numbers were all over the map. Sometimes the old extractor got a higher value, and sometimes the new extractor got a higher value. Sometimes the difference was small, and sometimes it was large. Clearly something was out of control.

I went back to testing again. I wrote my own graphing program to let me identify troublesome nodes by name just by moving the cursor over the plotted points, and added diagnostic code to trace the calculations for a given node. This was extremely tedious, of course, and another software engineer had to be assigned to help. One by one we tracked down the missed capacitances, double-counted capacitances, and just plain wrong capacitances.

Eventually the shotgun blast turned into a tight line with a handful of outliers at the lower end (where the representation differences could have the most impact). At this point we could state with confidence that the edge processing code was computing values as it should. Strictly speaking, we were not testing the lateral capacitance calculation code (because there were no results for comparison), but this was just a few lines with no conditional logic - well within our ability to verify by inspection.

Once the correlation problems were resolved, the interconnect circuit extractor could finally be released to customers. Now bugs came back only occasionally.

Lessons Fully Learned

As I worked to fix the correlation problems, I considered what I could have done differently. I realized that the loss of the test harnesses developed at each level of the code was a significant problem. Instead of running test code at full speed, we were single-stepping through production code that ran far fewer self-checks. Problem areas had to be analyzed by hand and fixes rerun using the same tedious process. I swore that I would never do this again.

I left that company before I was able to explore better solutions in MAINSAIL. I probably would have tried to define another library, not to be shipped to customers, containing top-level driver modules that could be launched to exercise various components of the application. I had been working on a test driver script that could run tests at the application level (validating against "golden" output files or running developer-specified validation scripts), so I probably could have worked with the software release team to extend the script to launch test modules that would not be available to customers.

When I began my own research work in transistor-level layout synthesis, I began writing standalone test programs that would not be linked into the product but would be run as part of the standard build process. Working in C++, I wrote one program driver per C++ source file and added an "all_tests" target to the "make" control file. "make all_tests" would then compile all production code, store it in the directory's library file, build the test programs, link each one against the library file, and then run all of them.

Each program prints any errors to the standard output and returns a non-zero exit code if problems are found. This interrupts the build. Each test program performs all of its own checking so that no outside comparison of results vs. a "golden" output is necessary (though some programs have their own "golden" outputs, they run the comparisons themselves).

This has been a huge efficiency gain over the years; I can port code to new platforms or new compilers with ease. Compiler problems and hidden bugs (and there have been a few of each) are flagged immediately where they occur; they do not show up mysteriously much later in testing. Nor do I have to run the applications manually each time; the computer does all the work for me.

Conclusions

Never throw test code away - it speeds up maintenance and debugging, and lets you ensure that bug fixes do not introduce new bugs. If you've already written the test code, why not continue to make use of it?

Finally, if the structure of your code does not allow testing at all levels, it is most likely wrong. Even modules with static functions can be fully exercised if there are not too many hidden levels. If you can't test it, you shouldn't release it. Otherwise your customers will end up testing it for you, much to your embarrassment.

Contents of this Web site Copyright © 2011-2016 by David C. Chapman. All Rights Reserved.
