Zero Tolerance for Code Errors

"Do I Really Need to Fix Small Memory Leaks?"

Recently I was asked whether a few memory leaks were a problem. It depends, I said. Was the test case representative of how your program will run? If the program will never run for a long time, then maybe the errors are not serious.

When you run a leak analysis tool like valgrind or PurifyPlus from IBM/Rational, the test case you choose is often relatively small because these tools slow program execution considerably. The concern is that the leaks will keep accumulating as the program continues to run, and memory consumption will grow until serious problems (running out of memory, heavy paging) occur.

When this question came through, I was trying to track a memory leak in the Firefox browser (or, more likely, a plugin such as Flash). Over the course of a workday, browser memory consumption would increase by about 200 MB. That's a pretty serious problem. That's memory that my code should be using, not wasted by some careless programmer. I had to restart Firefox once or twice per day to reduce memory consumption and improve responsiveness in the browser.

The other issue for this program was a high error count. Lacking the actual tool log, I don't know what errors were being logged, but a count of 10,000,000 at the bottom of the tool run is cause for concern. Even if the program seems to run correctly, what is causing those errors?

Typical Errors Logged by Program Checkers

In addition to memory leaks, program checkers like valgrind or PurifyPlus will log errors when your program:

reads uninitialized memory;
reads or writes memory that has been freed;
reads or writes memory outside of an allocated block; or
has race conditions in multithreaded code.

These bugs in turn will cause:

data that may be out of the bounds that the code can handle;
possibly random outputs, if your code cannot recover from the bad data it gets; or
irreproducible results, making it hard to debug the code when problems arise.

It's bad enough when optimizing compilers change the results, for example when an optimized build of floating-point code rounds its results differently than a debug build does. Why add to your debugging difficulties?

When your program reads uninitialized memory, you don't know what values it will get. Those values may still be legal, meaning your code won't crash (this time), but it will almost certainly get different values the next time it runs. If you do have a crash as a result of reading uninitialized memory, you might not be able to reproduce the crash within the debugger (or even the next time you run the debugger). There are no guarantees in most operating environments that you will get the same values from uninitialized memory each time (and environments that guarantee the same data in memory are in fact initializing it anyway).
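As a minimal sketch of the fix, the C++ fragment below gives every field a known starting value. The Stats type and the mean() function are hypothetical names invented for this illustration; the point is only that default member initializers remove the indeterminate reads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical accumulator, used only for illustration.
struct Stats {
    double sum = 0.0;       // default member initializers give every
    std::size_t count = 0;  // Stats object a known starting state
};

// Returns the mean of `samples`, or 0.0 for an empty input.
double mean(const std::vector<double>& samples) {
    Stats s;  // without the initializers above, sum and count would hold
              // indeterminate values and the result could vary per run
    for (double v : samples) {
        s.sum += v;
        ++s.count;
    }
    assert(s.count == samples.size());
    return s.count == 0 ? 0.0 : s.sum / static_cast<double>(s.count);
}
```

With the initializers in place, the same input gives the same output on every run, with or without a debugger attached.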

Reading from or writing to memory that has already been freed can cause problems if your program allocates more memory later. Many memory managers work on a "Last In, First Out" strategy: the block freed most recently is the first one handed out again, so some other part of your program may already be using that piece of memory. Irreproducible results and data corruption are typical here.
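One way to avoid this class of bug in C++ is to let a smart pointer own the allocation and to copy out anything still needed before the object is released. The sketch below assumes a hypothetical Session type and close_session() function; the buggy pattern appears only as a comment so that it is never executed.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical record type, used only for illustration.
struct Session {
    std::string user;
};

// Buggy pattern (comment only, never executed):
//     Session* s = new Session{"alice"};
//     delete s;
//     std::string name = s->user;   // reads memory that was already freed

// Safer pattern: the unique_ptr owns the Session, and the data we still
// need is copied out before the object is released.
std::string close_session(std::unique_ptr<Session> s) {
    std::string name = s->user;  // copy while the object is still alive
    s.reset();                   // frees the Session and nulls the pointer
    assert(s == nullptr);        // a later use would now fail a null check
    return name;
}
```

After reset() the pointer is null, so a later mistake is far more likely to show up as an immediate, repeatable null-pointer failure than as silent corruption of someone else's data.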

Similar problems occur when your program reads or writes memory that is out of bounds. Typically this results from "off by one" array indexing errors. Many memory allocators round up the block size, so an access just past the end of an array may land in unused padding rather than in another block, but not all allocators do this. Your array might exactly fill a memory block anyway, in which case you are reading from or writing to another memory block.
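The classic off-by-one mistake, and one hedged way to guard against it, looks roughly like this in C++. The sum_all() function is a hypothetical example; it uses std::vector::at(), which throws std::out_of_range on a bad index instead of quietly touching a neighboring block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Buggy pattern (comment only): `<=` reads one element past the end.
//     for (std::size_t i = 0; i <= values.size(); ++i) total += values[i];

// Sums every element, using the correct `<` bound and checked access.
int sum_all(const std::vector<int>& values) {
    int total = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        total += values.at(i);  // at() throws rather than reading out of bounds
    }
    return total;
}
```

Checked access costs a comparison per element, so it is a trade-off; a range-based for loop avoids the index entirely when you do not need it.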

Finally, race conditions result when two or more threads within a single program try to access the same variable at the same time without synchronization, and at least one of those threads is writing to that variable. Depending on how the shared variable is being used, you could get anything from irreproducible results to full-scale data corruption. Bugs like this are notoriously hard to track down.
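A small C++ sketch of one common remedy follows. The function name count_in_parallel() is an assumption made for this example. With a plain long counter the increments from different threads would race; std::atomic makes each increment indivisible.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Increments a shared counter from several threads and returns the total.
long count_in_parallel(int num_threads, int increments_per_thread) {
    std::atomic<long> counter{0};  // a plain `long` here would be a data race
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();  // joining makes every worker's increments visible here
    }
    return counter.load();
}
```

An atomic is enough for a single counter. When several variables must change together, a std::mutex around the whole update is usually the simpler choice.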

A common compiler message is "variable may be used before it is set." Sometimes this is just a warning, caused by code similar to the following:

    if (x) {
        // assign variable
    }
    ...
    if (x) {
        // use variable
    }

The condition in the "if" statements is the same, so either both blocks will be skipped or both blocks will be executed. If the variable is not assigned, it will not be used. GCC is not yet able to detect this situation. As a programmer, I can see that the code is safe, but I assign an initial value to the variable anyway. Not only does this reduce the warning count, it also reduces the risk that the two conditions will be changed in some way so that the variable actually is used before it is set.
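Here is a compilable C++ version of that pattern with the defensive initial value added. The function scaled_or_zero() and its parameters are hypothetical, chosen only to make the two matching conditions concrete.

```cpp
#include <cassert>

// Mirrors the two-`if` pattern: `factor` is assigned and used under the
// same condition, but it still receives an initial value.
int scaled_or_zero(bool enabled, int input) {
    int factor = 0;  // silences the warning and stays safe if the two
                     // conditions are ever changed independently
    if (enabled) {
        factor = 3;
    }
    int result = 0;
    if (enabled) {
        result = input * factor;
    }
    return result;
}
```

The initialization costs nothing measurable, and the function keeps a defined result even if someone later edits only one of the two conditions.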

Reasons to Fix Everything

The main reason you need to fix everything is that you need to be able to trust the results you get. For science, engineering, and business uses, this means accuracy. You don't want to get the wrong answer, or a different answer each time. You need consistent, explainable results. For Internet applications in particular, the memory access errors described above lead to security holes. Attackers can exploit the uncontrolled memory accesses by inserting malicious code or data.

The second reason you need to fix everything is that it shows that you have at least tried to understand how the code really works. Code is supposed to have a precise behavior. When you say "I don't know" about any part of your software, you say that you haven't specified it precisely. This reflects badly on your work and can hurt your career (or your business, if you run your own).

A common excuse for inaction is "it works now." You cannot honestly say that when you don't really know how the code works. All you can say is that the code seems to work for your current test cases. Even if you use a robust testing methodology that exercises every line, memory access errors mean that your tests are incomplete (or that you've been extremely lucky not to get varying results).

Fixing all of the errors helps you maintain your code in the long run. Very few programs are "use once and throw it away" - at the very least, you usually need to repeat your results later. If you fix all of the errors early, you won't need to remember which ones have been left behind. This is especially important when (not if) someone else works on your code. The next developer should not have to spend time looking at random results or (worse yet) the error logs that you have just ignored.

Getting rid of errors and warnings helps keep code quality up, as well. It is inevitable that new code will have bugs. If the compiler prints 1,000 warnings when building your product, it is likely you won't notice another couple of warnings. Eventually you will have 1,100, then 2,000, etc. The same logic applies to runtime errors and warnings. If you start with zero messages, any messages will be immediately apparent. If this occurs right after code has been added or modified, it will be much easier to isolate and fix the problem.

Finally, fixing memory access errors will improve repeatability for debugging. Your code will have bugs, and you will have to fix them. If your code is using random values or corrupting random data, running it under a debugger is almost guaranteed to give you different results. Debuggers change the environment in which programs run, perhaps by altering the memory layout or by initializing memory to different values. Compiling a program with debug information (or simply changing optimization flags) will also affect the memory layout. Fixing memory access errors removes one big variable from your problem analysis.

Back to Memory Leaks

So what about memory leaks? Is it ever OK to leave a few in your program? The answer is "maybe." Memory leaks are a sign that you don't have full control within your code, but some leaks are either harmless or not worth the effort:

memory allocated by libraries beyond your control (e.g. the GCC runtime library);
static objects (used throughout the life of the program) that are not cleaned up when the program exits; and
objects not deleted prior to an error exit.

Leaks from standard libraries and static objects are closely related. These are used (or at least available) throughout the lifetime of your program. Because they are static, their memory leaks will not accumulate over time - their cost is fixed (if not, they are not really static and you need to fix the leaks). Finally, it can be extremely hard to control how and when a program exits. Static destructors in C++ and atexit() in C can help, but you have little control over the order in which static destructors in different source files are called, and there may be a limit on the number of atexit() handlers (the C standard only guarantees room for 32).
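If you do want a leak checker to report zero blocks for a long-lived object, one possible approach is to register a cleanup handler when the object is first created. This is only a sketch: g_cache, release_cache(), and init_cache() are hypothetical names, and std::atexit() returns 0 when the handler was registered successfully.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical process-wide cache, for illustration only.
std::vector<int>* g_cache = nullptr;

// Cleanup handler: frees the cache when the program exits normally.
void release_cache() {
    delete g_cache;
    g_cache = nullptr;
}

// Allocates the cache on first use and registers the cleanup handler once.
// Returns false if the handler could not be registered.
bool init_cache() {
    if (g_cache != nullptr) {
        return true;  // already initialized, handler already registered
    }
    g_cache = new std::vector<int>();
    return std::atexit(release_cache) == 0;
}
```

Handlers registered this way run only on a normal exit (returning from main() or calling exit()), not after abort() or a crash, which is one reason exit-time leaks are hard to eliminate completely.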

When your program exits abnormally, it may leak significant amounts of memory. It is probably not worth the effort to fix these leaks, because the program is about to release all memory back to the operating system anyway, so the lifetime of any leak is going to be short. There are also many places and many ways for your program to exit, and the conditions under which it exits may not be entirely predictable. For example, consider the problems that arise when an exception is thrown in the middle of a C++ constructor - the object is only partly initialized, its own destructor will not run, and only the members and base classes that were already fully constructed are destroyed. Anything the constructor allocated through a raw pointer is leaked.
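The constructor case is worth a small C++ sketch. When a constructor throws, the language destroys the members that were already fully constructed, so a resource held by a smart-pointer member is released even though the enclosing object never finished construction. Buffer, Connection, and try_connect() are hypothetical names used only for this demonstration.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Counts live Buffer objects so the cleanup can be observed.
int g_live_buffers = 0;

struct Buffer {
    Buffer() { ++g_live_buffers; }
    ~Buffer() { --g_live_buffers; }
};

// Hypothetical class whose constructor can fail part-way through.
class Connection {
public:
    explicit Connection(bool fail) : buffer_(new Buffer()) {
        if (fail) {
            throw std::runtime_error("handshake failed");
        }
    }

private:
    // A fully constructed member: destroyed automatically if the
    // constructor body throws. A raw `Buffer*` here would leak instead.
    std::unique_ptr<Buffer> buffer_;
};

// Returns true if constructing a Connection threw.
bool try_connect(bool fail) {
    try {
        Connection c(fail);
        return false;
    } catch (const std::runtime_error&) {
        return true;
    }
}
```

After try_connect(true) the live-buffer count is back to zero, which shows the Buffer was released even though the Connection itself was never completed.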

Beyond these circumstances, all memory leaks should be fixed. First of all, the next time your program runs, it may leak much more memory - especially if the analyzed run is not typical (for example, a smaller test case chosen so that it runs faster). Second, if your code is good it will be reused, possibly in a longer-lasting environment which may provide more opportunities for leakage. Finally, your program may be enhanced someday in ways that invoke the leaking code more often.
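To illustrate how a "small" leak scales with usage, consider a function that is called once per request. The sketch below is hypothetical (handle_request() and its 4096-byte scratch buffer are invented for the example); the leaking version appears only as a comment, and the compiled version lets a std::vector release the buffer on every return path.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Leaky pattern (comment only): every empty request loses 4096 bytes,
// so the total loss grows with the number of requests handled.
//     char* scratch = new char[4096];
//     if (request.empty()) return 0;   // early return skips delete[]
//     ...
//     delete[] scratch;

// Copies the request into a scratch buffer and returns the bytes copied.
std::size_t handle_request(const std::string& request) {
    std::vector<char> scratch(4096);  // freed automatically on every return
    if (request.empty()) {
        return 0;
    }
    const std::size_t n = std::min(request.size(), scratch.size());
    assert(n <= scratch.size());
    std::copy(request.begin(),
              request.begin() + static_cast<std::ptrdiff_t>(n),
              scratch.begin());
    return n;
}
```

A short test run might hit the early return a handful of times and report a few kilobytes lost. A server handling millions of requests would lose gigabytes from the same line of code.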

Conclusions

Runtime analysis tools are invaluable for isolating difficult-to-find bugs. You might have used them solely to help solve a crash, but it is good practice to run them on every program you put into production. Doing so will not only reduce your workload in the long run, but also make a statement about the quality you put into your code - your professionalism.

Tool Links

valgrind, an award-winning (and free) software analysis tool for Linux and MacOS, is available at

http://valgrind.org/

PurifyPlus is sold by the Rational division of IBM. It's been quite a few years since I have used it, but I had positive experiences with it. It runs on multiple platforms including AIX and Microsoft Windows. Information is available at

http://www-01.ibm.com/software/awdtools/purifyplus/

Chapman Consulting

Software Development Done Right.