Quickly Getting Up to Speed
When you are asked to speed up a program, the temptation is to launch it under a runtime profiler such as gprof (the Free Software Foundation tool) or IBM's Rational Quantify. Typically, profilers instrument a program and then run it under the profiler's control. They may measure the time taken by each instruction sequence, or they may interrupt the program many times per second and count how often execution is in each portion of the code. They may even do both.
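For example, gprof's workflow is roughly: compile and link with profiling enabled, run the program normally (which writes a gmon.out file in the current directory), then ask gprof for a report. A minimal session might look like this; the program and file names are illustrative:

cc -pg -o myprog myprog.c
./myprog input.txt
gprof myprog gmon.out > report.txt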
Runtime profilers can be very useful, but your program will run much more slowly under test than it normally does. For software that already takes minutes or hours to run, this can be quite a burden. Instrumentation also adds overhead to every function call and return, so well-structured code with many small functions may slow down still more.
Also, a program with multiple phases may have several independent sets of hotspots. A summary report for the entire program may not reflect the hotspots of each phase.
Finally, a post-mortem analysis of a program's hotspots will probably not tell you why the hotspots are being executed so often. This can be crucial in determining the best optimization strategy.
Optimization by Program Sampling
For these reasons, my initial optimization strategy is simply to run a program in the debugger and interrupt it every so often. In the GNU debugger gdb, this is done with a user interrupt (^C). From here I can determine where the program is running at that moment and also examine the surrounding code and variables.
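For instance, a sampling session in gdb looks roughly like this; the program name is illustrative, and the backtrace will be wherever your program happens to be when you interrupt it:

$ gdb ./myprog
(gdb) run input.txt
^C
Program received signal SIGINT, Interrupt.
(gdb) bt
(gdb) continue

Each ^C/bt/continue cycle is one sample; the bt (backtrace) output is the call chain you will be looking for patterns in.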
The key here is to look for patterns. Pattern-matching is a key skill in program optimization; usually you can write code to exploit the patterns you find. You may find redundant operations, values that could be precomputed and stored, or calculations that are slower than they need to be. In all of these cases the existing code is likely simpler than the more optimal code would be, which of course follows the rule "make it right before you make it fast."
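As a hypothetical illustration of the precompute-and-store pattern: suppose sampling keeps landing in an expensive routine that is called with the same argument on every pass through a loop. Hoisting the call out of the loop is the typical fix. The names here are invented for the sketch:

#include <string>
#include <vector>

/* Stand-in for something expensive discovered by sampling. */
static std::string to_key(const std::string &s)
{
    return "[" + s + "]";
}

void tag_all(std::vector<std::string> &items, const std::string &prefix)
{
    const std::string key = to_key(prefix);  /* computed once, not per item */
    for (size_t i = 0; i < items.size(); ++i)
        items[i] = key + items[i];
}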
Here are some of the things I look at when in the debugger (the gdb commands sketched after this list answer most of them):
- Is the program in the same family of low-level routines each time?
- Is the call chain (above the first or second routine) the same each time?
- Are the parameters to any of the routines in the call chain the same each time?
- Are the values in the data structures the same each time?
- Are any assumptions of the code (usually as described in comments) wrong?
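All of these questions can be answered at the interrupted gdb prompt. A typical inspection might look like this; the frame number and variable name are illustrative:

(gdb) bt
(gdb) frame 2
(gdb) info args
(gdb) info locals
(gdb) print *node

bt shows the full call chain; frame selects a particular caller; info args and info locals show the parameters and local variables in that frame; and print lets you drill into a specific data structure.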
You don't necessarily have to write down all of this information each time. If the program runs for more than a few seconds, it will come back to the same places again and again. Eventually you will notice this and can start looking for patterns within the data.
These analyses take time, of course, and you will have to trade off sampling frequency against the usefulness of the information gathered. If you sample too often, you can spend more time than a profiler run would take, and you may find that the call chain is always the same with only the data differing - not particularly useful. On the other hand, if you don't sample often enough, the call chain will be different every time and you won't get a true feel for where the time is being spent.
If your program has different phases (e.g., reading, then processing, then sorting, and finally writing), set breakpoints at the start and end of each major phase so you know when the program moves from one phase to the next. Otherwise you may misinterpret the patterns you see at the lower levels of the code - they will change without your knowledge.
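In gdb this is one breakpoint per phase boundary. The function names below are invented; you would use whatever routines begin each phase in your own program:

(gdb) break read_input
(gdb) break process_data
(gdb) break sort_results
(gdb) break write_output
(gdb) run

Each time a breakpoint is hit, you know a new phase has begun, and you can resume sampling with that phase in mind.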
With some understanding of the program flow, you can also adjust the sampling frequency. Tight loops processing a lot of data can be understood quickly; you can then move on to the next part of the analysis. Complex transformations may require a number of samples to understand.
A Small Example
As an example, imagine a textual analysis program that usually parses small files. As an option, it can sort the data prior to analysis. On a modern processor the program is expected to run very quickly, even for large files, but it takes minutes to run on a file that is a few megabytes long. An interrupt shows that the program is sorting in the following fragment:
#include <algorithm>
#include <string>
#include <vector>

void foo(std::vector<std::string> &ary)
{
    /* do something */
    std::sort(ary.begin(), ary.end());
    /* do something else */
}
This should not be a big deal, so you let the program continue for a few more seconds. The next interrupt is in the same location - the sort has not yet completed. You set a breakpoint at the return from the function and let the program continue for a few more seconds. The breakpoint is not reached, and the next interrupt is still in the sort routine. You have now found the program's hotspot in a couple of minutes of elapsed time. No profiling is needed.
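In gdb, the breakpoint-at-the-return step can be done with the finish command, which runs until the selected stack frame returns. Select foo's frame first (the frame number here is illustrative):

(gdb) frame 1
(gdb) finish
^C

If you have to interrupt finish because it never completes, you are still inside the sort.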
Sorting in memory should be very quick, but sorting an array of std::string objects in C++ requires that strings be allocated, copied, and released every time the sort must exchange two out-of-order elements. In C, the pointers would simply be swapped - a much faster operation. This was a real example, and to solve the runtime problem I converted the std::string array into an array of character pointers (using the C solution in a C++ program).
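Here is a minimal sketch of that conversion, assuming the original strings outlive the pointer array; the function names are invented for the sketch. Exchanging two elements now swaps two pointers instead of copying string contents:

#include <algorithm>
#include <cstring>
#include <string>
#include <vector>

/* Order by the pointed-to characters, not by pointer address. */
static bool less_cstr(const char *a, const char *b)
{
    return std::strcmp(a, b) < 0;
}

/* Build and sort a pointer view of the strings. The pointers stay
   valid only as long as ary itself is alive and unmodified. */
static std::vector<const char *> sorted_view(const std::vector<std::string> &ary)
{
    std::vector<const char *> ptrs;
    ptrs.reserve(ary.size());
    for (size_t i = 0; i < ary.size(); ++i)
        ptrs.push_back(ary[i].c_str());
    std::sort(ptrs.begin(), ptrs.end(), less_cstr);
    return ptrs;
}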
Conclusions
The profiling tools will eventually give you the same information about where your program is spending its time, and if there are numerous independent hotspots in a single phase of the program, the tools may even give you more information than program sampling could. But they cannot tell you why you are in a particular piece of code. Often that is more important than knowing how often you are in a particular piece of code, and you will still need to set breakpoints in the hotspots to find out why. You might as well find out the first time rather than wait for a profiling tool to finish.
Finally, program sampling with user interrupts is free. There is no cost and no runtime penalty. You may find that you can solve the problem the first time, as I did in the example above. Sometimes the best things in life really are free.