There are multiple ways to optimize software that is too slow or too big (uses too much memory): make individual pieces of code more efficient, speed up computation of common cases, or store results for reuse. Here I describe ways to optimize algorithms by domain-specific pattern matching for common cases. See Software Optimization by Code Rewriting for ways to speed up small sections of code and Software Optimization by Data Caching for ways to store results for reuse.

A first implementation of an algorithm is often simplified in some ways to reduce development time. For most programs, where the performance "hotspots" are not known ahead of time, this is the best strategy. "Make it right before you make it fast (or small)." Highly optimized code tends to be more complex, and increased complexity leads to a higher bug rate for two reasons: there are more lines of code, and the number of errors per line of code tends to be higher for code that is not straightforward.

Once you have working software, profiling will tell you where the program spends most of its time (or uses the most memory). This original implementation provides a reference against which you can compare the improved software for correctness and efficiency. If there are differences, you can single-step through the implementations in parallel to identify the point at which the differences begin. If a trial solution is slower or bigger, you can use this information to refine your understanding of the software.

**Common Cases in Plane Geometry**

In many problem domains, some parameter sets require fewer computations. I do a lot of work in computational plane geometry for integrated circuits, and a lot of time in these applications is spent working with straight lines and their intersections. Recall that the equation for a line in a plane is

Ax + By = C

Given an x (y) coordinate on the line, the corresponding y (x) coordinate can be computed as:

y = (C - Ax) / B

x = (C - By) / A

The intersection between two lines A_{1}x_{1} + B_{1}y_{1} = C_{1} and A_{2}x_{2} + B_{2}y_{2} = C_{2} can be computed by substituting using the above calculations and equating the results:

y_{1} = (C_{1} - A_{1}x_{1}) / B_{1}

y_{2} = (C_{2} - A_{2}x_{2}) / B_{2}

(C_{1} - A_{1}x_{1}) / B_{1} = (C_{2} - A_{2}x_{2}) / B_{2}

If x_{1} == x_{2}, we get

(C_{1} - A_{1}x) / B_{1} = (C_{2} - A_{2}x) / B_{2}

x = (C_{1}B_{2} - C_{2}B_{1}) / (A_{1}B_{2} - A_{2}B_{1})

Even if A_{1}, B_{1}, C_{1}, A_{2}, B_{2}, and C_{2} are all integers, this involves a floating point division, which is expensive. Not only that, but the slope of a vertical line is infinite, so some special case handling is necessary anyway.

In computer graphics and integrated circuit design, lines often have a slope of +1 or -1 (45 degrees up or down). Here the y coordinate changes by one unit up or down for every unit that the x coordinate changes. We can take advantage of this to convert the floating point division to an integer addition or subtraction.

Similar logic applies when one of the two lines is vertical or horizontal: the x or y coordinate of the intersection is fixed, so only the second coordinate must be computed using the slope of the non-orthogonal line (which is often stored anyway, since it is used so often).

Finally, no calculations are needed if both lines are orthogonal; the appropriate x or y coordinate can simply be used directly.

Thus considerable computational cost can be avoided by considering all of the special cases for lines in the problem domain. More lines of code are required, and the instructions necessary to select the additional special cases have some cost, but on low-power hardware these optimizations are always worthwhile, and they usually provide some savings even on workstation-class machines.

**Finding Common Patterns in Your Code**

The optimizations for computational geometry are well-known, and someone writing plane geometry code now would probably implement the speedups from the start. If your problem domain is new or less well-defined, you will have to develop your own patterns.

For me, pattern matching has been of most benefit when I am trying to reduce runtime, so most of this discussion will focus there. Similar logic will apply when you have a computation that requires a large amount of memory: look at the parameters of the computation to see whether some memory use can be avoided for that problem instance.

Once you have found some of the patterns that appear most often, look at the code which is to be executed. Make a copy of it, select a common pattern, and start simplifying the copy based on what you know about the parameters in this pattern. Some calculations will simply disappear; others will require much less code because the relationships between parameters will now be known. Conditional logic will become much simpler as well. When you know that a condition is true or false, you no longer need to test for it. The code becomes more of a straight execution line, allowing you to simplify and combine even more.

Invoke the new function by adding code which tests for the common pattern in the original function. Do this for several common patterns, until you run out of good patterns or the original function begins to get cluttered.

Test each of the new code blocks to ensure they still perform the same function as the original, generalized code. Often you can do this by running the new code at the same time as the original - run the optimized version, then the general version, and compare results on the fly. These test runs will obviously be slower than the original, but you need to continue following the rule "make it right before making it faster."

Once you have some of the most common cases coded, tested, and integrated, stop invoking the generalized calculations and see how much improvement you get. You might find that you get no improvement whatsoever! This can happen when the specialized code is not much shorter than the generalized code, or when the code needed to select the specialized case has a high cost relative to the general calculation.

Sometimes the overhead of calling special-case routines can be high enough to notice, in which case you can bring the special-case code back into the original routine. This requires caution, because larger routines are harder to understand, debug, and maintain.

Even if the new specialized code is no faster than the original, there is still hope. Profiling of the new code may find hotspots within it that can be improved by code rewriting. You may also find that not all specializations are faster; since the generalized code is still available, you can revert to using it for the slower specializations. Each specialization is independent, so you can simply revert them one by one to see which ones improve runtime.

You may also find that some domain knowledge is helpful, especially if you are not the original author. Subtle relationships between the parameters can allow more simplification. Looking at the code by itself may not tell you about these relationships. Try to find another developer or a user who can give you guidance about the specializations you have identified.

Last but not least, once you have removed several special cases from the generalized code, you may find that it can be simplified as well. Do this cautiously, particularly if you have not yet seen any overall runtime improvement. Save a copy of the original code (you are using a version control system, right?) so you can go back to it. If you have been following my suggestions, up until now you have always had the original implementation available, but after you simplify that, it is hard to bring code back in.

You may find that removing some of the code from the general-purpose routine gives you a set of sleek routines that each do one thing well, rather than a single sluggish "kitchen sink" routine that tries to do everything. This can be an improvement in code quality in addition to runtime.

**Downsides of Optimization by Pattern Matching**

There are downsides to any optimization strategy, and pattern matching techniques increase the amount of code to write, debug and maintain. In particular, copying the master routine and specializing it violates the rule that you should do something in only one place. Make sure you document the original source of the specializations and note in the general-purpose routine that code has been copied. Otherwise, you may find that you have copied a bug to multiple places, but fixed it only once.

Inline functions may help here. If you can put some or all of the copied code into inline functions, you can fix any bugs once and have them propagate to all of the specializations automatically without any function call overhead.

Complex code is harder to test as well. It is crucial that you specialize code only after it is working properly, to ensure that the new faster code gets the right results. Use white box directed tests to exercise each specialized routine, or at the very least compare the final outputs vs. an application-level golden output.

**Conclusions**

Not all programs can be improved by pattern matching and specialization. You may find that the code required to detect common patterns is slower (or, for memory consumption, uses no less memory) than the code it means to replace, or that the specialized versions of routines are no simpler than the original. In that case, optimization by code rewriting or data caching may be more appropriate.

Many problem domains, however, have highly variable inputs with unique processing requirements, and these often benefit greatly from pattern matching. It's a valuable skill for your programming toolbox.