8 Related Work

8 Related Work

8.1 Decision Trees vs Backtracking

Compiling to decision trees is the original approach to pattern matching compilation; it first appeared in the Hope compiler and is described in [5]. It is currently used in the SML-NJ compiler [7].

In this approach, there is no mixture rule: instead, the constructor rule applies as soon as there is at least one constructor in the first column, and a specialization matrix is created for each matched constructor, plus one additional matrix for the remaining constructors in the signature of the types of matched values, if any. Specialization is done by following the rules of section 6.1. This means that rows whose first pattern is a variable get copied several times.

On the one hand, this approach guarantees that one constructor test is never performed twice. On the other hand, copied pattern rows are compiled independently and this result in potentially large code size. Namely, examples exist that make the SML-NJ compiler produce exponential code [12].

Compilation to backtracking automata is the classical scheme of section 3.3 (see also [1, 20]). It is currently in use in the Haskell-HBC and Objective-Caml compiler [11]. As we already argued, its main advantage is that patterns are never copied, yielding linear output size. Of course, the price paid is potentially testing the same sub-term several times, resulting in potentially poor runtime performance. In that aspect, our new compilation scheme shows that this price can be reduced significantly.

Compilation to decision trees easily detects unused match cases and non-exhaustive matchings, since there is no dead code in a decision tree. Detecting these situations is important, as programmers should be warned about them. However, those problems are NP-complete [17] and this gives us a hint about the potential size of decision trees. More concretely, a decision tree may have many leafs corresponding to non-matched values, whereas knowing that one such values exist is the needed information. Rather, we check unused match cases and exhaustiveness before compilation with a simple algorithm [13] that solves the used matched case problem by basically traversing the decision tree without generating it. Advantages are not generating the tree, stopping search as soon as used match cases are found and applying various heuristics and matrix simplifications which are not relevant to direct compilation. Then, one of our optimizations uses exhaustiveness information.

8.2 Compiling or-patterns

From available ML or Haskell compilers, we only found two compilers dealing with or-patterns: the (old) Objective-Caml compiler and the SML-NJ compiler. Our technique makes the old Objective-Caml scheme (see section 3.3) obsolete, by both producing more efficient code and allowing variables in or-patterns.

The SML-NJ approach is very simple to understand and implement: or-patterns are expanded during a pre-processing phase. However, as we already discussed at the end of section 5, this may lead to many duplications of patterns. Such a risk is compatible with the very philosophy of compilation to decision trees and is natural in that context.

8.3 Optimizations

Most optimizations dealing with pattern-matching in the literature try to improve the order in which tests are performed. In the matrix-based description, one considers alternatives to systematically choosing the first column of matrices in the constructor rule. Hence, such an approach can be characterized as “column optimization”, while our approach would rather be “row optimization”. Since choosing the best column is thought to be NP-complete (to our knowledge, there is no published proof), most approaches describe heuristics. A typical and early work on such heuristics is [2], a more recent and thorough study is [16]. Another, related in practice, approach relies on sequentiality theory to identify directions that are columns that must be tested by all possible matchers [10, 15, 17, 13]. However, computing directions is expansive, and one can consider relying on cheaper heuristics.

These works rather apply to the decision trees, with a primary focus on reducing code size. It is unclear to us how to combine column and row optimization in practice and whether this would yield noticeable improvements or not.

There also exists a partial-evaluation based approach to pattern-matching optimization. [8] and later [18] specialize an ultra-naive pattern-matching interpreter to create an efficient pattern-matching compiler. Both authors use context information as we do. By contrast, their target is decision trees. In the end, the automatic process of partial evaluation does not find as many optimizations as we do.