Up Next

1  Introduction

Pattern-matching is a key feature of functional languages. It allows to discriminate between the values of a deeply structured type, binding subparts of the value to variables at the same time. ML users now routinely rely on their compiler for such a task; they write complicated, nested, patterns. And indeed, transforming high-level pattern-matching into elementary tests is a compiler job. Moreover, because it considers the matching as a whole and that it knows some intimate details of runtime issues such as the representation of values, compiler code is often better than human code, both as regards compactness and efficiency.

There are two approaches to pattern-matching compilation, the underlying model being either decision trees [5] or backtracking automata [1]. Using decision trees, one produces a priori faster code (because each position in a term is tested at most once), while using backtracking automata, one produces a priori less code (because patterns never get copied, hence never get compiled more than once). The price paid in each case is losing the advantage given by the other technique.

This paper mostly focuses on producing faster code in thebacktracking framework. Examining the code generated by the Objective-Caml compiler [11], which basically used the Augustsson's original algorithm, on frequent pieces of code, such as a list-merge function, or on large examples [14], we found that the backtracking scheme could still be improved.

Our optimizations improve the produced backtracking automaton by grouping elementary tests more often, removing useless tests and avoiding the blind backtracking behavior of previous schemes. To do so, the compiler uses new information and outputs a new construct. New information include incompatibility between patterns, exhaustiveness information and contextual information at the time of backtracking. As to the new construct, previous schemes used a lone “exit” construct whose effect is to jump to the nearest enclosing “trap-handler” ; we enrich both exits and traps-handlers with labels, resulting in finer control of execution flow.

Our optimizations also apply to or-patterns, a convenient feature to group clauses with identical actions. Unsharing of actions is avoided by using our labelled exit construct. As or-patterns may contain variables, the exit construct is also extended to take arguments.

All our optimizations are now implemented in the latest version of the Objective-Caml compiler, whose language of accepted patterns has been extended by allowing variables in or-patterns.

The structure of this article is the following: we first introduce some theoretical basics on pattern-matching in section 2 and describe the compilation scheme to backtracking automata in section 3. Then, we briefly introduce our optimizations and or-pattern compilation in an intuitive way in sections 4 and 5, while section 6 is a formalization of our complete compilation scheme. Finally, some experimental results are shown in section 7, and a comparison with other approaches is discussed in section 8.


Up Next