1 Introduction
Pattern-matching is a key feature of functional languages. It allows
to discriminate between the values of a deeply structured type,
binding subparts of the value to variables at the same time. ML users
now routinely rely on their compiler for such a task; they write
complicated, nested, patterns. And indeed, transforming high-level
pattern-matching into elementary tests is a compiler job. Moreover,
because it considers the matching as a whole and that it knows some
intimate details of runtime issues such as the representation of
values, compiler code is often better than human code, both as regards
compactness and efficiency.
There are two approaches to pattern-matching compilation, the
underlying model being either decision trees [5] or
backtracking automata [1]. Using
decision trees, one produces a priori faster code (because each
position in a term is tested at most once), while using backtracking
automata, one produces a priori less code (because patterns
never get copied, hence never get compiled more than once). The price
paid in each case is losing the advantage given by the other
technique.
This paper mostly focuses on producing faster code in thebacktracking framework. Examining the code generated by the
Objective-Caml compiler [11], which basically used the
Augustsson's original algorithm, on frequent pieces of code, such as a list-merge function, or on large
examples [14], we found that the backtracking
scheme could still be improved.
Our optimizations improve the produced backtracking automaton by
grouping elementary tests more often, removing useless tests and
avoiding the blind backtracking behavior of previous schemes. To do
so, the compiler uses new information and outputs a new construct.
New information include incompatibility between patterns, exhaustiveness
information and contextual information at the time of backtracking.
As to the new construct, previous schemes used a lone “exit”
construct whose effect is to jump to the nearest enclosing
“trap-handler” ; we enrich both exits and traps-handlers with labels,
resulting in finer control of execution flow.
Our optimizations also apply to or-patterns, a
convenient feature to group clauses with identical actions.
Unsharing of actions is avoided by using our labelled exit construct.
As or-patterns may contain variables,
the exit construct is also extended to take arguments.
All our optimizations are now implemented in the latest version of the
Objective-Caml compiler, whose language of accepted patterns has been
extended by allowing variables in or-patterns.
The structure of this article is the following: we first introduce some
theoretical basics on pattern-matching in section 2 and describe
the compilation scheme to backtracking automata in section 3.
Then, we briefly introduce our optimizations and or-pattern compilation in an
intuitive way in sections 4 and 5, while section 6
is a formalization of our complete compilation scheme. Finally, some
experimental results are shown in section 7, and a comparison
with other approaches is discussed in section 8.