Previous Up Next

2  Basics

In this section, we introduce some notations and definitions. Most of the material here is folklore, save, perhaps, or-patterns.

2.1  Patterns and Values

ML is a typed language, where new types of values can be introduced using type definitions such as:
  type t = Nil | One of int | Cons of int * t
This definition introduces a type t, with three constructors that build values of type t. These three constructors define the complete signature of type t. Every constructor has an arity, i.e. the number of arguments it takes. Here arity of Nil is zero, while the arities of One and Cons are one and two respectively. A constructor of arity zero is called a constant constructor, while other constructors are non-constant constructors.

Most native data types in ML – such as integers, records, arrays, tuples – can be seen as particular instances of such type definitions. For example, in the following we will consider lists (nil being the constant constructor [] and cons the infix constructor ::), and tuples (the type of n-tuples defines one constructor of arity n, pairs being written with the infix constructor “,”). For our purpose, integers are constant constructors, and the signature of the integer type is infinite.

More formally, patterns and values are defined as follows:
p ::=     Patterns
  _   wildcard
  x   variable
  c(p1,p2, … ,pa)   constructor pattern
  (p1p2)   or-pattern
v ::=     Values
  c(v1,v2, … ,va)   constructor value
In the following, we freely replace variables by wild-cards “_” when their names are irrelevant. While describing compilation, convenient tools are vectors of values (v = (v1  v2vn) and vnm = (vn ... vm)), vectors of patterns (p = (p1  p2pn) and pnm = (pn ... pm)) and matrices of patterns (P = (pji)).

In this paper, we present pattern-matching compilation as a transformation on an intermediate code in the compiler, called lambda-code. Here, another useful object is the clause matrix (PL):
(PL)=




  p11   p21  ⋯  pn1  →  l1
  p12   p22  ⋯  pn2  →  l2
  p1m   p2m  ⋯  pnm  →  lm




A clause matrix associates rows of patterns (p1i  p2ipni) to lambda-code actions li.

2.2  Pattern Matching in ML

A pattern can be seen as representing a set of values sharing a common prefix.

Definition 1   Let p be a pattern and v be a value belonging to a common type. The value v is an instance of the pattern p or p matches v, written p v when one of the following rules apply:
_ v   
x v   
(p1p2) v    iff p1 v or p2 v
c(p1, … , pa) c(v1, … , va)    iff (p1pa) (v1va)
(p1pa) (v1va)    iff pi vi, ∀ i ∈ [ 1 .. a ]
Seeing a pattern as the set of its instances, it is clear that or-patterns express set union.

In ML, patterns are a binding construct, more specifically, a successful match p v, binds the variables of p to some sub-terms of v. Such bindings can be computed while checking that p matches v, provided that the following set V(p) of variables defined by p is well-defined:
V(_) =
V(x) = {x}
V(c(p1, … , pa)) = V(p1pa)
V(p1pa) = V(p1) ∪ … ∪ V(pa)
       if for all ij, V(pi) ∩ V(pj) = ∅
V(p1p2) = V(p1),  if V(p1) = V(p2)
The first “if” condition above is the linearity of patterns. The second condition is specific to or-patterns, it means that matching by either side of the or-pattern binds the same variables (additionally, homonymous variables should possess the same type).

We then define the now dominant, textual priority scheme to disambiguate the case when several rows in a matrix match:
Definition 2   Let P be a pattern matrix and v = (v1vn) be a value vector. The value v matches line number i in P, if and only if the following two conditions are satisfied:
We will not give a full semantics for evaluating pattern-matching expressions, and more generally lambda-code. Intuitively, given a clause matrix PL and a value vector v such that line number i in P matches v, evaluating the matching of v by PL in some environment ρ is evaluating li in ρ extended by the bindings introduced while matching v by (p1ipni). If v is not matched by any line in P, we say that the pattern-matching P fails. If no such v exists, then the pattern-matching is said exhaustive.

Like pattern vectors, pattern matrices represent sets of value vectors. More specifically, when some line in P matches v we simply say that P matches v. This looks obvious, but representing sets using matrices is at the core of our optimizations. One easily checks that the instances of P are the union of the instances of the lines of P. That is, when considering a matrix globally, the second condition in definition 2 above is irrelevant. More important, row order is also irrelevant.

Finally, the instance relation induces relations on the patterns themselves.
Definition 3   We define the following three relations:
  1. Pattern p is less precise then pattern q, written p q, when all instances of q are instances of p.
  2. Pattern p and q are equivalent, written pq, when their instances are the same.
  3. Patterns p and q are compatible when p and q share a common instance.
Here some remarks are to be made. Because of typing, checking the precision relation is neither obvious nor cheap. More precisely, there is no simple way to decide whether p _ holds or not. For instance, ([]|_::_) _ holds, while (Nil|One _) _ does not. Or-patterns are not responsible for this complication, since we also have (_,_) _. In such cases one should “expand” p and consider, whether signatures are complete or not (see [13, Section 5.1]). By contrast, compatibility can be checked by a simple recursive algorithm. When compatible, patterns p and q admit a least upper bound, written pq, which can be computed while checking compatibility:




(p1pa)↑(q1qa) = (p1q1paqa)
_↑q = q
p↑_ = p
c(p1, … , pa)↑c(q1, … , qa) = c(r1, … , ra)
where (r1ra) is (p1pa)↑(q1qa)
With the following additional rules for or-patterns:
(p1p2)↑q =

p1q,  when p2 and q not compatible
p2q,  when p1 and q not compatible
(p1qp2q),   otherwise
p↑(q1q2) = (q1q2)↑p
Proving that pq is indeed the least upper bound of p and q is easy, by considering patterns as sets of their instances. Note that pq is defined up to ≡-equivalence, and that it encodes instance intersection.


Previous Up Next