Previous Up Next

5  Compiling or-patterns

Until now, the code produced for or-patterns is inefficient, because only one or-pattern can be compiled at a time, requiring multiple applications of the mixture rule before and after each or-pattern. Thanks to integer labelled exits, one easily avoids dividing matrices before or-patterns. Consider a “car” function for our three-constructors list:
  let car list = match list with   | Nil -> -1   | (One x | Cons (x,_)) -> x

Compilation proceeds by allocating a new trap-handler number 2 and expanding the clause “One x | Cons (x,_)” into two clauses with patterns “One x” and “Cons (x,_)”. Actions for the new clauses are exits to 2:
  catch    C((list), (
Nil  →  -1
One x'  →  (exit 2 x')
Cons (x', _)  →  (exit 2 x')
)      with (2 x) C((),(
 →  x
))
Note that both exits and trap handlers now take yet another extra argument, the occurrences of x' in exits are non-binding and refer to pattern variables, while the occurrence of x in handler is binding. This new construct allows the compilation of or-patterns with variables. Implementation is not very tricky: the catch... with (2 x) … construct allocates one mutable variable; an exit updates this variable, which is read before entering the handler. In a native code compiler, such a variable is a temporary and ultimately a machine register. The generated lambda-code is as follow:
  catch    switch* list with    case Nil: -1    case One: (exit 2 (field 0 list))    case Cons: (exit 2 (field 0 list))   with (2 x) x

Moreover, by the semantics of pattern-matching, cuts after or-patterns can also be avoided in many situations. In the case of one column matrices, where the expanded or-patterns express the full matching performed, all cuts can be avoided. Things get a bit more complicated when matrices have more than one column. Consider the following clause matrix,
PL =

(1|2) p2  →  l1
(3|4) q2  →  l2

We further assume a match on (x  y) and that match failure should result in (exit 1) (the static exception label corresponding to match failure can be given as a third argument to the compilation scheme). Writing p1 = (1|2) and q1 = (3|4), there are obviously no value vectors (v1  v2) such that v1 is an instance of both p1 and q1. As a consequence, the following compilation is correct:
  catch    (catch    (switch x with    case 1: (exit 2) case 2: (exit 2)    case 3: (exit 3) case 4: (exit 3)    default: (exit 1))    with (2) C((y), (
p2  →  l1
), 1))   with (3) C((y), (
q2  →  l2
), 1)
Intuitively, once x is checked, the choice between first and second row is made. Depending on the value of y, matching may still fail, but then, the whole matching fails.

Conversely, matrix division cannot be avoided when matching by p1 does not exclude matching by q1, that is, when p1 and q1 are compatible. This is the case, for instance, when p1 = (1|2) and q1 = (2|3). Then, a correct compilation is:
  catch    (catch    (switch x with    case 1: (exit 2) case 2: (exit 2)    default: (exit 3))    with (2) C((y), (
p2  →  l1
), 3))   with (3)    (catch    (switch x with    case 2: (exit 4) case 3: (exit 4)    default: (exit 1))    with (4) C((y), (
q2  →  l2
), 1))
Note that the third argument to the first recursive call to the compilation scheme is “3” and not “1”. As a consequence, vectors (2  v2) such that p2 does not match v2 while q2 matches v2 get mapped correctly to l2. A slight innefiency shows up, since x is tested twice. More striking, perhaps, vectors (1  v2) such that p2 does not match v2 also lead to testing x twice.

An alternative compilation rule for or-pattern would simply expand or-patterns in a pre-processing phase, yielding the matrix:




1 p2  →  l1
2 p2  →  l1
2 q2  →  l2
3 q2  →  l2




Then, there are no extra run-time tests on x, since the constructor rule applies. However, patterns p2 and q2 are now compiled twice. Note that there is no simple solution for avoiding this duplication of effort, since, once the constructor rule is applied, the two occurences of these patterns occur in different contexts. More generally, code size is now out of control, a clear contradiction with the spirit of bactracking automata.


Previous Up Next