Previous Up Next

6  Our Compilation Scheme




Figure 3: Operations on contexts


(a) Context specialization
PQ row S(c, PQ) row
p1i pki c(q1i, … , qai) qni p1i pki c(_, … , _) q1i qai qa+1i qni
p1i pki _ qni p1i pki c(_, … , _) _ _ qa+1i qni
p1i pki c'(q1i, … , qai) qni      no row


(b) Context collection
PQ row COL(PQ) row
p1i pk−1i c(_, … , _) q1i qai qa+1i qni p1i pki c(q1i, … , qai) qni


(c) Context pushing and popping
PQ row ⇓(PQ) row ⇑(PQ) row
p1i pki q1i qni p1i pki q1i q2i qni p1i pk−1i pki q1i qni

The new scheme C* takes five arguments and a typical call is C*(x, PL, ex, def, ctx), where x = (x1xn) and PL is a clause matrix of width n:
PL =




p11 pn1  →  l1
p12 pn2  →  l2
p1m pnm  →  lm




Extra arguments are: The initial call to C* for an exhaustive match is:
C*((x),




  p1  →  l1
  p2  →  l2
  pm  →  lm




, total, ∅, (• _))
For a non-exhaustive match, ex is partial, def is the one-element sequence ((_), 1) and a trap handler is added as in section 3.3. The context argument remains the same: it expresses that nothing is known yet about the value of x.

The new scheme returns a lambda-code l and a jump summary, ρ = {…, ictx, …}, which is a mapping from trap numbers to contexts. Jump summaries describe what is known about matched values at the places where (exit i …) occur in l.

6.1  Operations on contexts

We define the following four operations on contexts :
  1. Context specialization, S, by a constructor c of arity a is defined by mapping the transformation of figure 3-(a) on context rows.

  2. Context collection, COL, is the reverse of specialization. It combines the the last element of the prefix with the appropriate number of arguments standing at beginning of the fringe (see figure 3-(b)).

  3. Context pushing ⇓ and popping ⇑ move the fringe limit one step forward and backward, without examining any pattern (see figure 3-(c)).
As contexts are used to represent set of values, we naturally define union and intersection over contexts. Context union PQP'•Q' yields a new matrix whose rows are the rows of PQ and P'•Q'. Row order is not relevant. Context intersection PQP'•Q' is defined as a context whose rows are the least upper bounds of the compatible rows of PQ and P'•Q'. Context extraction EX is a particular case of context intersection.
EX(p,P'•Q') = (_   …   _   •   p   …   _) ∩ P'•Q'
For example, when p is c(_, … , _), context extraction retains those value vectors represented by P'•Q' whose k+1th components admit c as head constructor. Observe that such a computation involves extracting or-pattern arguments and making wild-cards more precise.

Except for collection and popping, which consume prefix elements, all these operations can be extended to simple matrices, by using an empty prefix in input, and taking the fringe for output. Doing so, we obtain exactly the operations of section 3.3 used to compute pattern matrices (specialization S in particular).

Operations on contexts are extended to jump summaries in the natural manner. For instance, the union of ρ and ρ' is defined as:
ρ ∪ ρ' = {…, i ↦ ρ(i) ∪ ρ'(i),…}


Operations on matrices are extended to reachable trap handlers in a similar manner: for instance, pushing trap handlers is defined as pushing all matrices in them :
⇓((P1, e1) ; … ; (Pt, et)) = (⇓(P1), e1) ; … ; (⇓(Pt), et)


6.2  Compilation scheme

We now describe scheme C* by considering cases over the typical call.
  1. If n is zero. then we have:
    C*((),




         →  l1
         →  l2
         →  lm




    , ex, def, ctx) = l1, ∅
    Observe that the jump summary is empty since no exit is outputed.

  2. With respect to section 3.3, the variable rule only changes as regards the extra arguments ex, def and ctx. We only describe these changes. The performed recursive call returns code l and jump summary ρ :
    l,ρ = C*(…, …, ex, ⇓(def), ⇓(ctx))
    Exhaustiveness information ex does not change, while def and ctx are pushed.

    The variable rule returns l unchanged and ρ popped.

  3. In the constructor rule, let C = {c1, … ,ck} be the matched constructors, let also Σ be the signature of their type. For a given constructor cC, the performed recursive call is:
    C*(…, …, ex, S(c,def), S(c, ctx))
    Exhaustiveness information ex is passed unchanged, while the other two extra arguments are specialized (specialization of trap handlers being the natural extension of matrix specialization).

    Each recursive call returns a lambda-code l(c) and a jump summary ρc. Lambda-code l(c) gets wrapped into let-bindings like in section 3.3, yielding the final lambda-code r(c). We then define a case list L and a jump summary ρrec as follows:
    L = case c1: r(c1) ⋯ case ck: r(ck)
    ρrec = {  …, i
     
    cC
    COLc(i)), … }
    The case list is as before, while the jump summary is the union of the the jump summaries produced by recursive calls, once collected.

    Optimizations are then performed. For clarity, optimizations are described as a two phase process: first, extend (or not extend) the case list L with constructors taken from Σ\ C, and add (or not add) a default case; then, compute the final jump summary.

    A first easy case is when Σ\ C is empty or when ex is total. Then, the case list L is not augmented. Otherwise, we distinguish two cases :
    1. If Σ\ C is finite, then for all constructors c in this set we consider the context
      QcQ'c = EX(c(_, … , _),ctx)


      Then, trap handlers (P1, e1) ; … ; (Pt, et) are scanned left-to-right, stopping at the smallest i, such that the intersection Q'cPi is not empty. That is, we find the trap handler where to jump to when the head constructor of x1 is c, in order to extend the case list as follows :
      L = L case c: (exit ei)
      It is possible that ei does not exist (when Q'c is empty). This means that x1 head constructor will never be c at runtime.

    2. If Σ\ C is infinite (as in the case of integers) or considered too large (as it might be in the case of characters), then, a default case is added to the case list :
      L = L default: (exit e1)
      That is, all non-recognized constructors lead to a jump the nearest enclosing reachable trap-handler.

      However it is still possible to extend the case list for particular constructors, applying the previous procedure (a) to the constructors that appear in the first column of reachable trap handler matrices and not in C.

    The final jump summary is computed by considering the final case list L. For a given trap handler number ei let {c'1, … ,c'k'} be the set of constructors such that case c'j: (exit ei) appears in L. Then the jump summary ρei is defined as:
    ρei = { eiEX(c'1(_, … , _) ∣⋯ ∣ c'k'(_, … , _)), ctx) }
    Moreover, if there is a default clause, the jump summary ρd is defined as:
    ρd = {  e1ctx}
    Finally the constructor rule returns a switch on case list L and the jump summary built by performing the union of ρrec, of all ρei's and, when appropriate, of ρd.

    The constructor rule performs many context unions, so that contexts may become huge. Fortunately, contexts can be made smaller using a simple observation. Namely, let p and q be two rows in a context, such that p is less precise than q (i.e., all instances of q are instances of p). Then, row q can be removed from the context, without modifying its meaning as a set of value vectors. Hence, while performing context union, one can leave aside some pattern rows. If the produced context is still too large, then contexts are safely approximated by first replacing some patterns in them by wild-cards (typically all the pattern in a given column) and then removing rows using the previous remark. Rough experiments lead us to set the maximal admissible context size to 32 rows, yielding satisfactory compilation time in pathological examples and exact contexts in practical examples.

  4. Or-pattern compilation operates on matrices whose first column contains at least one or-pattern. Additionally, when p1i is a or-pattern, then for all j, i < jm one of the following, mutually exclusive, conditions must hold:
    1. p1i and p1j are not compatible.
    2. p1i and p1j are compatible, and (p2ipni) is less precise than (p2jpnj)
    Conditions (a) and (b) guarantee that, whenever p1i matches the first value vector v1 of a value v, but row i does not match v, then no further row in P matches v either. This is necessary since further rows of P won't be reachable in case of failure in the or-pattern trap handler.

    Now, consider one row number i, such that p1i is the or-pattern q1 ∣⋯ ∣qo. Further assume that this or-pattern binds the variables y1, … ,yv. First, we allocate a fresh trap number e and divide PL into the following or-body P' → L' and or-trap P'' → L'' clauses:
    P' → L' =








    p1i−1 pni−1  →  li−1
    q1 _  →  (exit e y1...yv)
    qo _  →  (exit e y1...yv)
    p1i+1 pni+1  →  lj+1








    P'' → L'' =

    p2i pmi  →  li

    In the or-body matrix, observe that the or-pattern is expanded, while the other patterns in row number i are replaced by wild-cards and the action is replaced by exits.

    Recursive calls are performed as follows:
    l',ρ' = C*(x, P' → L', ex, def, ctx)
    l'', ρ'' =
    C*(x2 ↔ n, P'' → L'', ex, ⇓(EX(p,def)), ⇓(EX(p,ctx)))


    Outputed code finally is catch l' with (e y1 ... yv) l'' and the returned jump summary is ρ = ρ' ∪ ⇑(ρ'').

  5. The mixture rule is responsible for feeding the other rules with appropriate clause matrices. We first consider the case of a random division. Hence let us cut PL into QM and RN at some row. Then a fresh trap number e is allocated and a first recursive call is performed:
    lq, ρq = C*(x, QM, partial, (R, e) ; def, ctx)
    The exhaustiveness information is partial, since nothing about the exhaustiveness of Q derives from the exhaustiveness of P. Reachable trap handlers are extended.

    Then, a second recursive call is performed:
    lr, ρr = C*(x, RN, ex, def, ρq(e))
    It is no surprise that the context argument to the new call is extracted from the jump summary of the previous call. Argument ex does not change. Indeed, if matching by P cannot fail, then matching by R neither can.

    Then, the scheme can output the code
    l = catch lq  with (e) lr
    and return the jump summary (ρq \ { e }) ∪ ρr, where ρq \ { e } stands for ρq with the binding for e removed.

    Of course, our optimizing compiler does not perform a random division into two matrices. It instead divides PL right away into several sub-matrices. This can be described formally as several, clever, applications of the random mixture rule, so that one of the three previous rules apply to each matrix in the division. The aim of the optimizing mixture rule is thus to perform a division of P into as few sub-matrices as possible. We present a simple, greedy, approach that scans P downwards.

    We only describe the case when p11 is a constructor pattern. Thus, having performed the classical mixture rule, we are in a situation where the i topmost rows of P have a constructor pattern in first position (i.e. are constructor rows for short) and where p1i+1 is not a constructor pattern. At that point, a matrix C has been built, which encompasses all the rows of P from 1 to i. Let us further write P' for what remains of P, and let O and R be two new, initially empty matrices. We then scan the rows of P' from top to bottom, appending them at the end of C, O or R. That is, given row number j in P':
    1. If p'1j is a variable, then append row j at the end of R.
    2. If p'1j is a constructor pattern, then...
      1. If row j is not compatible with all the rows of both R and O, then append row j at the end of C (i.e., move row j above all the rows that have been extracted from P' at previous stages).
      2. If row j is not compatible with all the rows of R and that one of conditions (a) or (b) for applying the or-pattern rule are met by O with row j appended at the end, then do such an append.
      3. Otherwise, append row j at the end of R.
    3. If p'1j is a or-pattern, then consider cases (ii) and (iii).
    When the scan of P' is over, three matrices, C, O and R have been built. In the case where O is empty, matrix C is valid input to the constructor rule; otherwise, appending the rows of O at the end of C yields valid input for applying (maybe more than once) the or-pattern rule, which will in turn yield valid input to the constructor rule (provided that (_ |...) or patterns have been replaced by semantically equivalent wild-cards in a previous phase). Thus, the matrix built by appending O at the end of C is recorded into the overall division and the division process is restarted with input R, unless R is empty.

    Finally, the full process divides the input matrix P into several matrices, each of which is valid input to the other rules of the compilation scheme.

Previous Up Next