6 Our Compilation Scheme

Figure 3: Operations on contexts

(a) Context specialization

P•Q row	S(c, P•Q) row
p₁ⁱ ⋯ p_kⁱ • c(q₁ⁱ, … , q_aⁱ) ⋯ q_nⁱ	p₁ⁱ ⋯ p_kⁱ c(_, … , _) • q₁ⁱ ⋯ q_aⁱ q_{a+1}ⁱ ⋯ q_nⁱ
p₁ⁱ ⋯ p_kⁱ • _ ⋯ q_nⁱ	p₁ⁱ ⋯ p_kⁱ c(_, … , _) • _ ⋯ _ q_{a+1}ⁱ ⋯ q_nⁱ
p₁ⁱ ⋯ p_kⁱ • c'(q₁ⁱ, … , q_aⁱ) ⋯ q_nⁱ	no row

(b) Context collection

P•Q row	COL(P•Q) row
p₁ⁱ ⋯ p_{k−1}ⁱ c(_, … , _) • q₁ⁱ ⋯ q_aⁱ q_{a+1}ⁱ ⋯ q_nⁱ	p₁ⁱ ⋯ p_{k−1}ⁱ • c(q₁ⁱ, … , q_aⁱ) ⋯ q_nⁱ

(c) Context pushing and popping

P•Q row	⇓(P•Q) row	⇑(P•Q) row
p₁ⁱ ⋯ p_kⁱ • q₁ⁱ ⋯ q_nⁱ	p₁ⁱ ⋯ p_kⁱ q₁ⁱ • q₂ⁱ ⋯ q_nⁱ	p₁ⁱ ⋯ p_{k−1}ⁱ • p_kⁱ q₁ⁱ ⋯ q_nⁱ

The new scheme C^* takes five arguments and a typical call is C^*(x, P → L, ex, def, ctx), where x = (x₁… x_n) and P → L is a clause matrix of width n:

P → L =
⎛ p₁¹ ⋯ p_n¹ → l¹ ⎞
⎜ p₁² ⋯ p_n² → l² ⎟
⎜ ⋮ ⎟
⎝ p₁^m ⋯ p_n^m → l^m ⎠

Extra arguments are:

The exhaustiveness argument ex is either partial or total depending on whether compilation can produce escaping exit constructs or not.
Reachable trap handlers def are sequences (P₁, e₁); ⋯; (P_t, e_t), where the e_i's are integers (trap handler numbers) and the P_i's are pattern matrices of width n.
The context ctx is a pattern matrix of width k+n, equivalent to a pair of matrices P•Q, where each row is divided into a prefix (in P) of width k and a fringe (in Q) of width n.

P•Q =
⎛ p₁¹ ⋯ p_k¹ • q₁¹ ⋯ q_n¹ ⎞
⎜ p₁² ⋯ p_k² • q₁² ⋯ q_n² ⎟
⎜ ⋮ ⎟
⎝ p₁^m ⋯ p_k^m • q₁^m ⋯ q_n^m ⎠

Informally, at any point in compilation, contexts are pre-order representations of what is known about matched values. The fringe records the possible values for x, while the prefix records the same information for other subterms which are relevant to pending calls to C^*. Transfers of patterns from fringe to prefix are performed on the arguments of recursive calls, while transfers in the opposite direction are performed as results are collected.

The initial call to C^* for an exhaustive match is:

C^*((x),
⎛ p¹ → l¹ ⎞
⎜ p² → l² ⎟
⎜ ⋮ ⎟
⎝ p^m → l^m ⎠
, total, ∅, (• _))

For a non-exhaustive match, ex is partial, def is the one-element sequence ((_), 1) and a trap handler is added as in section 3.3. The context argument remains the same: it expresses that nothing is known yet about the value of x.

The new scheme returns a lambda-code l and a jump summary, ρ = {…, i ↦ ctx, …}, which is a mapping from trap numbers to contexts. Jump summaries describe what is known about matched values at the places where (exit i …) occur in l.

6.1 Operations on contexts

We define the following four operations on contexts:

Context specialization, S, by a constructor c of arity a is defined by mapping the transformation of figure 3-(a) on context rows.
Context collection, COL, is the reverse of specialization. It combines the last element of the prefix with the appropriate number of arguments standing at the beginning of the fringe (see figure 3-(b)).
Context pushing ⇓ and popping ⇑ move the fringe limit one step forward and backward, without examining any pattern (see figure 3-(c)).
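
The four operations can be sketched on single context rows as follows. This is only an illustration under assumptions that are not in the text: a row is a (prefix, fringe) pair of tuples, the wild-card is the string "_", a constructor pattern is a (name, arguments) pair, and or-patterns at the head of the fringe are not handled. All function names are hypothetical.

```python
WILD = "_"  # wild-card pattern (assumed encoding)

def con(name, *args):
    """Constructor pattern, encoded as a (name, arguments) pair."""
    return (name, tuple(args))

def specialize_row(name, arity, row):
    """Figure 3-(a): specialize one row by constructor `name`; None means 'no row'."""
    prefix, fringe = row
    head, rest = fringe[0], fringe[1:]
    if head == WILD:
        args = (WILD,) * arity          # a wild-card yields `arity` wild-cards
    elif head[0] == name:
        args = head[1]                  # same constructor: expose its arguments
    else:
        return None                     # other constructor: the row disappears
    return (prefix + ((name, (WILD,) * arity),), args + rest)

def collect_row(row):
    """Figure 3-(b): rebuild c(q1, ..., qa) from the prefix's last element."""
    prefix, fringe = row
    name, skeleton = prefix[-1]
    arity = len(skeleton)
    return (prefix[:-1], ((name, fringe[:arity]),) + fringe[arity:])

def push_row(row):
    """Figure 3-(c): move the fringe limit one step forward."""
    prefix, fringe = row
    return (prefix + fringe[:1], fringe[1:])

def pop_row(row):
    """Figure 3-(c): move the fringe limit one step backward."""
    prefix, fringe = row
    return (prefix[:-1], prefix[-1:] + fringe)

def specialize(name, arity, ctx):
    """Map row specialization over a context, dropping vanished rows."""
    rows = (specialize_row(name, arity, r) for r in ctx)
    return [r for r in rows if r is not None]
```

In this encoding, collecting a specialized row gives back the original row, and popping a pushed row does the same.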

As contexts are used to represent sets of values, we naturally define union and intersection over contexts. Context union P•Q ∪ P'•Q' yields a new matrix whose rows are the rows of P•Q and P'•Q'. Row order is not relevant. Context intersection P•Q ∩ P'•Q' is defined as a context whose rows are the least upper bounds of the compatible rows of P•Q and P'•Q'. Context extraction EX is a particular case of context intersection.

EX(p,P'•Q') = (_ … _ • p … _) ∩ P'•Q'

For example, when p is c(_, … , _), context extraction retains those value vectors represented by P'•Q' whose (k+1)-th components admit c as head constructor. Observe that such a computation involves extracting or-pattern arguments and making wild-cards more precise.
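
A minimal sketch of intersection and extraction follows, restricted to patterns without or-patterns (so the extraction of or-pattern arguments mentioned above is not covered). The encoding is an assumption: the wild-card is "_", a constructor pattern is a (name, arguments) pair, and a context is a list of (prefix, fringe) rows. The names lub, intersect and extract are hypothetical.

```python
WILD = "_"  # wild-card pattern (assumed encoding)

def lub(p, q):
    """Least upper bound of two patterns, or None when they are incompatible."""
    if p == WILD:
        return q
    if q == WILD:
        return p
    if p[0] != q[0] or len(p[1]) != len(q[1]):
        return None
    args = [lub(a, b) for a, b in zip(p[1], q[1])]
    if any(a is None for a in args):
        return None
    return (p[0], tuple(args))

def intersect(ctx1, ctx2):
    """Rows of the result are the lubs of the compatible rows of both contexts."""
    out = []
    for p1, q1 in ctx1:
        for p2, q2 in ctx2:
            row = [lub(a, b) for a, b in zip(p1 + q1, p2 + q2)]
            if all(r is not None for r in row):
                k = len(p1)
                out.append((tuple(row[:k]), tuple(row[k:])))
    return out

def extract(p, ctx):
    """EX(p, ctx): intersect ctx with the one-row context (_ ... _ . p _ ... _)."""
    if not ctx:
        return []
    k, n = len(ctx[0][0]), len(ctx[0][1])
    probe = ((WILD,) * k, (p,) + (WILD,) * (n - 1))
    return intersect([probe], ctx)
```

For instance, extracting cons(_, _) from a context with rows (_ • _ nil) and (_ • nil _) keeps only the first row, whose leading fringe wild-card becomes cons(_, _).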

Except for collection and popping, which consume prefix elements, all these operations can be extended to simple matrices, by using an empty prefix in input, and taking the fringe for output. Doing so, we obtain exactly the operations of section 3.3 used to compute pattern matrices (specialization S in particular).

Operations on contexts are extended to jump summaries in the natural manner. For instance, the union of ρ and ρ' is defined as:

ρ ∪ ρ' = {…, i ↦ ρ(i) ∪ ρ'(i),…}

Operations on matrices are extended to reachable trap handlers in a similar manner: for instance, pushing trap handlers is defined as pushing all matrices in them:

⇓((P₁, e₁) ; … ; (P_t, e_t)) = (⇓(P₁), e₁) ; … ; (⇓(P_t), e_t)
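
These two extensions can be sketched as below, under assumptions that are not in the text: a jump summary is a dictionary from trap numbers to contexts (lists of rows), a trap number missing from a summary is treated as bound to the empty context, and a matrix is a list of row tuples. Pushing a simple matrix then amounts to dropping its first column, since the prefix is discarded on output. The function names are hypothetical.

```python
def union_summaries(rho1, rho2):
    """Pointwise union: i is bound to rho1(i) U rho2(i)."""
    out = {i: list(ctx) for i, ctx in rho1.items()}
    for i, ctx in rho2.items():
        out.setdefault(i, []).extend(ctx)
    return out

def push_matrix(matrix):
    """Push on a simple matrix: empty prefix in input, fringe in output."""
    return [row[1:] for row in matrix]

def push_handlers(handlers):
    """Push every matrix of a sequence (P1, e1); ...; (Pt, et)."""
    return [(push_matrix(matrix), e) for matrix, e in handlers]
```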

6.2 Compilation scheme

We now describe scheme C^* by considering cases over the typical call.

If n is zero, then we have:

C^*((),
⎛ → l¹ ⎞
⎜ → l² ⎟
⎜ ⋮ ⎟
⎝ → l^m ⎠
, ex, def, ctx) = l¹, ∅

Observe that the jump summary is empty since no exit is output.

With respect to section 3.3, the variable rule only changes as regards the extra arguments ex, def and ctx. We only describe these changes. The performed recursive call returns code l and jump summary ρ:
l,ρ = C^*(…, …, ex, ⇓(def), ⇓(ctx))
Exhaustiveness information ex does not change, while def and ctx are pushed.

The variable rule returns l unchanged and ρ popped.
In the constructor rule, let C = {c₁, … ,c_k} be the matched constructors, and let Σ be the signature of their type. For a given constructor c ∈ C, the performed recursive call is:
C^*(…, …, ex, S(c,def), S(c, ctx))
Exhaustiveness information ex is passed unchanged, while the other two extra arguments are specialized (specialization of trap handlers being the natural extension of matrix specialization).

Each recursive call returns a lambda-code l(c) and a jump summary ρ_c. Lambda-code l(c) gets wrapped into let-bindings like in section 3.3, yielding the final lambda-code r(c). We then define a case list L and a jump summary ρ_rec as follows:

L = case c₁: r(c₁) ⋯ case c_k: r(c_k)

ρ_rec = { …, i ↦ ⋃_{c ∈ C} COL(ρ_c(i)), … }

The case list is as before, while the jump summary is the union of the jump summaries produced by recursive calls, once collected.

Optimizations are then performed. For clarity, optimizations are described as a two phase process: first, extend (or not extend) the case list L with constructors taken from Σ\ C, and add (or not add) a default case; then, compute the final jump summary.

A first easy case is when Σ\ C is empty or when ex is total. Then, the case list L is not augmented. Otherwise, we distinguish two cases:
1. If Σ\ C is finite, then for all constructors c in this set we consider the context
  Q_c•Q'_c = EX(c(_, … , _),ctx)
  
  Then, trap handlers (P₁, e₁) ; … ; (P_t, e_t) are scanned left-to-right, stopping at the smallest i such that the intersection Q'_c ∩ P_i is not empty. That is, we find the trap handler to jump to when the head constructor of x₁ is c, in order to extend the case list as follows:
  L = L case c: (exit e_i)
  It is possible that e_i does not exist (when Q'_c is empty). This means that the head constructor of x₁ will never be c at runtime.
2. If Σ\ C is infinite (as in the case of integers) or considered too large (as it might be in the case of characters), then a default case is added to the case list:
  L = L default: (exit e₁)
  That is, all non-recognized constructors lead to a jump to the nearest enclosing reachable trap handler.
  
  However, it is still possible to extend the case list for particular constructors, by applying the procedure of case 1 to the constructors that appear in the first column of reachable trap handler matrices and not in C.
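
The left-to-right scan of trap handlers for a constructor c missing from C can be sketched as follows. The encoding is assumed, not taken from the text: the wild-card is "_", constructor patterns are (name, arguments) pairs, or-patterns are ignored, a context is a list of (prefix, fringe) rows and handlers is a list of (matrix, trap number) pairs. The function handler_for is hypothetical; it returns None when no e_i exists.

```python
WILD = "_"  # wild-card pattern (assumed encoding)

def lub(p, q):
    """Least upper bound of two patterns, or None when they are incompatible."""
    if p == WILD:
        return q
    if q == WILD:
        return p
    if p[0] != q[0] or len(p[1]) != len(q[1]):
        return None
    args = [lub(a, b) for a, b in zip(p[1], q[1])]
    return None if any(a is None for a in args) else (p[0], tuple(args))

def rows_meet(r1, r2):
    """True when two pattern rows admit a common instance."""
    return all(lub(a, b) is not None for a, b in zip(r1, r2))

def handler_for(c, arity, ctx, handlers):
    """Trap number to exit to when the head constructor of x1 is c, if any."""
    probe = (c, (WILD,) * arity)
    # Fringe Q'_c of EX(c(_, ..., _), ctx): refine the first fringe pattern.
    fringe = []
    for _prefix, q in ctx:
        head = lub(probe, q[0])
        if head is not None:
            fringe.append((head,) + q[1:])
    # Stop at the first handler whose matrix intersects Q'_c.
    for matrix, e in handlers:
        if any(rows_meet(f, r) for f in fringe for r in matrix):
            return e
    return None
```

With handlers ((nil _), 2) ; ((_ _), 1) and a context that knows nothing about x, constructor cons is sent to handler 1 and nil to handler 2.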
The final jump summary is computed by considering the final case list L. For a given trap handler number e_i let {c'₁, … ,c'_k'} be the set of constructors such that case c'_j: (exit e_i) appears in L. Then the jump summary ρ_{e_i} is defined as:
ρ_{e_i} = { e_i ↦ EX((c'₁(_, … , _) ∣ ⋯ ∣ c'_{k'}(_, … , _)), ctx) }
Moreover, if there is a default clause, the jump summary ρ_d is defined as:
ρ_d = { e₁ ↦ ctx}
Finally the constructor rule returns a switch on case list L and the jump summary built by performing the union of ρ_rec, of all ρ_{e_i}'s and, when appropriate, of ρ_d.

The constructor rule performs many context unions, so that contexts may become huge. Fortunately, contexts can be made smaller using a simple observation. Namely, let p and q be two rows in a context, such that p is less precise than q (i.e., all instances of q are instances of p). Then, row q can be removed from the context without modifying its meaning as a set of value vectors. Hence, while performing context union, one can leave aside some pattern rows. If the produced context is still too large, then contexts are safely approximated by first replacing some patterns in them by wild-cards (typically all the patterns in a given column) and then removing rows using the previous remark. Rough experiments led us to set the maximal admissible context size to 32 rows, yielding satisfactory compilation time in pathological examples and exact contexts in practical examples.
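
The pruning remark and the column approximation can be sketched as below. This is an illustration only, with an assumed encoding (wild-card "_", constructor patterns as (name, arguments) pairs, no or-patterns, contexts as lists of (prefix, fringe) rows) and hypothetical function names. The precision test is conservative: a constructor pattern is never considered less precise than a wild-card, even for a type with a single constructor.

```python
WILD = "_"  # wild-card pattern (assumed encoding)

def less_precise(p, q):
    """True when every instance of q is (syntactically) an instance of p."""
    if p == WILD:
        return True
    if q == WILD:
        return False  # conservative: ignores single-constructor types
    return p[0] == q[0] and all(less_precise(a, b) for a, b in zip(p[1], q[1]))

def row_less_precise(r1, r2):
    return all(less_precise(a, b) for a, b in zip(r1, r2))

def prune(ctx):
    """Drop every row that is subsumed by a less precise row."""
    kept = []
    for row in ctx:
        flat = row[0] + row[1]
        if any(row_less_precise(k[0] + k[1], flat) for k in kept):
            continue  # an already kept row covers this one
        kept = [k for k in kept if not row_less_precise(flat, k[0] + k[1])]
        kept.append(row)
    return kept

def blur_column(ctx, col):
    """Safe approximation: replace column `col` by wild-cards, then prune."""
    out = []
    for prefix, fringe in ctx:
        flat = list(prefix + fringe)
        flat[col] = WILD
        k = len(prefix)
        out.append((tuple(flat[:k]), tuple(flat[k:])))
    return prune(out)
```

A compiler following the text would call something like blur_column only when a pruned context still exceeds the size limit (32 rows in the experiments reported above).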

Or-pattern compilation operates on matrices whose first column contains at least one or-pattern. Additionally, when p₁ⁱ is an or-pattern, then for all j, i < j ≤ m, one of the following mutually exclusive conditions must hold:

(a) p₁ⁱ and p₁^j are not compatible.
(b) p₁ⁱ and p₁^j are compatible, and (p₂ⁱ… p_nⁱ) is less precise than (p₂^j… p_n^j).

Conditions (a) and (b) guarantee that, whenever p₁ⁱ matches the first value v₁ of a value vector v, but row i does not match v, then no further row in P matches v either. This is necessary since further rows of P won't be reachable in case of failure in the or-pattern trap handler.

Now, consider one row number i, such that p₁ⁱ is the or-pattern q₁ ∣⋯ ∣q_o. Further assume that this or-pattern binds the variables y₁, … ,y_v. First, we allocate a fresh trap number e and divide P → L into the following or-body P' → L' and or-trap P'' → L'' clauses:

P' → L' =
⎛ ⋮ ⎞
⎜ p₁ⁱ⁻¹ … p_nⁱ⁻¹ → lⁱ⁻¹ ⎟
⎜ q₁ … _ → (exit e y₁ … y_v) ⎟
⎜ ⋮ ⎟
⎜ q_o … _ → (exit e y₁ … y_v) ⎟
⎜ p₁ⁱ⁺¹ … p_nⁱ⁺¹ → lⁱ⁺¹ ⎟
⎝ ⋮ ⎠

P'' → L'' = ( p₂ⁱ … p_nⁱ → lⁱ )

In the or-body matrix, observe that the or-pattern is expanded, while the other patterns in row number i are replaced by wild-cards and the action is replaced by exits.

Recursive calls are performed as follows:

l', ρ' = C^*(x, P' → L', ex, def, ctx)
l'', ρ'' = C^*((x₂ … x_n), P'' → L'', ex, ⇓(EX(p, def)), ⇓(EX(p, ctx)))

Finally, the output code is catch l' with (e y₁ ... y_v) l'' and the returned jump summary is ρ = ρ' ∪ ⇑(ρ'').
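
The division into or-body and or-trap clauses can be sketched as follows. The encoding is an assumption: a clause is a (patterns, action) pair, the wild-card is "_", an or-pattern is the pair ("|", alternatives), and an exit action is a tuple ("exit", e, y₁, …, y_v). The function split_or_row is hypothetical and uses a 0-based row index; the recursive calls to C^* are not shown.

```python
WILD = "_"  # wild-card pattern (assumed encoding)
OR = "|"    # tag of or-patterns (assumed encoding)

def split_or_row(clauses, i, trap, variables):
    """Divide P -> L at or-pattern row i into or-body and or-trap clauses."""
    pats, action = clauses[i]
    tag, alternatives = pats[0]
    if tag != OR:
        raise ValueError("row %d does not start with an or-pattern" % i)
    exit_action = ("exit", trap) + tuple(variables)
    width = len(pats)
    # Or-body: one row per alternative, the other patterns become wild-cards.
    expanded = [((q,) + (WILD,) * (width - 1), exit_action)
                for q in alternatives]
    or_body = clauses[:i] + expanded + clauses[i + 1:]
    # Or-trap: the remaining patterns of row i, with the original action.
    or_trap = [(pats[1:], action)]
    return or_body, or_trap
```

For example, splitting the first row of ((nil ∣ cons(_, _)) nil → l¹ ; _ _ → l²) with trap number 3 gives an or-body of three rows, the first two of which exit to 3, and the or-trap (nil → l¹).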

The mixture rule is responsible for feeding the other rules with appropriate clause matrices. We first consider the case of a random division. Hence let us cut P → L into Q → M and R → N at some row. Then a fresh trap number e is allocated and a first recursive call is performed:
l_q, ρ_q = C^*(x, Q → M, partial, (R, e) ; def, ctx)
The exhaustiveness information is partial, since nothing about the exhaustiveness of Q derives from the exhaustiveness of P. Reachable trap handlers are extended.

Then, a second recursive call is performed:
l_r, ρ_r = C^*(x, R → N, ex, def, ρ_q(e))
It is no surprise that the context argument to the new call is extracted from the jump summary of the previous call. Argument ex does not change. Indeed, if matching by P cannot fail, then neither can matching by R.

Then, the scheme can output the code
l = catch l_q with (e) l_r
and return the jump summary (ρ_q \ { e }) ∪ ρ_r, where ρ_q \ { e } stands for ρ_q with the binding for e removed.
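
The plumbing of the random mixture rule can be sketched as a higher-order function. Everything below is a hypothetical encoding: compile_star stands for C^* and is passed as a parameter, clauses are (patterns, action) pairs, handlers is a list of (matrix, trap number) pairs, jump summaries are dictionaries from trap numbers to contexts, and fresh_trap is any function that returns an unused trap number. The sketch also assumes that a trap number absent from ρ_q denotes the empty context.

```python
def mixture(compile_star, xs, q_clauses, r_clauses, ex, handlers, ctx,
            fresh_trap):
    """Random mixture rule: compile Q -> M, then R -> N in the context of e."""
    e = fresh_trap()
    r_matrix = [pats for pats, _action in r_clauses]
    # First call: exhaustiveness is partial, (R, e) becomes the nearest handler.
    l_q, rho_q = compile_star(xs, q_clauses, "partial",
                              [(r_matrix, e)] + handlers, ctx)
    # Second call: its context is what the first call recorded for trap e.
    l_r, rho_r = compile_star(xs, r_clauses, ex, handlers, rho_q.get(e, []))
    # Returned summary: (rho_q \ {e}) U rho_r.
    summary = {i: list(c) for i, c in rho_q.items() if i != e}
    for i, c in rho_r.items():
        summary.setdefault(i, []).extend(c)
    return ("catch", l_q, e, l_r), summary
```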

Of course, our optimizing compiler does not perform a random division into two matrices. It instead divides P → L right away into several sub-matrices. This can be described formally as several, clever, applications of the random mixture rule, so that one of the three previous rules applies to each matrix in the division. The aim of the optimizing mixture rule is thus to perform a division of P into as few sub-matrices as possible. We present a simple, greedy, approach that scans P downwards.

We only describe the case when p₁¹ is a constructor pattern. Thus, having performed the classical mixture rule, we are in a situation where the i topmost rows of P have a constructor pattern in first position (i.e. are constructor rows for short) and where p₁ⁱ⁺¹ is not a constructor pattern. At that point, a matrix C has been built, which encompasses all the rows of P from 1 to i. Let us further write P' for what remains of P, and let O and R be two new, initially empty matrices. We then scan the rows of P' from top to bottom, appending them at the end of C, O or R. That is, given row number j in P':
1. If p'₁^j is a variable, then append row j at the end of R.
2. If p'₁^j is a constructor pattern, then...
  1. If row j is incompatible with every row of both R and O, then append row j at the end of C (i.e., move row j above all the rows that have been extracted from P' at previous stages).
  2. If row j is incompatible with every row of R, and one of the conditions (a) or (b) for applying the or-pattern rule is met by O with row j appended at the end, then append row j at the end of O.
  3. Otherwise, append row j at the end of R.
3. If p'₁^j is an or-pattern, then consider sub-cases 2 and 3 of the previous case.
When the scan of P' is over, three matrices, C, O and R have been built. In the case where O is empty, matrix C is valid input to the constructor rule; otherwise, appending the rows of O at the end of C yields valid input for applying (maybe more than once) the or-pattern rule, which will in turn yield valid input to the constructor rule (provided that (_ |...) or patterns have been replaced by semantically equivalent wild-cards in a previous phase). Thus, the matrix built by appending O at the end of C is recorded into the overall division and the division process is restarted with input R, unless R is empty.
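
One step of this greedy scan can be sketched as follows. The encoding is assumed and is not taken from the text: rows are tuples of patterns (actions are omitted), the wild-card "_" stands for variables, or-patterns are ("|", alternatives) pairs, and both the compatibility and the precision tests are simple syntactic approximations. All function names are hypothetical.

```python
WILD = "_"  # wild-card, standing for variables (assumed encoding)
OR = "|"    # tag of or-patterns (assumed encoding)

def is_or(p):
    return p != WILD and p[0] == OR

def compatible(p, q):
    """True when p and q may admit a common instance."""
    if p == WILD or q == WILD:
        return True
    if is_or(p):
        return any(compatible(a, q) for a in p[1])
    if is_or(q):
        return any(compatible(p, b) for b in q[1])
    return p[0] == q[0] and all(compatible(a, b) for a, b in zip(p[1], q[1]))

def less_precise(p, q):
    """Syntactic approximation of 'all instances of q are instances of p'."""
    if p == WILD:
        return True
    if q == WILD:
        return False
    if is_or(q):
        return all(less_precise(p, b) for b in q[1])
    if is_or(p):
        return any(less_precise(a, q) for a in p[1])
    return p[0] == q[0] and all(less_precise(a, b) for a, b in zip(p[1], q[1]))

def clashes_with_all(row, matrix):
    """True when row is incompatible with every row of matrix."""
    return not any(all(compatible(a, b) for a, b in zip(row, r))
                   for r in matrix)

def or_rule_applies(matrix):
    """Conditions (a) or (b) hold for every or-pattern row and later row."""
    for i, row in enumerate(matrix):
        if not is_or(row[0]):
            continue
        for later in matrix[i + 1:]:
            if compatible(row[0], later[0]) and not all(
                    less_precise(a, b) for a, b in zip(row[1:], later[1:])):
                return False
    return True

def scan(rest, c_rows):
    """Distribute the rows of P' among C, O and R, scanning downwards."""
    c, o, r = list(c_rows), [], []
    for row in rest:
        head = row[0]
        if head == WILD:
            r.append(row)
        elif (not is_or(head) and clashes_with_all(row, r)
              and clashes_with_all(row, o)):
            c.append(row)
        elif clashes_with_all(row, r) and or_rule_applies(o + [row]):
            o.append(row)
        else:
            r.append(row)
    return c, o, r
```

The full division would record C followed by O and restart the process with R as input, as described above.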

Finally, the full process divides the input matrix P into several matrices, each of which is valid input to the other rules of the compilation scheme.