4 Optimizations

4 Optimizations

catch (catch (switch lx with case []: 1 default: exit) with (catch (switch ly with case []: 2 default: exit) with (catch (switch lx with case (::): (switch ly with case (::) : 3 default: exit) default: exit)))) with (failwith "Partial match")

catch (catch (switch* lx with case []: 1 case (::) : (switch ly with case (::): 3 default: exit)) with (switch ly with case []: 2 default: exit) with (failwith "Partial match")

Figure 1: Mixture optimization

We now describe some improvement to the classical compilation scheme. For simplicity, we present examples and defer the full presentation of our scheme to section 6. In all these examples, we focus on pattern-matching compilation, replacing potentially arbitrary actions more simple ones, such as integers or variables.

4.1 Optimizing the mixture rule

In this section and in the following, our running example is the classical list-merge:

let merge lx ly = match lx,ly with | [], _ -> 1 | _, [] -> 2 | x::xs, y::ys -> 3

Such a matching on pairs encodes matching on two arguments. As a consequene, we consider the following initial call to scheme C:

C((lx ly),(P → L))

Where (P → L) is:

(P → L) =

⎛
⎜
⎜
⎝

`[]`	`_`	→	`1`
`_`	`[]`	→	`2`
`x::xs`	`y::ys`	→	`3`

⎞
⎟
⎟
⎠

Applying the mixture rule twice yields three matrices:

P₁ → L₁ =

⎛
⎝

[] _ → 1

⎞
⎠

P₂ → L₂ =

⎛
⎝

_ [] → 2

⎞
⎠

P₃ → L₃ =

⎛
⎝

x::xs y::ys → 3

⎞
⎠

Now, consider another clause matrix (P' → L'):

(P' → L') =

⎛
⎜
⎜
⎝

`[]`	`_`	→	`1`
`x::xs`	`y::ys`	→	`3`
`_`	`[]`	→	`2`

⎞
⎟
⎟
⎠

Both clause matrices define the same matching function, namely they both map ([] v) to 1, (v₁::v₂ []) to 2 and (v₁::v₂ v'₁::v'₂) to 3. Furthermore, (P' → L') can be obtained from (P → L) by swapping its second and third row. More generally, one easily checks that swapping two contiguous incompatible rows is legal. Then applying the mixture rule to (P' → L'), yields two matrices only:

P'₁ → L'₁ =

⎛
⎝

`[]`	`_`	→	`1`
`x::xs`	`y::ys`	→	`3`

⎞
⎠

P'₂ → L'₂ =

⎛
⎝

_ [] → 2

⎞
⎠

Final outputs for P → L and P' → L' are displayed on Figure 1. Hence, as a result of replacing P → L by P' → L', the two tests on lx that were performed separately on the left code are now merged in a single switch in the right code. Also notice that one trap disappears.

More generally, an optimized mixture rule should take advantage of pattern-matching semantics to swap rows when possible, so that as few cuts as possible are performed.

4.2 Using exhaustiveness information

The Objective Caml compiler checks the exhaustiveness of pattern matching expressions and issues a warning before compiling non-exhaustive pattern matchings. However, the exhaustiveness information can also be used for avoiding tests. Matrix P' of the previous section is exhaustive; this means that there will be no "Partial match" failure at runtime. As an immediate consequence, the switch: (switch ly with case []: 2 default: exit) always succeeds (this switch is the last one performed by the optimized code in figure 1). Thus, we replace it by 2. We can also suppress the outermost trap. Hence, applying both optimizations described up to now, compilation of P → L finally yields:

catch (switch* lx with case []: 1 case (::): (switch ly with case []: 3 default: exit)) with 2

In the general case, exhaustiveness information is exploited by slightly modifying scheme C. It suffices to avoid emitting default clauses in switch constructs, when it is known that no exit should escape from produced code. This property holds initially for exhaustive pattern matchings, and transmits to all recursive calls, except for the call on P₁ → L₁ in the mixture rule.

4.3 Optimizing exits

1 catch 2 (catch 3 (catch 4 (switch lx with 5 case Nil: 1 6 case Cons: 7 (switch ly with 8 case Cons: 5 9 default: exit) 10 11 default: exit) 12 with 13 (switch ly with 14 case Nil: 2 15 default: exit)) 16 with 17 (switch lx with 18 case One: 3 19 default: exit)) 20with 4

1 catch 2 (catch 3 (catch 4 (switch lx with 5 case Nil: 1 6 case Cons: 7 (switch* ly with 8 case Cons: 5 9 case Nil: (exit 2) 10 case One: (exit 4)) 11 default: (exit 2)) 12 with (2) 13 (switch ly with 14 case Nil: 2 15 default: (exit 3))) 16 with (3) 17 (switch lx with 18 case One: 3 19 default: (exit 4))) 20with (4) 4

Unoptimized code Optimized code

Figure 2: Exit optimization

The two previous optimizations yield optimal code for the merge example. Hence we complicate the running example by considering a matching on objects of type t from section 2:

P → L =

⎛
⎜
⎜
⎜
⎜
⎝

`Nil`	`_`	→	`1`
`_`	`Nil`	→	`2`
`One x`	`_`	→	`3`
`_`	`One y`	→	`4`
`Cons (x,xs)`	`Cons (y,ys)`	→	`5`

⎞
⎟
⎟
⎟
⎟
⎠

The optimized mixture rule yields four matrices:

P₁ → L₁ =

⎛
⎝

`Nil`	`_`	→	`1`
`Cons (x,xs)`	`Cons (y,ys)`	→	`5`

⎞
⎠

P₂ → L₂ =

⎛
⎝

_ Nil → 2

⎞
⎠

P₃ → L₃ =

⎛
⎝

One x _ → 3

⎞
⎠

P₄ → L₄ =

⎛
⎝

_ One y → 4

⎞
⎠

For reasons that will appear immediately, we apply the mixture rule from bottom to top, thereby nesting trap handlers. The match being exhaustive, compilation yields the code displayed on the left part of Figure 2.

Now, consider what happens at run-time when (lx ly) is (Cons (v₁, v₂) One v). A first switch on lx leads to line 7, where a switch on ly is performed. This switch fails, and the default action jumps to the nearest enclosing handler (line 13), where ly is tested against Nil resulting in another switch failure. Here, in our case, control goes to line 17, where another switch on lx (against One x) fails, resulting in final jump to line 20.

Hence, it would be appropriate to jump to line 20 right from the first test on ys. To do so, both exits and trap handlers are now labelled by integers. Note that this new feature does not really complicate the compilation of static exceptions. Then, it becomes possible to jump to different trap handlers from the same point and a better compilation of P → L is displayed in the right part of figure 2.

The code above maps vectors (Cons (v₁, v₂) One v) to 4 by executing two switches, while previous code needed four switches to perform the same task. Hence, exit optimization has a noticeable benefit as regards run-time efficiency. However, code size may be increased, since some switches may have larger case lists, but still remains under control, since no extra switches are generated. Hence, final code size critically depends on how switches translate to machine-level constructs. For instance, machine-level code size obviously does not increase when switches are translated to jump tables. Given the Objective Caml encoding of constructors, we are here in the same desirable situation where the compilation of apparently larger switches does not result in producing more code.

Surprisingly, performing exit optimization is quite simple and cheap: the needed information is available at compile-time by inspecting pattern matrices only. Reachable trap handlers are defined as pairs (P, e) of a pattern matrix and an integer. Reachable trap handlers originate from the division performed by the mixture rule. Here, P₁ → L₁ is compiled with the reachable trap-handlers (P₂, 2), (P₃, 3) and (P₄, 4). Then, the constructor rule specializes reachable trap handlers. Here, in the case where lx is Cons (v₁, v₂), specializing reachable trap handlers results in ((Nil),2) and ((One y), 4) (note that specializing P₃ yields an empty matrix, which is discarded). Hence, while generating the first switch on ly (line 7), it is known that the code produced by compiling trap handlers number 2 and 3 will surely exit when ly is One v, and a jump to trap handler number 4 can be generated by the compiler in that case.

4.4 Aggressive control flow optimization

The code produced by exit optimization still contains redundant tests, some of which can be removed without altering the handler structure introduced by the mixture rule. More specifically, we consider trap handler number 3 (line 16). It results from compiling P₃ and is a switch of lx against One.

The only (exit 3) lies in trap handler number 2 (line 15) and results from ly not being Nil, this gives us no direct information on lx. Now, looking upwards for (exit 2), we can infer that trap handler number 2 is entered from two different points. In the first case (line 9), (lx ly) is fully known as (Cons (v₁, v₂) Nil), in the second case (line 11), only lx is know to be One v. As (exit 3) on line 15 gets executed only when ly is not Nil, we can finally deduce that the first case never results in entering trap handler number 3. As a consequence, trap handler number 3 is executed in a context where lx necessarily is One v, the switch it performs is useless and line 16 can be simplified into “3”. This elimination of useless tests[4] is usually performed at a lower level by combining dead code elimination[9] and conditional constant propagation[21, 6].

Finally, after all optimizations, there remains one redundant switch in produced code, in trap-handler number 2 (line 12). As a result, vectors (Cons (v₁, v₂) Nil) are mapped to 2 by testing ly twice. One should notice that this is precisely the test that would get duplicated by compilation to decision trees.

Describing what is known on values while entering trap handlers is slightly involved. The key idea is representing set of value vectors as pattern matrices. We call such a set a context. Contexts for the three trap handlers of our example are:

Trap number Context

⎛
⎝

`One _`	_
`Cons (_, _)`	`Nil`

⎞
⎠

⎛
⎝

One _ (One _∣Cons (_, _))

⎞
⎠

⎛
⎝

Cons (_, _) One _

⎞
⎠

If precise enough and exploited fully, we conjecture that contexts subsume exhaustiveness information. However as intuition suggests and experience confirms, contexts get larger while compilation progresses, potentially reaching huge sizes at the end of matrices. We cure this by safely approximating contexts when they get too large, replacing some patterns in them by wild-cards. Hence the optimizations of section 4.2 is still worth considering, as being cheap and always applicable.