We compare the performance of the code generated by the Objective-Caml
compilers version 3.00 and 3.01, where the former implements the
scheme of section 3.3 and the latter implements our new
optimizing scheme (there are other differences of
minor relevance to our purpose). For most programs there is little
difference; this is natural since pattern-matching usually accounts
for a small fraction of most programs running time. A full analysis
of the efficiency of our optimizations would in fact require counting
relevant instructions (test, branches and indirect branches through
jump tables), both statically and dynamically. By lack of time, we
only present some programs that demonstrate significant
improvement.
Our first benchmark is the traditionnal fib, that we
write using a or-pattern.
let rec fib n = match n with| (0|1) -> 1 | _ -> fib (n-1) + fib (n-2)
Here, we simply measure the execution time of computing fib 38.
Our second benchmark, pcf, is a byte-code compiler and interpreter
for PCF. We compute the geometric mean of the execution time for a set of
five different PCF programs.
The time-consuming part of this program is the byte-code machine which we
coded in the style of the byte-code machine included in
[14], the winning entry of the 2000 ICFP
programming contest.
(we also give figures for this program under the name raytrace).
Experiments were performed on a lightly loaded
366Mhz Pentium Pro Linux PC.
The tables show wall-clock times (in seconds) and ratios:
fib
raytrace
pcf
V 3.00
5.36
100
1.69
100
8.12
100
V 3.01
3.74
71
1.62
96
5.08
63
Obviously, as demonstrated by the fib example,
compilation of or-patterns has much improved.
Testing similar examples confirms that
fact. Improvements also comes from the better compilation of switches.
The pcf example is more interesting, it shows that
our optimizations yield a 37% speed-up, in the case of a typical
ML application (a quickly written, compact, prototype implementation of
some programming language).
The raytrace example exhibits
less important improvements on the whole test suite
of the contest; however,
improvements are noticeable for some inputs.
It should also be noticed that
the new compiler somehow
equates the runtime performance of various coding styles,
a feature that is important for a
high-level construct such as pattern-matching.
Variations in coding style include
the relative ordering of non-overlapping patterns and on the
order of arguments in pairs.
We also performed measurements on a 500Mhz Dec Alpha server.
They suggest that the effects of our
optimization do not depend on the targeted architecture.
fib
pcf
V 3.00
3.4
100
4.13
100
V 3.01
2.5
74
2.86
69
The raytrace example is is omitted because it relies on IEEE floating point arithmetic, which is
not implemented in the Objective Caml compiler for this architecture.