Printing programs
Camlp5 provides extension kits to pretty print programs in revised syntax and normal syntax. Other extension kits rebuild parsers, or EXTEND statements, in their initial syntax. The pretty print system is itself extensible: new rules can be added to it. We present here how it works in the Camlp5 sources.
The pretty print system of Camlp5 uses the library modules Pretty (an original system to format output) and Extfun (another original system of extensible functions).
This documentation is intended for programmers who want to understand how the pretty printing of OCaml programs works in Camlp5, who want to adapt, modify or debug it, or who want to add their own pretty printing extensions.
Introduction
The files doing the pretty printing are located in the Camlp5 sources, in the directory "etc". Look at them if you are interested in creating new ones. The main ones are:
- "etc/pr_r.ml": pretty print in revised syntax.
- "etc/pr_o.ml": pretty print in normal syntax.
- "etc/pr_rp.ml": rebuilding parsers in their original revised syntax.
- "etc/pr_op.ml": rebuilding parsers in their original normal syntax.
- "etc/pr_extend.ml": rebuilding EXTEND in its original syntax.
We present here how this system works inside these files.
Principles
Using module Pretty
All functions in OCaml pretty printing take a parameter named "the printing context" (variable pc). It is a record holding:
- The current indentation: pc.ind
- What has to be printed before, on the same line: pc.bef
- What has to be printed after, on the same line: pc.aft
- The dangling token, useful in normal syntax to know whether parentheses are necessary: pc.dang
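The record described above could be modelled as follows, in normal OCaml syntax. This is only a sketch: the field names come from the list above, but the type name pr_context, the value empty_pc and the definition of tab are assumptions made for illustration.

```ocaml
(* Sketch of a printing context. The type name is an assumption. *)
type pr_context = {
  ind : int;      (* current indentation, in columns *)
  bef : string;   (* text to print before, on the same line *)
  aft : string;   (* text to print after, on the same line *)
  dang : string;  (* dangling token, "" when there is none *)
}

(* Hypothetical starting context: no indentation, nothing before or after. *)
let empty_pc = { ind = 0; bef = ""; aft = ""; dang = "" }

(* Assumed definition: a string of [n] spaces, usable as a "before part". *)
let tab n = String.make n ' '

let () =
  (* A context indented by 2 columns, followed by a semicolon. *)
  let pc = { empty_pc with ind = 2; bef = tab 2; aft = ";" } in
  Printf.printf "%sx%s\n" pc.bef pc.aft
```

Keeping "before" and "after" as strings inside the context lets each printing function decide where they end up, which matters as soon as the output is split over several lines.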
A typical pretty printing function calls the function horiz_vertic of the library module Pretty. This function takes two functions as parameters:
- The way to print the data on a single line (horizontal printing)
- The way to print the data on two or more lines (vertical printing)
Both functions concatenate the strings by using the function sprintf of the library module Pretty, which checks whether the printed data fits on the line. They generally call, recursively, other pretty printing functions with the same behaviour.
Let us see a (fictitious) example of printing an OCaml application. Suppose we have an application expression "e1 e2" to pretty print, where e1 and e2 are sub-expressions. If both expressions and their application fit on a single line, we want to see:
e1 e2
On the other hand, if they do not fit on a single line, we want to see e2 on another line with, say, an indentation of 2 spaces:
e1
  e2
Here is a possible implementation. The function has been named expr_app and can call the function expr to print the sub-expressions e1 and e2:
value expr_app pc e1 e2 =
  horiz_vertic
    (fun () ->
       let s1 = expr {(pc) with aft = ""} e1 in
       let s2 = expr {(pc) with bef = ""} e2 in
       sprintf "%s %s" s1 s2)
    (fun () ->
       let s1 = expr {(pc) with aft = ""} e1 in
       let s2 =
         expr
           {(pc) with
            ind = pc.ind + 2;
            bef = tab (pc.ind + 2)}
           e2
       in
       sprintf "%s\n%s" s1 s2)
;
The first function is the horizontal printing. It ends with a sprintf separating the printing of e1 and e2 by a space. The possible "before part" (pc.bef) and "after part" (pc.aft) are transmitted in the calls to the sub-functions.
The second function is the vertical printing. It ends with a sprintf separating the printing of e1 and e2 by a newline. The second line starts with an indentation, given as the "before part" (pc.bef) of the second call to expr.
The pretty printing library function Pretty.horiz_vertic calls the first (horizontal) function and, if it fails (either because s1 or s2 is too long or contains newlines, or because the final string produced by sprintf is too long), calls the second (vertical) function.
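The fallback just described can be imitated with a small toy model in normal OCaml syntax. This is not the code of the Pretty module: the exception Too_long, the references line_length and horizontal, and the functions check and print_app are all hypothetical names, and the line length of 20 is arbitrary.

```ocaml
(* Toy model of the horizontal/vertical fallback; NOT the Pretty module. *)
exception Too_long

let line_length = ref 20      (* arbitrary width, for the example only *)
let horizontal = ref false    (* true while a one-line layout is being tried *)

(* In horizontal mode, reject a string holding a newline or too long. *)
let check s =
  if !horizontal && (String.contains s '\n' || String.length s > !line_length)
  then raise Too_long
  else s

(* Try [horiz]; if it raises Too_long, fall back to [vertic]. *)
let horiz_vertic horiz vertic =
  let saved = !horizontal in
  horizontal := true;
  let attempt = try Some (horiz ()) with Too_long -> None in
  horizontal := saved;
  match attempt with
  | Some s -> s
  | None -> vertic ()

(* Print the application of [e1] to [e2], both already rendered as strings. *)
let print_app e1 e2 =
  horiz_vertic
    (fun () -> check (Printf.sprintf "%s %s" e1 e2))
    (fun () -> check (Printf.sprintf "%s\n  %s" e1 e2))

let () =
  print_endline (print_app "f" "x");
  print_endline (print_app "a_long_function_name" "an_argument")
```

Because the previous mode is restored before the vertical function runs, a vertical result produced inside an enclosing horizontal attempt is rejected by check in turn, so the failure propagates outwards to the enclosing call.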
Notice that the parameter pc contains a field pc.bef (what has to be printed before, on the same line), which in both cases is transmitted to the printing of e1, since the expression {(pc) with aft = ""} is a copy of the record pc in which pc.bef is kept. In the same way, the field pc.aft is transmitted to the printing of e2.
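The propagation of the "before" and "after" parts can be checked on a small stand-alone example. It is written in normal OCaml syntax, where the record copy is spelled {pc with ...} instead of the revised-syntax {(pc) with ...}. The names pr_context, leaf and app_horiz are hypothetical and only serve the illustration.

```ocaml
type pr_context = { ind : int; bef : string; aft : string; dang : string }

(* A leaf printer: wraps its text in the before and after parts. *)
let leaf pc s = pc.bef ^ s ^ pc.aft

(* Horizontal printing of an application of two leaves. *)
let app_horiz pc e1 e2 =
  let s1 = leaf { pc with aft = "" } e1 in   (* keeps pc.bef *)
  let s2 = leaf { pc with bef = "" } e2 in   (* keeps pc.aft *)
  s1 ^ " " ^ s2

let () =
  let pc = { ind = 0; bef = "let x = "; aft = " in"; dang = "" } in
  (* The before part lands in front of e1, the after part behind e2. *)
  print_endline (app_horiz pc "f" "y")
```

Each part is printed exactly once: the before part by the printer of the first sub-expression, the after part by the printer of the last one.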
Using module Extfun and its syntax
This system is combined with extensible functions to make the pretty printing extensible. The pretty printers of Camlp5 can then be used as "kits", added or not according to what has to be pretty printed and how. In particular, the pretty printing kit "pr_r.cmo" alone does not rebuild parsers in their original syntax. When "pr_rp.cmo" is added, the parsers are rebuilt: the code of "pr_rp.ml" is just an extension of some parts of the pretty printing extensible functions of "pr_r.ml".
The code above actually looks like:
value expr_app =
  extfun Extfun.empty with
  [ <:expr< $e1$ $e2$ >> ->
      fun curr next pc ->
        horiz_vertic
          (fun () ->
             let s1 = curr {(pc) with aft = ""} e1 in
             let s2 = next {(pc) with bef = ""} e2 in
             sprintf "%s %s" s1 s2)
          (fun () ->
             let s1 = curr {(pc) with aft = ""} e1 in
             let s2 =
               next
                 {(pc) with
                  ind = pc.ind + 2;
                  bef = tab (pc.ind + 2)}
                 e2
             in
             sprintf "%s\n%s" s1 s2)
  | e ->
      fun curr next pc -> next pc e ]
;
The extensible functions take a syntax tree pattern (here <:expr< $e1$ $e2$ >>) as parameter. To be extensible, the syntax tree must be the first parameter (it is not possible to apply extensions inside a closure). The other parameters, in particular the printing context pc, are given in the semantic action.
The parameters curr and next are provided by the pretty printing system for OCaml programs. They correspond to the pretty printing of, respectively, the current level and the next level. Since application in OCaml is left associative, the first sub-expression is printed at the same (current) level and the second one at the next level. We also see a call to next in the last (second) case of the function, which passes all other expressions on to the next level.
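The role of curr and next can be illustrated by a toy printer in normal OCaml syntax. It is a sketch under strong simplifications: plain pattern matching stands in for extfun, there is no printing context, and the type expr, the rules and the function print_at are hypothetical names. Falling off the last level parenthesizes the expression and restarts at the first level, which is an assumption of this sketch and not a description of the Camlp5 sources.

```ocaml
(* Toy expressions; not the Camlp5 syntax tree. *)
type expr =
  | Var of string
  | App of expr * expr
  | Add of expr * expr

(* A rule receives the printers of the current and of the next level. *)
type rule = (expr -> string) -> (expr -> string) -> expr -> string

let add_rule : rule = fun curr next e ->
  match e with
  | Add (a, b) -> curr a ^ " + " ^ next b   (* left associative *)
  | e -> next e                             (* other cases: next level *)

let app_rule : rule = fun curr next e ->
  match e with
  | App (f, x) -> curr f ^ " " ^ next x     (* left associative *)
  | e -> next e

let simple_rule : rule = fun _curr next e ->
  match e with
  | Var s -> s
  | e -> next e

(* Levels from the lowest to the highest priority. *)
let levels = [add_rule; app_rule; simple_rule]

(* Print [e] starting at the first level of [lvls]. *)
let rec print_at lvls e =
  match lvls with
  | [] -> "(" ^ print_at levels e ^ ")"   (* no level left: parenthesize *)
  | rule :: rest -> rule (print_at lvls) (print_at rest) e

let print e = print_at levels e

let () =
  print_endline (print (App (App (Var "f", Var "x"), Var "y")));
  print_endline (print (App (Var "f", App (Var "g", Var "x"))))
```

Printing the left operand with curr and the right operand with next is what makes parentheses appear on the right-nested application only.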
Dangling else, bar, semicolon
In normal syntax, there are cases where it is necessary to enclose expressions between parentheses (or between begin and end, which is equivalent in that syntax). Three tokens may cause problems: the "else", the vertical bar "|" and the semicolon ";". Here are examples where the presence of these tokens forces the previous expression to be parenthesized. In these three examples, removing the begin..end enclosers would change the meaning of the expression because the dangling token would be included in that expression:
Dangling else:
if a then begin if b then c end else d
Dangling bar:
function A -> begin match a with B -> c | D -> e end | F -> g
Dangling semicolon:
if a then b else begin let c = d in e end; f
The information is transmitted by the value pc.dang. In the first example above, while displaying the "then" part of the outer "if", the sub-expression is called with the value pc.dang set to "else", to inform the last sub-sub-expression that it is going to be followed by that token. When an "if" expression has to be displayed without an "else" part while its pc.dang is "else", it has to be enclosed in parentheses (or between begin and end).
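The dangling "else" case can be reproduced with a toy printer in normal OCaml syntax. This is a sketch only: the type expr and the function print are hypothetical, the dangling token is passed as a plain string argument instead of a field of the printing context, and parentheses are used where begin..end would do as well.

```ocaml
(* Toy expressions; not the Camlp5 syntax tree. *)
type expr =
  | Var of string
  | If of expr * expr * expr option   (* condition, then part, optional else *)

(* [dang] names the token that will follow the printed expression. *)
let rec print dang e =
  match e with
  | Var s -> s
  | If (_, _, None) when dang = "else" ->
      (* A following "else" would attach to this "if": protect it.
         Inside the parentheses nothing dangles any more. *)
      "(" ^ print "" e ^ ")"
  | If (c, t, None) ->
      (* The "then" part comes last, so it inherits the dangling token. *)
      "if " ^ print "" c ^ " then " ^ print dang t
  | If (c, t, Some el) ->
      (* The "then" part is followed by "else"; the "else" part comes last. *)
      "if " ^ print "" c ^ " then " ^ print "else" t ^ " else " ^ print dang el

let () =
  let inner = If (Var "b", Var "c", None) in
  print_endline (print "" (If (Var "a", inner, Some (Var "d"))))
```

When the inner "if" has its own "else" part, no protection is added, since the following "else" cannot be captured by it.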
This problem does not exist in revised syntax. While pretty printing in revised syntax, the parameter pc.dang is not necessary and remains the empty string.
By level
For each level of pretty printing, there is such a function. The example showed the pretty printing of expressions at the level "apply". There are other functions for the levels "top", "add", "mul", "simple", and so on. The global pretty printing variable for expressions is a record named "pr_expr" (in the module Pcaml.Printers), where the levels are defined by a list, something like this:
pr_expr.pr_levels :=
  [{pr_label = "top"; pr_rules = expr_top};
   {pr_label = "add"; pr_rules = expr_add};
   {pr_label = "mul"; pr_rules = expr_mul};
   {pr_label = "apply"; pr_rules = expr_app};
   {pr_label = "simple"; pr_rules = expr_simple}]
;
where we find, in particular, our function expr_app defined above.
The call to a specific level is done by the function pr_expr.pr_fun with the level name. It returns the function taking the "printing context" (pc) and the expression as parameters, and returning the pretty printed string. For example, the call to the top level of expressions has been defined as:
value expr pc e = pr_expr.pr_fun "top" pc e;
The same holds for the other pretty printing functions, for patterns, structures, signatures, and so on.
To extend some level, in another file, the function find_pr_level can be used to get the level to be extended, e.g.
let expr_app = find_pr_level "apply" pr_expr.pr_levels in
expr_app.pr_rules :=
  extfun expr_app.pr_rules with
  [ <:expr< .... >> -> ...
  | <:expr< .... >> -> ...
  | ... ];
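The mechanism of labelled levels and of their extension from another file can be sketched with a toy model in normal OCaml syntax. Everything here is a stand-in: the record type level only imitates the fields pr_label and pr_rules, the functions find_pr_level and pr_fun are local toy definitions and not the Camlp5 ones, rules are ordinary functions returning an option instead of extensible functions, and the function extend is a hypothetical replacement for the "extfun ... with" construct.

```ocaml
(* Toy expressions; not the Camlp5 syntax tree. *)
type expr = Var of string | Int of int | App of expr * expr

(* A toy rule set: returns None when the expression is not handled. *)
type rules = expr -> string option

type level = { pr_label : string; mutable pr_rules : rules }

let levels =
  [ { pr_label = "apply";
      (* Deliberately minimal: only a variable applied to a variable. *)
      pr_rules = (function
        | App (Var f, Var x) -> Some (f ^ " " ^ x)
        | _ -> None) };
    { pr_label = "simple";
      pr_rules = (function Var s -> Some s | _ -> None) } ]

(* Toy lookup of a level by its label; raises Not_found if absent. *)
let find_pr_level label levels =
  List.find (fun lev -> lev.pr_label = label) levels

(* Toy extension: the new rule is tried first, then the previous rules. *)
let extend lev new_rule =
  let old = lev.pr_rules in
  lev.pr_rules <-
    (fun e -> match new_rule e with Some s -> Some s | None -> old e)

(* Print [e] starting at the level named [label], then the following ones. *)
let pr_fun label e =
  let rec drop = function
    | lev :: _ as l when lev.pr_label = label -> l
    | _ :: rest -> drop rest
    | [] -> []
  in
  let rec from = function
    | [] -> failwith "no rule applies"
    | lev :: rest ->
        (match lev.pr_rules e with Some s -> s | None -> from rest)
  in
  from (drop levels)

let () =
  (* Extension, as another file would do it: integers at level "simple". *)
  let simple = find_pr_level "simple" levels in
  extend simple (function Int n -> Some (string_of_int n) | _ -> None);
  print_endline (pr_fun "apply" (Int 42))
```

In this model the lookup is by exact label, so asking for a label that is not in the list raises Not_found instead of silently extending nothing.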