Quotations

Quotations are a syntax extension in Camlp5 to build expressions and patterns in any syntax independant from the one of OCaml. Quotations are expanded, i.e. transformed, at parse time to produce normal syntax trees, like the rest of the program. Quotations expanders are normal OCaml functions writable by any programmer.

The aim of quotations is to use concrete syntax for manipulating abstract values. That makes programs easier to write, read, modify, and understand. The drawback is that quotations are linguistically isolated from the rest of the program, in opposition to syntax extensions, which are included in the language.

  1. Introduction
  2. Quotation expander
  3. Defining a quotation
  4. Antiquotations
  5. Locations in quotations and antiquotations
  6. Located errors
  7. The Quotation module
  8. Predefined quotations
  9. A full example: lambda terms

Introduction

A quotation is syntactically enclosed by specific quotes formed by less (<) and greater (>) signs. Namely:

Examples:

  << \x.x x >>
  <:foo< hello, world >>
  <:bar< @#$%;* >>

The text between these particular parentheses can be any text. It may contain enclosing quotations and the characters "<", ">" and "\" can be escaped by "\". Notice that possible double-quote, parentheses, OCaml comments do not necessarily have to balanced inside them.

As far as the lexer is concerned, a quotation is just a kind of string.

Quotation expander

Quotations are treated at parse time. Each quotation name is associated with a quotation expander, a function transforming the content of the quotation into a syntax tree. There are actually two expanding functions, depending on whether the quotation is in the context of an expression or if it is in the context of a pattern.

If a quotation has no associated quotation expander, a parsing error is displayed and the compilation fails.

The quotation expander, or, rather, expanders, are functions taking the quotation string as parameter and returning a syntax tree. There is no constraint about which parsing technology is used. It can be stream parsers, extensible grammars, string scanning, ocamllex and yacc, any.

To build syntax trees, Camlp5 provides a way to easily build them see the chapter about them: it is possible to build abstract syntax through concrete syntax using, precisely... quotations.

Defining a quotation

By syntax tree

To define a quotation, it is necessary to program the quotation expanders and to, finally, end the source code with a call to:

  Quotation.add name (Quotation.ExAst (f_expr, f_patt));

where "name" is the name of the quotation, and "f_expr" and "f_patt" the respective quotations expanders for expressions and patterns.

After compilation of the source file (without linking, i.e. using option "-c" of the OCaml compiler), an object file is created (ending with ".cmo"), which can be used as syntax extension kit of Camlp5.

By string

There is another way to program the expander: a single function which returns, not a syntax tree, but a string which is parsed, afterwards, by the system. This function takes a boolean as first parameter telling whether the quotation is in position of expression (True) or in position of a pattern (False).

Used that way, the source file must end with:

  Quotation.add name (Quotation.ExStr f);

where "f" is that quotation expander. The advantage of this second approach is that it is simple to understand and use. The drawback is that there is no way to specify different source locations for different parts of the quotation (what may be important in semantic error messages).

Default quotation

It is possible to use some quotation without its name. Use for that the variable "Quotation.default_quotation". For example, ending a file by:

  Quotation.add "foo" (Quotation.ExAst (f_expr, f_patt));
  Quotation.default.val := "foo";

allows to use the quotation "foo" without its name, i.e.:

  << ... >>

instead of:

  <:foo< ... >>

If several files set the variable "Quotation.default", the default quotation applies to the last loaded one.

Antiquotations

A quotation obeys its own rules of lexing and parsing. Its result is a syntax tree, of type "Pcaml.expr" if the quotation is in the context of an expression, or "Pcaml.patt" if the quotation is in the context of a pattern.

It can be interesting to insert portions of expressions or patterns of the enclosing context in its own quotations. For that, the syntax of the quotation must define a syntax for antiquotations areas. It can be, for example:

In quotations, the locations in the resulting syntax tree are all set to the location of the quotation itself (if this resulting tree contains locations, they are overwritten with this location). Consequently, if there are semantic (typing) errors, the OCaml compiler will underline the entire quotation.

But in antiquotations, since they can be inserted in the resulting syntax tree, it is interesting to keep their initial quotations. For that, the nodes:

  <:expr< $anti:...$ >>
  <:patt< $anti:...$ >>

equivalent to:

  MLast.ExAnt loc ...
  MLast.PaAnt loc ...

are provided (see syntax tree quotations).

Let us take an example, without this node, and with this specific node.

Let us consider an elementary quotation system whose contents is just an antiquotation. This is just a school example, since the quotations brackets are not necessary, in this case. But we are going to see how the source code is underlined in errors messages.

Example without antiquotation node

The code for this quotation is (file "qa.ml"):

  #load "q_MLast.cmo";
  let expr s = Grammar.Entry.parse Pcaml.expr (Stream.of_string s) in
  Quotation.add "a" (Quotation.ExAst (expr, fun []));

The quotation expander "expr" just takes the string parameter (the contents of the quotation), and returns the result of the expression parser of the OCaml language.

Compilation:

  ocamlc -pp camlp5r -I +camlp5 -c qa.ml

Let us test it in the toplevel with a voluntary typing error:

  $ ocaml -I +camlp5 camlp5r.cma
          Objective Caml version ...

          Camlp5 Parsing version ...

  # #load "qa.cmo";
  # let x = "abc" and y = 25 in <:a< x ^ y >>;
  Characters 28-41:
    let x = "abc" and y = 25 in <:a< x ^ y >>;
                                ^^^^^^^^^^^^^
  This expression has type int but is here used with type string

We observe that the full quotation is underlined, although it concerns only the variable "y".

Example with antiquotation node

Let us consider this second version (file "qb.ml"):

  #load "q_MLast.cmo";
  let expr s =
    let ast = Grammar.Entry.parse Pcaml.expr (Stream.of_string s) in
    let loc = Ploc.make 1 0 (0, String.length s) in
    <:expr< $anti:ast$ >>
  in
  Quotation.add "b" (Quotation.ExAst (expr, fun []));

As above, the quotation expander "expr" takes the string parameter (the contents of the quotation) and applies the expression parser of the OCaml language. Its result, instead of being returned as it is, is enclosed with the antiquotation node. (The location built is the location of the whole string.)

Compilation:

  ocamlc -pp camlp5r -I +camlp5 -c qb.ml

Now the same test gives:

  $ ocaml -I +camlp5 camlp5r.cma
          Objective Caml version ...

          Camlp5 Parsing version ...

  # #load "qb.cmo";
  # let x = "abc" and y = 25 in <:b< x ^ y >>;
  Characters 37-38:
    let x = "abc" and y = 25 in <:b< x ^ y >>;
                                         ^
  This expression has type int but is here used with type string

Notice that, now, only the variable "y" is underlined.

In conclusion

In the resulting tree of the quotation expander:

Locations in quotations and antiquotations

This section explains in details the problem of locations in quotations and antiquotations. It is designed for programmers of quotation expanders.

Locations are the difficult point in quotations and antiquotations. If they are not set correctly, error messages may highlight wrong parts of the source.

The locations are controlled:

If the quotation expander never uses them, all syntax and typing errors highlight the whole quotation.

Remark that in extensible grammars, syntax errors are automatically enclosed by the exception "Ploc.Exc". Therefore, if the quotation is parsed by an extensible grammar entry, this exception can be raised.

In the syntax tree nodes "<:expr< $anti:...$ >>" and "<:meta< $anti:...$ >>", the location is indicated by the implicit variable named "loc". Their usage is therefore something like:

  let loc = ...computation of the location... in
  <:expr< $anti:...$ >>

In the quotation

All locations must be computed relatively to the quotation string. The quotation string is the string between "<<" or "<:name<" and ">>" (excluded), the first character of this string being at location zero.

The quotation system automatically shifts all locations with the location of the quotation: the programmer of the quotation expander does not therefore need to care about where the quotation appears in the source.

In antiquotations

In antiquotations, it is important to control how the antiquotation string is parsed. For example, if the function parsing the antiquotation string raises "Ploc.Exc", the location of this exception must be shifted with the location of the antiquotation in the quotation.

For example, let us suppose that the source contains:

  << abc^ijk^(xyz) >>

where the antiquotation is specified between the caret ("^") characters. The antiquotation string is "ijk". It can be built in the quotation expander by:

  <:expr< ijk >>

If used just like this, without the "<:expr< $anti:x$ >>", in case of typing error (for example if the variable "ijk" is unbound), the OCaml error message will be:

  << abc^ijk^(xyz) >>
  ^^^^^^^^^^^^^^^^^^^
  Unbound value ijk

To put a location to "ijk", since its location in the quotation is "(5, 8)" (the "i" being the 5th characater of the quotation string, starting at zero), the quotation expander can build it like this:

  let e = <:expr< ijk >> in
  let loc = Ploc.make_unlined (5, 8) in
  <:expr< $anti:e$ >>

In this case, the possible typing error message will be:

  << abc^ijk^(xyz) >>
         ^^^
  Unbound value ijk

This case is simple, since the antiquotation is just an identifier, and there is no parser computing it.

If the antiquotation has to be parsed, for example if it is a complicated expression, there are two points to care about:

First, the syntax error messages. If the parser of the antiquotation raises "Ploc.Exc", its location is relative to the antiquotation. It must therefore be shifted to correspond to a location in the quotation. If "f" is the parsing function and "sh" the shift of the antiquotation in the quotation (whose value is "5" in the example), the code must be something like:

  try f () with
  [ Ploc.Exc loc exc -> Ploc.raise (Ploc.shift sh loc) exc ]

Second, the typing error messages. Here, the above code with "<:expr< $anti:e$ >>" can apply to the resulting tree.

The complete code, taking the possible syntax error messages and the possible typing error messages into account, can be (where "len" is the antiquotation length):

  let e =
    try f () with
    [ Ploc.Exc loc exc -> Ploc.raise (Ploc.shift sh loc) exc ]
  in
  let loc = Ploc.make_unlined (sh, sh + len) in
  <:expr< $anti:e$ >>

Located errors

If the quotation expander raises an exception, by default, the whole quotation is underlined:

  $ cat foo.ml
  #load "q_MLast.cmo";
  let expr s = raise (Failure "hello") in
  Quotation.add "a" (Quotation.ExAst (expr, fun []));

  $ ocaml -I +camlp5 camlp5r.cma
          Objective Caml version ...

          Camlp5 Parsing version ...

  # #use "foo.ml";
  - : unit = ()
  # <:a< good morning >>;
  Toplevel input:
  # <:a< good morning >>;
    ^^^^^^^^^^^^^^^^^^^^
  While expanding quotation "a":
  Failure: hello

To specify a location of the exception, use the function "Ploc.raise" instead of "raise". In this example, let us suppose that we want only the characters 5 to 7 are underlined. This can be done like this:

  $ cat foo.ml
  #load "q_MLast.cmo";
  let expr s = Ploc.raise (Ploc.make 1 0 (5, 7)) (Failure "hello") in
  Quotation.add "a" (Quotation.ExAst (expr, fun []));

  $ ledit ocaml -I +camlp5 camlp5r.cma
          Objective Caml version ...

          Camlp5 Parsing version ...

  # #use "foo.ml";
  - : unit = ()
  # <:a< good morning >>;
  Toplevel input:
  # <:a< good morning >>;
             ^^
  While expanding quotation "a":
  Failure: hello

The Quotation module

type expander =
  [ ExStr of bool -> string -> string
  | ExAst of (string -> MLast.expr * string -> MLast.patt) ]
;
Is the type for quotation expander kinds:
  • "Quotation.ExStr exp" corresponds to an expander "exp" returning a string which is parsed by the system to create a syntax tree. Its boolean parameter tells whether the quotation is in position of an expression (True) or in position of a pattern (False). Quotations expanders created this way may work for some particular OCaml syntax, and not for another one (e.g. may work when used with revised syntax and not when used with normal syntax, and conversely).
  • "Quotation.ExAst (expr_exp, patt_exp)" corresponds to expanders returning syntax trees, therefore not necessitating to be parsed afterwards. The function "expr_exp" is called when the quotation is in position of an expression, and "patt_exp" when the quotation is in position of a pattern. Quotation expanders created this way are independent from the enclosing syntax.
value add : string -> expander -> unit;
"Quotation.add name exp" adds the quotation "name" associated with the expander "exp".
value find : string -> expander;
"Quotation.find name" returns the quotation expander of the given name.
value default : ref string;
The name of the default quotation : it is possible to use this quotation between "<<" and ">>" without having to specify its name.
value translate : ref (string -> string);
Function translating quotation names; default = identity. Used in the predefined quotation "q_phony.cmo". See below.

Some other useful functions for quotations are defined in the module "Pcaml". See the chapter "The Pcaml module", section "Quotation management".

Predefined quotations

q_MLast.cmo

This extension kit add quotations of OCaml syntax tree, allowing to use concrete syntax to represent abstract syntax. See the chapter Syntax tree.

q_ast.cmo

As with the previous quotation, this extension kit add quotations of OCaml syntax tree, but in the current user syntax with all extensions, the previous one being restricted to revised syntax without extension. See the chapters Syntax tree and Syntax tree quotations in user syntax.

q_phony.cmo

This extension kit is designed for pretty printing and must be loaded after a language pretty printing kit (in normal or in revised syntax). It prevents the expansions of quotations, transforming them into variables. The pretty printing then keeps the initial (source) form.

The macros (extension "pa_macro.cmo") are also displayed in their initial form, instead of expanded.

A full example: lambda terms

This example allows to represent lambda terms by a concrete syntax and to be able to combine them using antiquotations.

A lambda term is defined like this:

  type term =
    [ Lam of string and term
    | App of term and term
    | Var of string ]
  ;

Examples:

  value fst = Lam "x" (Lam "y" (Var "x"));
  value snd = Lam "x" (Lam "y" (Var "y"));
  value delta = Lam "x" (App (Var "x") (Var "x"));
  value omega = App delta delta;
  value comb_s =
    Lam "x"
      (Lamb "y"
         (Lamb "z"
            (App (App (Var "x") (Var "y")) (App (Var "x") (Var "z")))));

Since combinations of lambda term may be complicated, The idea is to represent them by quotations in concrete syntax. We want to be able to write the examples above like this:

  value fst = << \x.\y.x >>;
  value snd = << \x.\y.y >>;
  value delta = << \x.x x >>
  value omega = << ^delta ^delta >>;
  value comb_s = << \x.\y.\z.(x y)(x z) >>;

which is a classic representation of lambda terms.

Notice, in the definition of "omega", the use of the caret ("^") sign to specify antiquotations. Notice also the simplicity of the representation of the expression defining "comb_s".

Here is the code of the quotation expander, term.ml. The expander uses the extensible grammars. It has its own lexer (using the stream lexers) because the lexer of OCaml programs ("Plexer.gmake ()"), cannot recognize the backslashes alone.

Lexer

  (* lexer *)

  #load "pa_lexer.cmo";

  value rec ident =
    lexer
    [ [ 'a'-'z' | 'A'-'Z' | '0'-'9' | '-' | '_' | '\128'-'\255' ]
      ident!
    | ]
  ;

  value empty _ = parser [: _ = Stream.empty :] -> [];

  value rec next_tok =
    lexer
    [ "\\" -> ("BSLASH", "")
    | "^" -> ("CARET", "")
    | 'a'-'z' ident! -> ("IDENT", $buf)
    | "(" -> ("", "(")
    | ")" -> ("", ")")
    | "." -> ("", ".")
    | empty -> ("EOS", "")
    | -> raise (Stream.Error "lexing error: bad character") ]
  ;

  value rec skip_spaces =
    lexer
    [ " "/ skip_spaces!
    | "\n"/ skip_spaces!
    | "\r"/ skip_spaces! | ]
  ;

  value record_loc loct i (bp, ep) = do {
    if i >= Array.length loct.val then do {
      let newt =
        Array.init (2 * Array.length loct.val + 1)
          (fun i ->
             if i < Array.length loct.val then loct.val.(i)
             else Ploc.dummy)
      in
      loct.val := newt;
    }
    else ();
    loct.val.(i) := Ploc.make_unlined (bp, ep)
  };

  value lex_func cs =
    let loct = ref [| |] in
    let ts =
      Stream.from
        (fun i -> do {
           ignore (skip_spaces $empty cs : Plexing.Lexbuf.t);
           let bp = Stream.count cs in
           let r = next_tok $empty cs in
           let ep = Stream.count cs in
           record_loc loct i (bp, ep);
           Some r
         })
    in
    let locf i =
      if i < Array.length loct.val then loct.val.(i) else Ploc.dummy
    in
    (ts, locf)
  ;

  value lam_lex =
    {Plexing.tok_func = lex_func;
     Plexing.tok_using _ = (); Plexing.tok_removing _ = ();
     Plexing.tok_match = Plexing.default_match;
     Plexing.tok_text = Plexing.lexer_text;
     Plexing.tok_comm = None}
  ;

Parser

  (* parser *)

  #load "pa_extend.cmo";
  #load "q_MLast.cmo";

  value g = Grammar.gcreate lam_lex;
  value expr_term_eos = Grammar.Entry.create g "term";
  value patt_term_eos = Grammar.Entry.create g "term";

  EXTEND
    GLOBAL: expr_term_eos patt_term_eos;
    expr_term_eos:
      [ [ x = expr_term; EOS -> x ] ]
    ;
    expr_term:
      [ [ BSLASH; i = IDENT; "."; t = SELF -> <:expr< Lam $str:i$ $t$ >> ]
      | [ x = SELF; y = SELF -> <:expr< App $x$ $y$ >> ]
      | [ i = IDENT -> <:expr< Var $str:i$ >>
        | CARET; r = expr_antiquot -> r
        | "("; t = SELF; ")" -> t ] ]
    ;
    expr_antiquot:
      [ [ i = IDENT ->
           let r =
             let loc = Ploc.make_unlined (0, String.length i) in
             <:expr< $lid:i$ >>
           in
           <:expr< $anti:r$ >> ] ]
    ;
    patt_term_eos:
      [ [ x = patt_term; EOS -> x ] ]
    ;
    patt_term:
      [ [ BSLASH; i = IDENT; "."; t = SELF -> <:patt< Lam $str:i$ $t$ >> ]
      | [ x = SELF; y = SELF -> <:patt< App $x$ $y$ >> ]
      | [ i = IDENT -> <:patt< Var $str:i$ >>
        | CARET; r = patt_antiquot -> r
        | "("; t = SELF; ")" -> t ] ]
    ;
    patt_antiquot:
      [ [ i = IDENT ->
           let r =
             let loc = Ploc.make_unlined (0, String.length i) in
             <:patt< $lid:i$ >>
           in
           <:patt< $anti:r$ >> ] ]
    ;
  END;

  value expand_expr s =
    Grammar.Entry.parse expr_term_eos (Stream.of_string s)
  ;
  value expand_patt s =
    Grammar.Entry.parse patt_term_eos (Stream.of_string s)
  ;

  Quotation.add "term" (Quotation.ExAst (expand_expr, expand_patt));
  Quotation.default.val := "term";

Compilation and test

Compilation:

  ocamlc -pp camlp5r -I +camlp5 -c term.ml

Example, in the toplevel, including a semantic error, correctly underlined, thanks to the antiquotation nodes:

  $ ocaml -I +camlp5 camlp5r.cma
          Objective Caml version ...

          Camlp5 Parsing version ...

  # #load "term.cmo";
  # type term =
     [ Lam of string and term
     | App of term and term
     |   Var of string ]
    ;
  type term =
    [ Lam of string and term | App of term and term | Var of string ]
  # value comb_s = << \x.\y.\z.(x y)(x z) >>;
  value comb_s : term =
    Lam "x"
     (Lam "y"
       (Lam "z" (App (App (Var "x") (Var "y")) (App (Var "x") (Var "z")))))
  # value omega = << ^delta ^delta >>;
  Characters 18-23:
    value omega = << ^delta ^delta >>;
                      ^^^^^
  Unbound value delta
  # value delta = << \x.x x >>;
  value delta : term = Lam "x" (App (Var "x") (Var "x"))
  # value omega = << ^delta ^delta >>;
  value omega : term =
    App (Lam "x" (App (Var "x") (Var "x")))
      (Lam "x" (App (Var "x") (Var "x")))

Copyright 2007-2012 Daniel de Rauglaudre (INRIA)

Valid XHTML 1.1