Parsing and Printing tools
Camlp5 provides two original parsing tools:
- stream parsers
- extensible grammars
The first parsing tool, the stream parsers, is the elementary system. It is pure syntactic sugar, i.e. the code is directly converted into basic OCaml constructs: functions, pattern matching, and "try". A stream parser is a function. However, the system handles neither associativity nor parsing levels, and left recursion results in an infinite loop, just like a function whose first action is a call to itself.
The second parsing tool, the extensible grammars, is more sophisticated. A grammar written with them is more readable and looks like a grammar written with tools such as "yacc". They take care of associativity, left recursion, and parsing levels. They are dynamically extensible, which is what makes possible the syntax extensions that Camlp5 provides for the OCaml syntax.
In both cases, the input data are streams.
Camlp5 also provides a pretty printing tool, a module that controls the length of lines.
The next sections give an overview of the two parsing tools and of the pretty printing tool.
Stream parsers
The stream parsers are a system of elementary recursive descent parsing. Streams are actually lazy lists. At each step, the head of the list is compared against a stream pattern. There are two kinds of stream parsers:
- The imperative stream parsers, where the elements are removed from the stream as they are parsed.
Parsers return either:
- A value, in case of success,
- The exception "Stream.Failure" when the parser does not apply and no elements have been removed from the stream, indicating that, possibly, other parsers may apply,
- The exception "Stream.Error" when the parser does not apply, but one or several elements have been removed from the stream, indicating that nothing can be done to recover from the error.
- The purely functional stream parsers, where the elements are not removed from the stream during the parsing.
These parsers return a value of type "option", i.e. either:
- "Some" of a value and the remaining stream, in case of success,
- "None", in case of failure.
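The three outcomes of an imperative parser can be illustrated in plain OCaml, without the parser syntax. This is only a minimal sketch: the "stream" type, "peek", "junk" and the exceptions "Parse_failure" and "Parse_error" are hypothetical stand-ins defined here for "Stream.Failure" and "Stream.Error", and the code is not what Camlp5 generates.

```ocaml
(* Local stand-ins for "Stream.Failure" and "Stream.Error", so that the
   sketch runs without any stream library. *)
exception Parse_failure
exception Parse_error of string

(* A hypothetical imperative stream: a mutable list of remaining elements. *)
type 'a stream = { mutable rest : 'a list }

let of_list l = { rest = l }

(* Looks at the head of the stream without removing it. *)
let peek s = match s.rest with [] -> None | x :: _ -> Some x

(* Removes the head of the stream, if any. *)
let junk s = match s.rest with [] -> () | _ :: tl -> s.rest <- tl

(* Parses the two elements 'a' 'b' and returns "ab". *)
let parse_ab s =
  match peek s with
  | Some 'a' ->
      junk s;                         (* one element has been removed *)
      (match peek s with
       | Some 'b' -> junk s; "ab"
       | _ -> raise (Parse_error "'b' expected after 'a'"))
  | _ -> raise Parse_failure          (* nothing has been removed *)

(* Parses the single element 'c' and returns "c". *)
let parse_c s =
  match peek s with
  | Some 'c' -> junk s; "c"
  | _ -> raise Parse_failure

(* Tries parse_ab, then parse_c. Only Parse_failure lets the next
   alternative run; Parse_error propagates to the caller. *)
let parse_ab_or_c s =
  try parse_ab s with Parse_failure -> parse_c s
```

On the input 'a' 'c', "parse_ab_or_c" raises "Parse_error": the 'a' is already gone, so "parse_c" is never tried.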
The differences are about:
- Syntax errors: in the imperative version, the location of the error is clear: it is the current position of the stream, and the system allows a specific error message to be provided (typically, that some "element" was "expected"). In the functional version, on the other hand, the position is not clear, since the parser returns with nothing more than the initial stream. The only way to know where the error happened is to analyze that stream to see how many elements have been unfrozen, and no clear error message is available, just "syntax error".
- Power: in the imperative version, when a rule raises the exception "Stream.Error", the parsing cannot continue. In the functional version, the parsing can continue by analyzing the next rule with the initial unaffected stream: this is a limited backtrack.
- Neatness: functional streams are neater, just like functional programming is neater than imperative programming.
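The limited backtrack of the functional version can be sketched in plain OCaml as follows. The sketch assumes that the standard library type "Seq.t", built from a list so that it can be read several times, is an acceptable stand-in for functional streams. The names "fparser", "satisfy", "seq", "alt" and "chr" are hypothetical and are not part of Camlp5.

```ocaml
(* A functional parser takes a stream and returns
   Some (value, remaining stream) or None. *)
type ('a, 'b) fparser = 'a Seq.t -> ('b * 'a Seq.t) option

(* Matches one element satisfying [p]. *)
let satisfy (p : 'a -> bool) : ('a, 'a) fparser = fun s ->
  match s () with
  | Seq.Cons (x, rest) when p x -> Some (x, rest)
  | _ -> None

(* Sequence: runs [p1], then [p2] on what remains. *)
let seq p1 p2 = fun s ->
  match p1 s with
  | None -> None
  | Some (x, s1) ->
      (match p2 s1 with
       | None -> None
       | Some (y, s2) -> Some ((x, y), s2))

(* Alternative: if [p1] fails, even after matching several elements,
   [p2] starts again from the same, unaffected, initial stream. *)
let alt p1 p2 = fun s ->
  match p1 s with
  | Some _ as r -> r
  | None -> p2 s

(* Matches exactly the character [c]. *)
let chr c = satisfy (fun x -> x = c)

(* 'a' 'b' or 'a' 'c': the second rule is tried after the first one has
   matched 'a' and then failed on the next element. *)
let ab_or_ac = alt (seq (chr 'a') (chr 'b')) (seq (chr 'a') (chr 'c'))
```

With an imperative parser, the same two rules would end in "Stream.Error" on the input 'a' 'c', because the 'a' would already have been removed when the first rule failed.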
In the imperative version, there are also lexers, a shorter syntax for the case where the stream elements are of the specific type "char".
Extensible grammars
Extensible grammars manipulate grammar entries. Grammar entries are abstract values internally containing mutable stream parsers. When a grammar entry is created, its internal parser is empty, i.e. it raises "Stream.Failure" if used. A specific syntactic construction, introduced by the keyword "EXTEND", allows grammar entries to be extended with new grammar rules.
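The idea of an entry holding a mutable parser can be sketched in plain OCaml. This is a deliberately naive model: the type "entry" and the functions "create" and "extend" are hypothetical, tokens are plain strings, "Parse_failure" stands in for "Stream.Failure", and the real system organizes rules into levels rather than chaining them like this.

```ocaml
(* Stand-in for "Stream.Failure". *)
exception Parse_failure

(* A hypothetical entry: a name plus a mutable parser over a token list.
   The parser returns a value and the remaining tokens. *)
type 'a entry = {
  name : string;
  mutable parse : string list -> 'a * string list;
}

(* A new entry has an empty parser, which fails on any input. *)
let create name = { name; parse = (fun _ -> raise Parse_failure) }

(* Extending an entry installs a new rule; the previous parser is kept
   as a fallback for when the new rule does not apply. *)
let extend e rule =
  let old = e.parse in
  e.parse <- (fun toks -> try rule toks with Parse_failure -> old toks)

(* An entry extended twice, after its creation. *)
let number : int entry = create "number"

let () =
  extend number (function
    | "one" :: rest -> (1, rest)
    | _ -> raise Parse_failure);
  extend number (function
    | "two" :: rest -> (2, rest)
    | _ -> raise Parse_failure)
```

Any code that already holds "number" sees the new rules as soon as they are installed, which is the property that makes dynamic extension possible.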
Unlike stream parsers, grammar entries take care of associativity, left factorization, and levels. Moreover, the syntax for grammars allows optional calls, lists, and lists with separators to be defined. However, grammar entries are not functions and cannot take parameters.
Since the internal system is stream parsers, extensible grammars use recursive descent parsing.
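To see how a left-recursive, left-associative rule can be handled by recursive descent at all, here is a minimal plain OCaml sketch of the usual technique: parse the first operand, then loop over the operators that follow. It is an assumption about the general approach, not a description of Camlp5 internals, and the names "token", "term" and "expr" are hypothetical.

```ocaml
(* Tokens of a tiny expression language. *)
type token = Int of int | Minus

(* term: a single integer. Returns the value and the remaining tokens. *)
let term = function
  | Int n :: rest -> Some (n, rest)
  | _ -> None

(* expr: expr "-" term | term, written without calling itself first.
   The first term is parsed, then each following "- term" is folded
   into the accumulator, which gives left associativity. *)
let expr toks =
  let rec loop acc toks =
    match toks with
    | Minus :: after_minus ->
        (match term after_minus with
         | Some (n, rest) -> loop (acc - n) rest
         | None -> None)
    | _ -> Some (acc, toks)
  in
  match term toks with
  | Some (n, rest) -> loop n rest
  | None -> None
```

On the tokens for 10 - 3 - 2, the accumulator goes from 10 to 7 to 5, i.e. the input is read as (10 - 3) - 2. A direct transcription of the rule, with "expr" calling itself as its first action, would never terminate.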
The parser of the OCaml language in Camlp5 is written with extensible grammars.
Pretty module
The "Pretty" module is an original tool for controlling how lines are displayed. The user has to specify two functions, in which:
- the data is printed on a single line
- the data is printed on several lines
The system tries the first function first. At any time, if the line overflows, i.e. if its size is greater than some "line length" specified in the module interface, or if it contains newlines, the function is aborted and control is given to the second function.
This is a basic but powerful system. It assumes that the programmer takes care of the current indentation and of the beginning and end of each line.
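The try-then-fall-back behaviour described above can be mimicked in a few lines of plain OCaml. All names here ("Overflow", "line_length", "horizontal", "emit", "one_line_or_several", "print_words") are hypothetical, and the sketch checks the length of each emitted string rather than of a real output line. It illustrates the principle only and is not the code or the interface of the "Pretty" module.

```ocaml
(* Raised when the one-line attempt does not fit. *)
exception Overflow

(* Maximum length accepted by the one-line attempt. *)
let line_length = ref 20

(* True while the one-line function is running. *)
let horizontal = ref false

(* Returns the string unchanged, unless the one-line attempt is in
   progress and the string is too long or contains a newline. *)
let emit s =
  if !horizontal
     && (String.length s > !line_length || String.contains s '\n')
  then raise Overflow
  else s

(* Tries [one_line]; if it overflows, runs [several_lines] instead. *)
let one_line_or_several one_line several_lines =
  let saved = !horizontal in
  horizontal := true;
  let attempt =
    Fun.protect
      ~finally:(fun () -> horizontal := saved)
      (fun () -> try Some (one_line ()) with Overflow -> None)
  in
  match attempt with
  | Some r -> r
  | None -> several_lines ()

(* Prints words on one line if they fit, else one word per line. *)
let print_words words =
  one_line_or_several
    (fun () -> emit (String.concat " " words))
    (fun () -> emit (String.concat "\n" words))
```

In this sketch, as in the description, the two functions are written by the programmer, who decides where the newlines of the second form go.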
The module will be extended in the future to hide the management of indentation and line continuations, and to supply functions that combine the two cases above, in which the programmer can specify the places where newlines may be inserted.