Purely functional parsers of streams
Purely functional parsers are an alternative
of parsers of streams where the used stream
type is a lazy non-destructive type : these streams are lazy values,
like in classical parsers of streams, but the values are not removed
as long as the parsing advances. To make them work, the parsers of
purely functional streams return, not the simple values, but a value
of type option : "Node
" meaning "mo match" (the equivalent
of the exception "Parse.Failure
" of normal streams) and
"Some (r, s)
" meaning "the result is r and the remaining
stream is s".
Syntax
The syntax of purely functional parsers, when loading "pa_fstream.cmo", is the following:
expression ::= fparser | match_with_fparser fparser ::= "fparser" pos_opt "[" parser_cases "]" | "fparser" pos_opt parser_case match_with_fparser ::= "match" expression with "fparser" pos_opt "[" parser_cases "]" | "match" expression with "fparser" pos_opt parser_case parser_cases ::= parser_cases parser_case | <nothing> parser_case ::= "[:" stream_pattern ":]" pos_opt "->" expression | "[:" ":]" pos_opt "->" expression stream_pattern ::= stream_patt_comp | stream_patt_comp ";" stream_pattern stream_patt_comp ::= "`" pattern | pattern "=" expression | pattern pos_opt ::= pattern | <nothing>
Notice that, in difference with classical parsers, there is no difference, in a stream pattern, between the first stream pattern component and the other ones. In particular, there is not this syntax "question mark" and expression optionnally ending those components. Moreover, the "lookahead" case is not necessary, we see further why. On the contrary, the syntaxes "pattern when" and "let..in" inside stream patterns we see in classical parsers are just not implemented.
Streams
The functional parsers are functions taking as parameters
functional streams, which are values of type "Fstream.t
a
" for some type "a
". It is possible to build
functional streams using the functions defined in the module
"Fstream
":
Fstream.from
"Fstream.from f
" returns a stream built from the
function "f
". To create a new stream element, the
function "f
" is called with the current stream count,
starting with zero. The user function "f
" must return
either "Some <value>
" for a value or
"None
" to specify the end of the stream.
Fstream.of_list
Return a stream built from the list in the same order.
Fstream.of_string
Return a stream of the characters of the string parameter.
Fstream.of_channel
Return a stream of the characters read from the input channel parameter.
Semantics of parsers
Fparser
The purely functional parsers act like classical parsers, with a recursive descent algorithm, except that:
- If the first stream pattern component matches the beginning of the stream, there is no error if the following stream patterns components do not match: the control simply passes to the next parser case with the initial stream.
- If the semantic actions are of type "
t
", the result of the parser is of type "option (t * Fstream.t)
", not just "t
" like in classical parsers. If a stream pattern matches, the semantic action is evaluated, giving some result "e
" and the result of the parser is "Some (e, strm)
" where "strm
" is the remaining stream. - If no parser case matches, the result of the parser is
"
None
".
Error position
A difficulty, with purely functional parsers, is how to find the
position of the syntax error, when the input is wrong. Since the
system tries all parsers cases before returning "None
",
and that the initial stream is not affected, it is not possible to
directly find where the error happened. This is a problem for parsing
using backtracking (here, it is limited backtracking, but the problem
is the same).
The solution is to use the function
"Fstream.count_unfrozen
" applied to the initial
stream. Like its name says, it returns the number of unfrozen elements
of the stream, which is exactly the longuest match found. If the input
is a stream of characters, the return of this function is exactly the
position in number of characters from the beginning of the stream.
However, it is not possible to know directly which rule failed and therefore it is not possible, like in classical parsers, to specify and get clear error messages. Future versions of purely functional parsers may propose solutions to resolve this problem.
Notice that, if using that method, it is not possible to reuse the same stream to call another parser, and hope to get the right position of the error, if another error happens, since it may test less terminals than the first parser. Use a fresh stream in this case, if possible.