In this final (?) post in my series on Camlp4, I want at last to cover syntax extensions. A nontrivial syntax extension involves almost all the topics we have previously covered, so it seems fitting that we treat them last.
Extending grammarsIn the post on parsing we covered Camlp4 grammars but stopped short of explaining how to extend them. Well, this is not completely true: we used the EXTEND
form to extend an empty grammar, and we can also use it to extend non-empty grammars. We saw a small example of this when implementing quotations, where we extended the JSON grammar with a new json_eoi
entry (which refered to an entry in the original grammar). Rules and levels may also be added to existing entries, and rules may be deleted.
Let’s look at a complete syntax extension, which demonstrates modifying Camlp4’s OCaml grammar. The purpose of the extension is to change the precedence of the method call operator #
to make “method chaining” read better. For example, if the foo
method returns an object, you can write
obj#foo "bar" #baz
to call the baz
method, rather than needing
(obj#foo "bar")#baz
(I originally wrote this for use with the jQuery
binding for ocamljs
; method chaining is common with jQuery
.)
Here is the extension:
open Camlp4
module Id : Sig.Id =
struct
let name = "pa_jquery"
let version = "0.1"
end
module Make (Syntax : Sig.Camlp4Syntax) =
struct
open Sig
include Syntax
DELETE_RULE Gram expr: SELF; "#"; label END;
EXTEND Gram
expr: BEFORE "apply"
[ "#" LEFTA
[ e = SELF; "#"; lab = label ->
<:expr< $e$ # $lab$ >> ]
];
END
end
module M = Register.OCamlSyntaxExtension(Id)(Make)
To make sense of a syntax extension it’s helpful to refer to Camlp4OCamlRevisedParser.ml
(which defines the revised syntax grammar) and Camlp4OCamlParser.ml
(which defines the original syntax as an extension of the revised syntax). There we see that the #
operator is parsed in the expr
entry, in a level called ”.
” (which includes other dereferencing operators), and that this level appears below the apply
level, which parses function application. Recall from the parsing post that operators in lower levels bind more tightly. So to get the effect we want, we need to move the #
rule above the apply
level in the grammar.
First we delete the rule from its original location: DELETE_RULE
takes the grammar, the entry, and the symbols on the left-hand side of the rule, followed by END
; you don’t have to say in what level it appears. Then we add the rule at a new location: we create a new level #
containing the rule from the original grammar, and add it before the level named apply
.
There are several ways to specify where a level is inserted: BEFORE
level and AFTER
level put it before or after some other level; LEVEL
level adds rules to an existing level (you will be warned but not stopped from changing the label or associativity of the level); FIRST
and LAST
put the level before or after all other levels. If you don’t specify, rules are added to the topmost level in the entry. The resulting grammar works just as if you had given it all at once, making the insertions in the specified places. (However, it is not very clear from the code how ordering works when inserting rules into an existing level; it is perhaps best not to rely on the order of rules in a level anyway.)
Finally we register the extension. The Make
argument to OCamlSyntaxExtension
returns a Sig.Camlp4Syntax
for some reason (in Register.ml
it is just ignored) so we include Syntax
to provide it.
(The complete code for this example is here.)
Transforming the ASTLet’s do a slightly more complicated example involving some transformation of the parsed AST. It often comes up that we want to let
-bind the value of an expression to a name, trapping exceptions, then evaluate the body of the let
outside the scope of the exception handler. This is a bit painful to write in stock OCaml; we can only straightforwardly express trapping exceptions in the whole let
expression:
try let x = e1 in e2
with e -> h
A nice alternative is to use thunks to delay the evaluation of the body, doing it outside the scope of the try
/with
:
(try let x = e1 in fun () -> e2
with e -> fun () -> h)()
(We must thunkify the exception handler to make the types work out.) This is simple enough to do by hand, but let’s give it some syntactic sugar:
let try x = e1 in e2
with e -> h
which should expand to the thunkified version above. (The idea and syntax are taken from Martin Jambon’s micmatch extension.)
Let’s look at the existing rules in Camlp4OCamlRevisedParser.ml
for let
and try
to get an idea of how to parse the let
/try
form:
[ "let"; r = opt_rec; bi = binding; "in"; x = SELF ->
<:expr< let $rec:r$ $bi$ in $x$ >>
...
| "try"; e = sequence; "with"; a = match_case ->
<:expr< try $mksequence' _loc e$ with [ $a$ ] >>
For let
, the opt_rec
entry parses an optional rec
keyword (we see there is a special antiquotation for interpolating rec
). Binding
parses a group of bindings separated by and
. SELF
is just expr
. For try
, sequence
is a sequence of expressions separated by ;
, and match_case
is a group of match cases separated by |
. (These entries are both a little different in the original syntax, to account for the different semicolon rules and the []
delimiters around the match cases.) Recall that Camlp4OCamlRevisedParser.ml
uses the revised syntax quotations, so we have []
around the match cases. The call to mksequence'
just wraps a do {}
around a sequence if necessary; more on this below.
The parsing rule we want is a combination of these. Here is the extension:
EXTEND Gram
expr: LEVEL "top" [
[ "let"; "try"; r = opt_rec; bi = binding; "in";
e = sequence; "with"; a = match_case ->
let a =
List.map
(function
| <:match_case< $p$ when $w$ -> $e$ >> ->
<:match_case<
$p$ when $w$ -> fun () -> $e$
>>
| mc -> mc)
(Ast.list_of_match_case a []) in
<:expr<
(try let $rec:r$ $bi$ in fun () -> do { $e$ }
with [ $list:a$ ])()
>>
]
];
END
We put rec
after try
(following micmatch
), which is a little weird , but if we put it before we would need to look ahead to
disambiguate `let` from `let try`; once we parse `opt_rec` we are
committed to one rule or the other ; instead we could start the rule "let"; r = opt_rec; "try"
, which has no ambiguity with the ordinary let
rule because the "let"; opt_rec
prefix is factored out; the parser doesn’t choose between the rules until it tries to parse try
. After in
we parse sequence
rather than SELF
; this seems like a good choice because there is a with
to end the sequence.
Now, to transform the AST, we map over the match cases. The match_case
entry returns a list of cases separated by Ast.McOr
; we call list_of_match_case
to get an ordinary list. For each case, we match the pattern, when
clause, and expression on the right-hand side (these are packaged in an Ast.McArr
, where the when
clause field is Ast.ExNil
if there is no when
clause), and return it with the expression thunkified. Then we return the whole let
inside try
, with the body sequence thunkified.
We have to add a do {}
around the body, creating an Ast.ExSeq
node, because that’s what is expected by Camlp4Ast2OCamlAst.ml
—recall from the filters post that the Camlp4 AST is translated to an OCaml AST and marshalled to the compiler. If we forget this (and “we” often forget these idiosyncrasies) then we get the error ”expr; expr: not allowed here, use do {...} or [|...|] to surround them
”, which is pretty helpful as these errors go.
(The complete code for this example is here.)
Extending pattern matchingAs a final example, let’s extend OCaml’s pattern syntax. In the quotations post we noted that JSON quotations in a pattern are not very useful, because we would usually like a pattern to match even if the fields of an object come in a different order or there are extra fields. To keep the code short let’s abstract the problem a little and consider matching association lists: if we write a match case
| alist [ "foo", x; "bar", y ] -> e
we would like it to match association lists with "foo"
and "bar"
keys, in any order, with any extra pairs in the list. Our translation looks like this:
| __pa_alist_patt_1 when
(match ((try Some (List.assoc "foo" __pa_alist_patt_1)
with | Not_found -> None),
(try Some (List.assoc "bar" __pa_alist_patt_1)
with | Not_found -> None))
with
| (Some x, Some y) -> true
| _ -> false)
->
(match ((try Some (List.assoc "foo" __pa_alist_patt_1)
with | Not_found -> None),
(try Some (List.assoc "bar" __pa_alist_patt_1)
with | Not_found -> None))
with
| (Some x, Some y) -> e
| _ -> assert false)
This might seem overcomplicated, and it is true that we could simplify it for this case. But the built-in pattern syntax is complicated, and it is tricky handling all the cases to make things work smoothly; the strategy that produces the code above will handle some (but not all) of the complications. (We’ll consider some improvements below.)
The basic idea is that when we come to an alist
we replace it with a new fresh name, then do further matching in a when
clause, so if it fails we can continue to the next case by returning false
. In the when
clause we look up the keys, putting them in option
s, then match on the option
s; we handle nested patterns (to the right of a key) by embedding them in the when
clause match. The when
clause match also binds variables appearing in the original pattern, so they are available to the when
clause of the original case (if there is one). Finally, we do the whole thing over again in the match case body to provide bindings to the original body.
In order to implement this we’ll use both a syntax extension and a filter. The reason is that we’d like to extend the patt
entry, but to do the AST transformation we sketched above we need to transform match_case
s. We could replace the match_case
part of the parser as well but that seems needlessly hairy, and generally when writing a syntax extension we’d like to touch as little of the parser as possible so it interoperates well with other extensions.
First, here is the syntax extension:
EXTEND Gram
patt: LEVEL "simple"
[[
"alist"; "[";
l =
LIST0
[ e = expr LEVEL "simple"; ",";
p = patt LEVEL "simple" ->
Ast.PaOlbi (_loc, "", p, e) ]
SEP ";";
"]" ->
<:patt< $uid:"alist"$ $Ast.paSem_of_list l$ >>
]];
END
We extend the simple
level of the patt
entry, which parses primitive patterns. Inside alist []
we parse a list of expr
/ patt
pairs; we parse expr
at the simple
level or else it would parse the whole pair as an expr
, and the same for patt
just in case. Then we return the pair of expression and pattern in an Ast.PaOlbi
(ordinarily used for optional argument defaults in function definitions). Why? Well, we need to return something of type patt
, but we need somehow to get the expr
to our filter, and this is the only patt
constructor that holds an expr
. (As an alternative we could parse a patt
instead of an expr
, but then we’d need to translate it to an expr
at the point we use it.) Finally we return the list wrapped in a data constructor so we can recognize it easily in the filter; because it is lower-case, we can be sure that “alist” is not the identifier of a real data constructor.
Now let’s look at the filter. First, some helper functions:
let fresh =
let id = ref 0 in
fun () ->
incr id;
"__pa_alist_patt_" ^ string_of_int !id
let expr_tup_of_list _loc = function
| [] -> <:expr< () >>
| [ v ] -> v
| vs -> <:expr< $tup:Ast.exCom_of_list vs$ >>
let patt_tup_of_list _loc = function
| [] -> <:patt< () >>
| [ p ] -> p
| ps -> <:patt< $tup:Ast.paCom_of_list ps$ >>
We have a function to generate fresh names, a function to turn a list of expressions into a tuple, and a similar function for patterns. The reason we need these latter two is that a tuple with 0 or 1 elements is not allowed by Camlp4Ast2OCamlAst.ml
(the empty “tuple” is actually a special identifier in the Camlp4 AST). Next, the main rewrite function:
let rewrite _loc p w e =
let k = ref (fun s f -> s) in
The function takes the parts of an Ast.McArr
(that is, a match case). We’re going to map over the pattern p
, building up a function k
as we encounter nested alist
forms. We want to generate the same matching code in the when
clause and the body, so k
is parameterized with an expression in case of success (the original when
clause or the body) and in case of failure (false
or assert
false
). We will build k
from the inside out, starting with a function that just returns the success expression.
let map =
object
inherit Ast.map as super
method patt p =
match super#patt p with
| <:patt< $uid:"alist"$ $l$ >> ->
let id = fresh () in
let l =
List.map
(function
| Ast.PaOlbi (_, _, p, e) -> p, e
| _ -> assert false)
(Ast.list_of_patt l []) in
let vs =
List.map
(fun (_, e) ->
<:expr<
try Some (List.assoc $e$ $lid:id$)
with Not_found -> None
>>)
l in
let ps =
List.map
(fun (p, _) -> <:patt< Some $p$ >>)
l in
let k' = !k in
k :=
(fun s f ->
<:expr<
match $expr_tup_of_list _loc vs$ with
| $patt_tup_of_list _loc ps$ -> $k' s f$
| _ -> $f$
>>);
<:patt< $lid:id$ >>
| p -> p
end in
The Ast.map
object provides methods to map each syntactic class of the AST, along with default implementations which return the node unchanged. We extend it to walk over the pattern, leaving it unchanged except when we come to our special alist
constructor. In that case we generate a fresh name, which we return as the value of the function. Then we extract the expr
/ patt
pairs and map them to try Some (List.assoc ...
expressions and Some
patterns. Finally we extend k
by matching all the expressions against all the patterns; if the match succeeds we call the previous k
, otherwise the failure expression. Since we build k
from the inside out, we transform subpatterns first (by matching over super#patt p
).
let p = map#patt p in
let w = match w with <:expr< >> -> <:expr< true >> | _ -> w in
let w = !k w <:expr< false >> in
let e = !k e <:expr< assert false >> in
<:match_case< $p$ when $w$ -> $e$ >>
We call map#patt
on p
to replace special alist
constructor nodes with fresh names and build up k
, then call the resulting k
on the when
clause (if there is no when
clause we replace it with true
) and body, and finally return the result as a match_case
, completing the rewrite function.
let filter =
let map =
object
inherit Ast.map as super
method match_case mc =
match super#match_case mc with
| <:match_case@_loc< $p$ when $w$ -> $e$ >> ->
rewrite _loc p w e
| e -> e
end in
map#str_item
let _ = AstFilters.register_str_item_filter filter
We extend Ast.map
again to call the rewrite function on each match_case
, then register the resulting filter.
The code above handles when
clauses and nested alist
patterns, and interacts properly with ordinary OCaml patterns. However, it completely falls down on nested pattern alternatives. If we write
match x with
| alist [ "foo", f ]
| alist [ "fooz", f ] -> e
we get this mess:
| __pa_alist_patt_1 | __pa_alist_patt_2 when
(match try Some (List.assoc "fooz" __pa_alist_patt_2)
with | Not_found -> None
with
| Some f ->
(match try Some (List.assoc "foo" __pa_alist_patt_1)
with | Not_found -> None
with
| Some f -> true
| _ -> false)
| _ -> false)
->
... (* the same mess for the body *)
The first problem is that both arms of an alternative must bind the same names, but we have replaced them with two different fresh names. The second problem is that we have blindly treated alternative alist
patterns as being nested one inside the other. A solution to both these problems is to split nested alternatives into separate cases, at the cost of duplicating the when
clause and body in each.
Jeremy Yallop’s patterns framework (see here for an update that works with OCaml 3.12.0) allows multiple pattern extensions to coexist, and provides some common facilities to make them easier to write. In particular it splits nested alternatives into separate cases. Another deficiency in the code above is that it duplicates the match expression (built in k
) in the when
clause and body. This can be avoided by computing the body within the when
clause, setting a reference, and dereferencing it in the body. However, the reference must be bound outside the match_case
to be visible both in the when
clause and the body, so this approach must transform each AST node that contains match_case
s in order to bind the refs in the right place. The patterns
framework handles this as well.
(The complete code for this example is here. A version using the patterns
framework is here.)