In this post I want to complicate the JSON quotation library from the previous post by adding antiquotations.
AST with antiquotationsIn order to support antiquotations we will need to make some changes to the AST. Here is the new AST type:
type t =
... (* base types same as before *)
| Jq_array of t
| Jq_object of t
| Jq_colon of t * t
| Jq_comma of t * t
| Jq_nil
| Jq_Ant of Loc.t * string
Let’s first consider Jq_Ant
. Antiquotations $tag:body$
are returned from the lexer as an ANTIQUOT
token containing the (possibly empty) tag and the entire body (including nested quotations/antiquotations) as a string. In the parser, we deal only with the JSON AST, so we can’t really do anything with an antiquotation but return it to the caller (wrapped in a Jq_Ant
).
The lifting functions generated by Camlp4MetaGenerator
treat Jq_Ant
(and any other constructor ending in Ant
) specially: instead of
| Jq_Ant (loc, s) ->
<:expr< Jq_Ant ($meta_loc loc$, $meta_string s$) >>
they have
| Jq_Ant (loc, s) -> ExAnt (loc, s)
Instead of lifting the constructor, they translate it directly to ExAnt
(or PaAnt
, depending on the context). We don’t otherwise have locations in our AST, but Jq_Ant
must take a Loc.t
argument because ExAnt
does. Later, when we walk the OCaml AST expanding antiquotations, it will be convenient to have them as ExAnt
nodes rather than lifted Jq_Ant
nodes.
In addition to Jq_Ant
, we have new Jq_nil
, Jq_comma
, and Jq_colon
constructors, and we have replaced the lists in Jq_array
and Jq_object
with just t
. The idea here is that in an antiquotation in an array, e.g.
<:json< [ 1, true, $x$, "foo" ] >>
we would like to be able to substitute any number of elements (including zero) into the array in place of x
. If Jq_array
took a list, we could substitute exactly one element only. So instead we build a tree out of Jq_comma
and Jq_nil
constructors; at any point in the tree we can substitute zero (Jq_nil
), one (any other t
constructor), or more than one (a Jq_comma
subtree) elements. We recover a list by taking the fringe of the final tree. (In the Jq_ast
module there are functions t_of_list
and list_of_t
which convert between these representations.) For objects, we use Jq_colon
to associate a name with a value, then build a tree of name/value pairs the same way.
While this AST meets the need, it is now possible to have ill-formed ASTs, e.g. a bare Jq_nil
, or a Jq_object
where the elements are not Jq_colon
pairs, or where the first argument of Jq_colon
is not a Jq_string
. This is annoying, but it is hard to see how to avoid it without complicating the AST and making it more difficult to use antiquotations.
Here is the updated parser:
EXTEND Gram
json: [[
... (* base types same as before *)
| `ANTIQUOT
(""|"bool"|"int"|"flo"|"str"|"list"|"alist" as n, s) ->
Jq_Ant (_loc, n ^ ":" ^ s)
| "["; es = SELF; "]" -> Jq_array es
| "{"; kvs = SELF; "}" -> Jq_object kvs
| e1 = SELF; ","; e2 = SELF -> Jq_comma (e1, e2)
| -> Jq_nil
| e1 = SELF; ":"; e2 = SELF -> Jq_colon (e1, e2)
]];
END
We want to support several kinds of antiquotations: For individual elements, $x$
(where x
is a t
), or $bool:x$
, $int:x$
, $flo:x$
, or $str:x$
(where x
is an OCaml bool
, int
, float
, or string
); for these latter cases we need to wrap x
in the appropriate t
constructor. For lists of elements, $list:x$
where x
is a t list
, and $alist:x$
where x
is a (string * t) list
; for these we need to convert x
to the Jq_comma
/ Jq_nil
representation above. But in the parser all we do is return a Jq_Ant
containing the tag and body of the ANTIQUOT
token. (We return it in a single string separated by :
because only one string argument is provided in ExAnt
.)
It is the parser which controls where antiquotations are allowed, by providing a case for ANTIQUOT
in a particular entry, and which tags are allowed in an entry. In this example we have only one entry, so we allow any supported antiquotation anywhere a JSON expression is allowed, but you can see in the OCaml parsers that the acceptable antiquotations can be context-sensitive, and the interpretation of the same antiquotation can vary according to the context (e.g. different conversions may be needed).
For arrays and objects, we parse SELF
in place of the list. The cases for Jq_comma
and Jq_nil
produce the tree representation, and the case for Jq_colon
allows name/value pairs. Recall that a token or keyword is preferred over the empty string, so the Jq_nil
case matches only when none of the others do. In particular, the quotation <:json< >>
parses to Jq_nil
.
We can see that not only is the AST rather free, but so is the parser: it will parse strings which are not well-formed JSON, like <:json< 1, 2 >>
or <json:< "foo" : true >>
. We lose safety, since a mistake may produce an ill-formed AST, but gain convenience, since we may want to substitute these fragments in antiquotations. As an alternative, we could have a more restrictive parser (e.g. no commas allowed at the json
entry), and provide different quotations for different contexts (e.g. <:json_list< >>
, allowing commas) for use with antiquotations. For this small language I think it is not worth it.
To expand antiquotations, we take a pass over the OCaml AST we got from lifting the JSON AST; look for ExAst
nodes; parse them as OCaml; then apply the appropriate conversion according to the antiquotation tag. To walk the AST we extend the Ast.map
object (generated with the Camlp4FoldGenerator
filter) so we don’t need a bunch of boilerplate cases which return the AST unchanged. Here’s the code:
module AQ = Syntax.AntiquotSyntax
let destruct_aq s =
let pos = String.index s ':' in
let len = String.length s in
let name = String.sub s 0 pos
and code = String.sub s (pos + 1) (len - pos - 1) in
name, code
let aq_expander =
object
inherit Ast.map as super
method expr =
function
| Ast.ExAnt (_loc, s) ->
let n, c = destruct_aq s in
let e = AQ.parse_expr _loc c in
begin match n with
| "bool" -> <:expr< Jq_ast.Jq_bool $e$ >>
| "int" ->
<:expr< Jq_ast.Jq_number (float_of_int $e$) >>
| "flo" -> <:expr< Jq_ast.Jq_number $e$ >>
| "str" -> <:expr< Jq_ast.Jq_string $e$ >>
| "list" -> <:expr< Jq_ast.t_of_list $e$ >>
| "alist" ->
<:expr<
Jq_ast.t_of_list
(List.map
(fun (k, v) ->
Jq_ast.Jq_colon (Jq_ast.Jq_string k, v))
$e$)
>>
| _ -> e
end
| e -> super#expr e
method patt =
function
| Ast.PaAnt (_loc, s) ->
let _, c = destruct_aq s in
AQ.parse_patt _loc c
| p -> super#patt p
end
When we find an antiquotation, we unpack the tag and contents (with destruct_aq
), parse it using the host syntax (given by Syntax.AntiquotSyntax
from Camlp4.PreCast
, which might be either the original or revised syntax depending which modules are loaded), then insert conversions depending on the tag. Conversions don’t make sense in a pattern context, so for patterns we just return the parsed antiquotation.
Finally we hook into the quotation machinery, mostly as before:
let parse_quot_string loc s =
let q = !Camlp4_config.antiquotations in
Camlp4_config.antiquotations := true;
let res = Jq_parser.Gram.parse_string json_eoi loc s in
Camlp4_config.antiquotations := q;
res
let expand_expr loc _ s =
let ast = parse_quot_string loc s in
let meta_ast = Jq_ast.MetaExpr.meta_t loc ast in
aq_expander#expr meta_ast
;;
Q.add "json" Q.DynAst.expr_tag expand_expr;
Before parsing a quotation we set a flag, which is checked by the lexer, to allow antiquotations; the flag is initially false, so antiquotations appearing outside a quotation won’t be parsed. After lifting the JSON AST to an OCaml AST, we run the result through the antiquotation expander.
For concreteness, let’s follow the life of a quotation as it is parsed and expanded. Say we begin with
<:json< [ 1, $int:x$ ] >>
After parsing:
Jq_array (Jq_comma (Jq_number 1., Jq_Ant (_loc, "int:x")))
After lifting:
<:expr<
Jq_array (Jq_comma (Jq_number 1., $ExAnt (_loc, "int:x")$))
>>
After expanding:
<:expr<
Jq_array (Jq_comma (Jq_number 1., Jq_number (float_of_int x)))
>>
Let’s see that again with a nested quotation:
<:json< $<:json< 1 >>$ >>
After parsing:
Jq_Ant (_loc, "<:json< 1 >>")
After lifting:
ExAnt (_loc, "<:json< 1 >>")
After expanding (during which we parse and expand "<:json< 1 >>"
to <:expr< Jq_number 1. >>
):
<:expr< Jq_number 1. >>
A wise man once said “The string is a stark data structure and everywhere it is passed there is much duplication of process.” So it is with Camlp4 quotations: each nested quotation is re-parsed; each quotation implementation must deal with parsing host-language antiquotation strings; and the lexer for each implementation must lex antiquotations and nested quotations. (Since we used the default lexer we didn’t have to worry about this, but see the next post.) It would be nice to have more support from Camlp4. On the other hand, while what happens at runtime seems baroque, the code above is relatively straightforward, and since we work with strings we can use any parser technology we like.
It has not been much (marginal) trouble to handle quotations in pattern contexts, but they are not tremendously useful. The problem is that we normally don’t care about the order of the fields in a JSON object, or if there are extra fields; we would like to write
match x with
| <:json< {
"foo" : $foo$
} >> -> ... (* do something with foo *)
and have it work wherever the foo
field is in the object. This is a more complicated job than just lifting the JSON AST. For an alternative approach to processing JSON using a list-comprehension syntax, see json_compr, an example I wrote for the upcoming metaprogramming tutorial at CUFP. For a fancier JSON DSL (including the ability to induct a type description from a bunch of examples!), see Julien Verlauget’s jsonpat. And for a framework to extend OCaml’s pattern-matching syntax, see Jeremy Yallop’s ocaml-patterns.
Next time we will see how to use a custom lexer with a Camlp4 grammar.
(You can find the complete code for this example here.)
No comments:
Post a Comment