Thursday, August 5, 2010

Reading Camlp4, part 9: implementing antiquotations

In this post I want to complicate the JSON quotation library from the previous post by adding antiquotations.

AST with antiquotations

In order to support antiquotations we will need to make some changes to the AST. Here is the new AST type:

  type t = 
      ... (* base types same as before *) 
    | Jq_array  of t 
    | Jq_object of t 
  
    | Jq_colon  of t * t 
    | Jq_comma  of t * t 
    | Jq_nil 
  
    | Jq_Ant    of Loc.t * string 

Let’s first consider Jq_Ant. Antiquotations $tag:body$ are returned from the lexer as an ANTIQUOT token containing the (possibly empty) tag and the entire body (including nested quotations/antiquotations) as a string. In the parser, we deal only with the JSON AST, so we can’t really do anything with an antiquotation but return it to the caller (wrapped in a Jq_Ant).

The lifting functions generated by Camlp4MetaGenerator treat Jq_Ant (and any other constructor ending in Ant) specially: instead of

  | Jq_Ant (loc, s) -> 
      <:expr< Jq_Ant ($meta_loc loc$, $meta_string s$) >> 

they have

  | Jq_Ant (loc, s) -> ExAnt (loc, s) 

Instead of lifting the constructor, they translate it directly to ExAnt (or PaAnt, depending on the context). We don’t otherwise have locations in our AST, but Jq_Ant must take a Loc.t argument because ExAnt does. Later, when we walk the OCaml AST expanding antiquotations, it will be convenient to have them as ExAnt nodes rather than lifted Jq_Ant nodes.

In addition to Jq_Ant, we have new Jq_nil, Jq_comma, and Jq_colon constructors, and we have replaced the lists in Jq_array and Jq_object with just t. The idea here is that in an antiquotation in an array, e.g.

  <:json< [ 1, true, $x$, "foo" ] >> 

we would like to be able to substitute any number of elements (including zero) into the array in place of x. If Jq_array took a list, we could substitute exactly one element only. So instead we build a tree out of Jq_comma and Jq_nil constructors; at any point in the tree we can substitute zero (Jq_nil), one (any other t constructor), or more than one (a Jq_comma subtree) elements. We recover a list by taking the fringe of the final tree. (In the Jq_ast module there are functions t_of_list and list_of_t which convert between these representations.) For objects, we use Jq_colon to associate a name with a value, then build a tree of name/value pairs the same way.

While this AST meets the need, it is now possible to have ill-formed ASTs, e.g. a bare Jq_nil, or a Jq_object where the elements are not Jq_colon pairs, or where the first argument of Jq_colon is not a Jq_string. This is annoying, but it is hard to see how to avoid it without complicating the AST and making it more difficult to use antiquotations.

Parsing antiquotations

Here is the updated parser:

  EXTEND Gram 
    json: [[ 
        ... (* base types same as before *) 
  
      | `ANTIQUOT 
          (""|"bool"|"int"|"flo"|"str"|"list"|"alist" as n, s) -> 
            Jq_Ant (_loc, n ^ ":" ^ s) 
  
      | "["; es = SELF; "]" -> Jq_array es 
      | "{"; kvs = SELF; "}" -> Jq_object kvs 
  
      | e1 = SELF; ","; e2 = SELF -> Jq_comma (e1, e2) 
      | -> Jq_nil 
 
      | e1 = SELF; ":"; e2 = SELF -> Jq_colon (e1, e2)  
    ]]; 
  END 

We want to support several kinds of antiquotations: For individual elements, $x$ (where x is a t), or $bool:x$, $int:x$, $flo:x$, or $str:x$ (where x is an OCaml bool, int, float, or string); for these latter cases we need to wrap x in the appropriate t constructor. For lists of elements, $list:x$ where x is a t list, and $alist:x$ where x is a (string * t) list; for these we need to convert x to the Jq_comma / Jq_nil representation above. But in the parser all we do is return a Jq_Ant containing the tag and body of the ANTIQUOT token. (We return it in a single string separated by : because only one string argument is provided in ExAnt.)

It is the parser which controls where antiquotations are allowed, by providing a case for ANTIQUOT in a particular entry, and which tags are allowed in an entry. In this example we have only one entry, so we allow any supported antiquotation anywhere a JSON expression is allowed, but you can see in the OCaml parsers that the acceptable antiquotations can be context-sensitive, and the interpretation of the same antiquotation can vary according to the context (e.g. different conversions may be needed).

For arrays and objects, we parse SELF in place of the list. The cases for Jq_comma and Jq_nil produce the tree representation, and the case for Jq_colon allows name/value pairs. Recall that a token or keyword is preferred over the empty string, so the Jq_nil case matches only when none of the others do. In particular, the quotation <:json< >> parses to Jq_nil.

We can see that not only is the AST rather free, but so is the parser: it will parse strings which are not well-formed JSON, like <:json< 1, 2 >> or <json:< "foo" : true >>. We lose safety, since a mistake may produce an ill-formed AST, but gain convenience, since we may want to substitute these fragments in antiquotations. As an alternative, we could have a more restrictive parser (e.g. no commas allowed at the json entry), and provide different quotations for different contexts (e.g. <:json_list< >>, allowing commas) for use with antiquotations. For this small language I think it is not worth it.

Expanding antiquotations

To expand antiquotations, we take a pass over the OCaml AST we got from lifting the JSON AST; look for ExAst nodes; parse them as OCaml; then apply the appropriate conversion according to the antiquotation tag. To walk the AST we extend the Ast.map object (generated with the Camlp4FoldGenerator filter) so we don’t need a bunch of boilerplate cases which return the AST unchanged. Here’s the code:

  module AQ = Syntax.AntiquotSyntax 
  
  let destruct_aq s = 
    let pos = String.index s ':' in 
    let len = String.length s in 
    let name = String.sub s 0 pos 
    and code = String.sub s (pos + 1) (len - pos - 1) in 
    name, code 
  
  let aq_expander = 
  object 
    inherit Ast.map as super 
    method expr = 
      function 
        | Ast.ExAnt (_loc, s) -> 
            let n, c = destruct_aq s in 
            let e = AQ.parse_expr _loc c in 
            begin match n with 
              | "bool" -> <:expr< Jq_ast.Jq_bool $e$ >> 
              | "int" -> 
                  <:expr< Jq_ast.Jq_number (float_of_int $e$) >> 
              | "flo" -> <:expr< Jq_ast.Jq_number $e$ >> 
              | "str" -> <:expr< Jq_ast.Jq_string $e$ >> 
              | "list" -> <:expr< Jq_ast.t_of_list $e$ >> 
              | "alist" -> 
                  <:expr< 
                    Jq_ast.t_of_list 
                      (List.map 
                        (fun (k, v) -> 
                          Jq_ast.Jq_colon (Jq_ast.Jq_string k, v)) 
                        $e$) 
                  >> 
              | _ -> e 
            end 
        | e -> super#expr e 
    method patt = 
      function 
        | Ast.PaAnt (_loc, s) -> 
            let _, c = destruct_aq s in 
            AQ.parse_patt _loc c 
        | p -> super#patt p 
  end 

When we find an antiquotation, we unpack the tag and contents (with destruct_aq), parse it using the host syntax (given by Syntax.AntiquotSyntax from Camlp4.PreCast, which might be either the original or revised syntax depending which modules are loaded), then insert conversions depending on the tag. Conversions don’t make sense in a pattern context, so for patterns we just return the parsed antiquotation.

Finally we hook into the quotation machinery, mostly as before:

let parse_quot_string loc s = 
  let q = !Camlp4_config.antiquotations in 
  Camlp4_config.antiquotations := true; 
  let res = Jq_parser.Gram.parse_string json_eoi loc s in 
  Camlp4_config.antiquotations := q; 
  res 
 
let expand_expr loc _ s = 
  let ast = parse_quot_string loc s in 
  let meta_ast = Jq_ast.MetaExpr.meta_t loc ast in 
  aq_expander#expr meta_ast 
 
;; 
 
Q.add "json" Q.DynAst.expr_tag expand_expr; 

Before parsing a quotation we set a flag, which is checked by the lexer, to allow antiquotations; the flag is initially false, so antiquotations appearing outside a quotation won’t be parsed. After lifting the JSON AST to an OCaml AST, we run the result through the antiquotation expander.

For concreteness, let’s follow the life of a quotation as it is parsed and expanded. Say we begin with

  <:json< [ 1, $int:x$ ] >> 

After parsing:

  Jq_array (Jq_comma (Jq_number 1., Jq_Ant (_loc, "int:x"))) 

After lifting:

  <:expr< 
    Jq_array (Jq_comma (Jq_number 1., $ExAnt (_loc, "int:x")$)) 
  >> 

After expanding:

  <:expr< 
    Jq_array (Jq_comma (Jq_number 1., Jq_number (float_of_int x))) 
  >> 
Nested quotations

Let’s see that again with a nested quotation:

  <:json< $<:json< 1 >>$ >> 

After parsing:

  Jq_Ant (_loc, "<:json< 1 >>") 

After lifting:

  ExAnt (_loc, "<:json< 1 >>") 

After expanding (during which we parse and expand "<:json< 1 >>" to <:expr< Jq_number 1. >>):

  <:expr< Jq_number 1. >> 

A wise man once said “The string is a stark data structure and everywhere it is passed there is much duplication of process.” So it is with Camlp4 quotations: each nested quotation is re-parsed; each quotation implementation must deal with parsing host-language antiquotation strings; and the lexer for each implementation must lex antiquotations and nested quotations. (Since we used the default lexer we didn’t have to worry about this, but see the next post.) It would be nice to have more support from Camlp4. On the other hand, while what happens at runtime seems baroque, the code above is relatively straightforward, and since we work with strings we can use any parser technology we like.

It has not been much (marginal) trouble to handle quotations in pattern contexts, but they are not tremendously useful. The problem is that we normally don’t care about the order of the fields in a JSON object, or if there are extra fields; we would like to write

  match x with 
    | <:json< { 
        "foo" : $foo$ 
      } >> -> ... (* do something with foo *) 

and have it work wherever the foo field is in the object. This is a more complicated job than just lifting the JSON AST. For an alternative approach to processing JSON using a list-comprehension syntax, see json_compr, an example I wrote for the upcoming metaprogramming tutorial at CUFP. For a fancier JSON DSL (including the ability to induct a type description from a bunch of examples!), see Julien Verlauget’s jsonpat. And for a framework to extend OCaml’s pattern-matching syntax, see Jeremy Yallop’s ocaml-patterns.

Next time we will see how to use a custom lexer with a Camlp4 grammar.

(You can find the complete code for this example here.)

No comments:

Post a Comment