Tuesday, August 3, 2010

Reading Camlp4, part 8: implementing quotations

The Camlp4 system of quotations and antiquotations is an awesome tool for producing and consuming OCaml ASTs. In this post (and the following one) we will see how to provide this facility for other syntaxes and ASTs. Here we consider just quotations; we’ll add antiquotations in the following post.

An AST for JSON

Our running example will be a quotation expander for JSON. Let’s begin with the JSON AST, in a module Jq_ast:

  type t = 
    | Jq_null 
    | Jq_bool   of bool 
    | Jq_number of float 
    | Jq_string of string 
    | Jq_array  of t list 
    | Jq_object of (string * t) list 

This is the same (modulo order and names) as json_type from the json-wheel library, but for various reasons we will not be able to use json_type. The Jq_ prefix is for json_quot, the name of this little library.
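
To get a feel for the type, here is a small sketch in plain OCaml (no Camlp4 needed) that builds the value for the JSON text [ 1, "foo", true ] by hand and walks it. The size function is a hypothetical helper invented for this illustration; it is not part of json_quot.

```ocaml
(* The AST from the post, repeated so the snippet stands alone. *)
type t =
  | Jq_null
  | Jq_bool   of bool
  | Jq_number of float
  | Jq_string of string
  | Jq_array  of t list
  | Jq_object of (string * t) list

(* The JSON text [ 1, "foo", true ] written out as a value of type t. *)
let example = Jq_array [ Jq_number 1.; Jq_string "foo"; Jq_bool true ]

(* Hypothetical helper: count the nodes in a tree, containers included. *)
let rec size = function
  | Jq_null | Jq_bool _ | Jq_number _ | Jq_string _ -> 1
  | Jq_array es -> List.fold_left (fun n e -> n + size e) 1 es
  | Jq_object kvs -> List.fold_left (fun n (_, v) -> n + size v) 1 kvs
```

Writing values like example by hand is exactly the chore the quotation expander is meant to remove.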

Parsing JSON

We’ll use a Camlp4 grammar to parse JSON trees. It is not necessary to use Camlp4’s parsing facilities in order to implement quotations—ultimately we will need to provide just a function from strings to ASTs, so we could use ocamlyacc or what-have-you instead—but it is convenient. Here is the parser:

  open Camlp4.PreCast 
  open Jq_ast 
  
  module Gram = MakeGram(Lexer) 
  let json = Gram.Entry.mk "json" 
  
  ;; 
  
  EXTEND Gram 
    json: [[ 
        "null" -> Jq_null 
      | "true" -> Jq_bool true 
      | "false" -> Jq_bool false 
      | i = INT -> Jq_number (float_of_string i) 
      | f = FLOAT -> Jq_number (float_of_string f) 
      | s = STRING -> Jq_string s 
      | "["; es = LIST0 json SEP ","; "]" -> Jq_array es 
      | "{"; 
          kvs = 
            LIST0 
              [ s = STRING; ":"; j = json -> (s, j) ] 
              SEP ","; 
        "}" -> Jq_object kvs 
    ]]; 
  END 

We use the default Camlp4 lexer (with MakeGram(Lexer)); as we have seen, keywords mentioned in a Camlp4 grammar are added to the lexer, so we don’t need to do anything special to lex null etc. However, while JSON/Javascript has a single number type, the default lexer returns different tokens for INT and FLOAT numbers, so we convert each to Jq_number. In fact, these tokens (along with STRING) represent OCaml integer, float and string literals, which do not exactly match the corresponding JSON ones, but they are fairly close so let’s not worry about it for now; we’ll revisit the lexer in a later post.
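
The number handling can be sketched without Camlp4. The token type below is a hypothetical stand-in for the lexer's tokens, on the assumption (consistent with the grammar above) that a rule such as i = INT binds the literal's source text as a string. It also shows one way the OCaml literal syntax is looser than JSON's.

```ocaml
(* Only the constructor needed for this sketch. *)
type t = Jq_number of float

(* Hypothetical stand-in for the lexer's number tokens: the payload is
   the source text of the literal. *)
type token = INT of string | FLOAT of string

(* Both token kinds collapse into the single JSON number type. *)
let number_of_token = function
  | INT i -> Jq_number (float_of_string i)
  | FLOAT f -> Jq_number (float_of_string f)

let () =
  assert (number_of_token (INT "1") = Jq_number 1.);
  assert (number_of_token (FLOAT "1.5") = Jq_number 1.5);
  (* An OCaml integer literal that is not valid JSON still goes through. *)
  assert (number_of_token (INT "1_000") = Jq_number 1000.)
```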

The parser itself is pleasingly compact; we can make good use of the LIST0 special symbol and an anonymous entry for parsing objects. Unfortunately things will get a little more complicated when we come to antiquotations.

Lifting the AST

Next we need to “lift” values of the JSON AST to values of the OCaml AST. What does “lift” mean, and why do we need to do it? The goal is to convert quotations in OCaml code, such as

  let x = <:json< [ 1, "foo", true ] >> 

into the equivalent

  let x = 
    Jq_ast.Jq_array [ 
      Jq_ast.Jq_number 1.; 
      Jq_ast.Jq_string "foo"; 
      Jq_ast.Jq_bool true 
    ] 

This is to happen as part of Camlp4 preprocessing, which produces an OCaml AST, so what we produce in place of the <:json< ... >> expression must be a fragment of OCaml AST. We have a parser which takes a valid JSON string to the JSON AST; what remains is to take a JSON AST value to the corresponding OCaml AST. So we need a function with cases something like:

  | Jq_null -> <:expr< Jq_null >> 
  | Jq_number n -> <:expr< Jq_number $`flo:n$ >> 
  | ... 

It is not such a big deal to hand-write this lifting function for a small AST like JSON, but it is arduous and error-prone for full-size ASTs. Fortunately Camlp4 has a filter which does it for us. Let’s first look at the signature of the Jq_ast module:

  open Camlp4.PreCast 
  
  type t = ... (* as above *) 
  
  module MetaExpr : 
  sig 
    val meta_t : Ast.loc -> t -> Ast.expr 
  end 
  
  module MetaPatt : 
  sig 
    val meta_t : Ast.loc -> t -> Ast.patt 
  end 

The generated modules MetaExpr and MetaPatt provide functions to lift a JSON AST to either an OCaml expr (when the quotation appears as an expression) or patt (when it appears as a pattern). The loc arguments are inserted into the resulting OCaml AST so that compile errors have correct locations.

Now the implementation of Jq_ast:

  module Jq_ast = 
  struct 
    type float' = float 
  
    type t = (* almost as above *) 
        ... 
      | Jq_number of float' 
        ... 
  end 
  
  include Jq_ast 
  
  open Camlp4.PreCast (* for Ast refs in generated code *) 
  
  module MetaExpr = 
  struct 
    let meta_float' _loc f = <:expr< $`flo:f$ >> 
    include Camlp4Filters.MetaGeneratorExpr(Jq_ast) 
  end 
  
  module MetaPatt = 
  struct 
    let meta_float' _loc f = <:patt< $`flo:f$ >> 
    include Camlp4Filters.MetaGeneratorPatt(Jq_ast) 
  end 

The file needs the Camlp4MetaGenerator filter (the camlp4.metagenerator package with findlib). The main idea is that the calls to Camlp4Filters.MetaGenerator{Expr,Patt} are expanded into the lifting functions. But there are a couple of fussy details:

First: The argument module Jq_ast which we pass to the generators is used both on the left and right of the generated function; if you look at the generated code there are cases like:

  | Jq_ast.Jq_null -> <:expr< Jq_ast.Jq_null >> 

(The <:expr< .. >> is already expanded in the actual generated code.) We need the AST to be available qualified by the module Jq_ast both in the current file and also in code that uses the quotation. So we have a nested Jq_ast module (for local uses, on the left-hand side) which we include (for external uses, on the right-hand side).

Second: The generators scan all the types defined in the current module, then generate code from the last-appearing recursive bundle. (In this case the recursive bundle contains just t, but in general there can be more than one; mutually recursive lifting functions are generated.) There are some special cases for predefined types, and in particular for float; however, the predefined function for float seems to be wrong:

  let meta_float _loc s = Ast.ExFlo (_loc, s) 

The ExFlo constructor takes a string representing the float, but calls to this function are generated when you use float in your type, so the generated code passes a float where a string is expected. To work around this, we define the type float' (on its own rather than as part of the last-appearing recursive bundle, or else Camlp4 would generate a meta_float' that calls meta_float), and provide correct meta_float' functions. There is a similar bug with meta_int, but meta_bool is correct, so our Jq_bool case does not need fixing.

(It is interesting to contrast this approach of lifting the AST with how it is handled in Template Haskell using the “scrap your boilerplate” pattern; see Geoffrey Mainland’s paper Why It’s Nice to be Quoted.)
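
For comparison with the generated code, here is a sketch of what a hand-written lifting function looks like, in plain OCaml. The expr type is a toy stand-in invented for this illustration: Camlp4's real Ast.expr is much larger and every node carries a location. The one detail it deliberately mirrors is that a float literal node holds a string, which is why the float must be converted before it is wrapped.

```ocaml
type t =
  | Jq_null
  | Jq_bool   of bool
  | Jq_number of float
  | Jq_string of string
  | Jq_array  of t list
  | Jq_object of (string * t) list

(* Toy model of a fragment of the OCaml expression AST.  The constructor
   names are hypothetical, not Camlp4's. *)
type expr =
  | Id  of string        (* constructor or identifier *)
  | Flo of string        (* float literal, kept as source text *)
  | Str of string        (* string literal *)
  | App of expr * expr   (* constructor applied to an argument *)
  | Tup of expr list     (* tuple *)
  | Lst of expr list     (* list literal *)

(* Map each JSON node to the expression that would rebuild it. *)
let rec lift = function
  | Jq_null -> Id "Jq_ast.Jq_null"
  | Jq_bool b -> App (Id "Jq_ast.Jq_bool", Id (string_of_bool b))
  | Jq_number n -> App (Id "Jq_ast.Jq_number", Flo (string_of_float n))
  | Jq_string s -> App (Id "Jq_ast.Jq_string", Str s)
  | Jq_array es -> App (Id "Jq_ast.Jq_array", Lst (List.map lift es))
  | Jq_object kvs ->
      let pair (k, v) = Tup [ Str k; lift v ] in
      App (Id "Jq_ast.Jq_object", Lst (List.map pair kvs))
```

Every constructor needs its own case, and the pattern version needs a second copy of the whole function, which is the boilerplate the MetaGenerator filter writes for us.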

Quotations

Finally we can hook the parser and AST lifter into Camlp4’s quotation machinery, in the Jq_quotations module:

  open Camlp4.PreCast 
  
  module Q = Syntax.Quotation 
  
  let json_eoi = Jq_parser.Gram.Entry.mk "json_eoi" 
  
  EXTEND Jq_parser.Gram 
    json_eoi: [[ x = Jq_parser.json; EOI -> x ]]; 
  END;; 
  
  let parse_quot_string loc s = 
    Jq_parser.Gram.parse_string json_eoi loc s 
  
  let expand_expr loc _ s = 
    Jq_ast.MetaExpr.meta_t loc (parse_quot_string loc s) 
  
  let expand_str_item loc _ s = 
    let exp_ast = expand_expr loc None s in 
    <:str_item@loc< $exp:exp_ast$ >> 
  
  let expand_patt loc _ s = 
    Jq_ast.MetaPatt.meta_t loc (parse_quot_string loc s) 
  
  ;; 
  
  Q.add "json" Q.DynAst.expr_tag expand_expr; 
  Q.add "json" Q.DynAst.patt_tag expand_patt; 
  Q.add "json" Q.DynAst.str_item_tag expand_str_item; 
  Q.default := "json" 

First, we make a new grammar entry json_eoi which parses a json expression followed by the end-of-input token EOI. Grammar entries ordinarily ignore the rest of the input after a successful parse. If we were to use the json entry directly, we would silently accept quotations with trailing garbage, and in particular incorrect quotations that happen to have a correct prefix, rather than alerting the user.
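
The difference is easy to see in a toy setting. The sketch below is plain OCaml and has nothing to do with Camlp4's grammar engine: parse_digits is a hypothetical "entry" that reads a run of digits and reports where it stopped, and the two wrappers play the roles of json and json_eoi.

```ocaml
(* Hypothetical toy entry: parse a run of digits at the start of [s],
   returning the number and the position where parsing stopped. *)
let parse_digits s =
  let n = String.length s in
  let rec go i =
    if i < n && s.[i] >= '0' && s.[i] <= '9' then go (i + 1) else i
  in
  let stop = go 0 in
  if stop = 0 then None
  else Some (int_of_string (String.sub s 0 stop), stop)

(* Like using the json entry directly: whatever follows is ignored. *)
let parse_prefix s =
  match parse_digits s with
  | Some (v, _) -> Some v
  | None -> None

(* Like json_eoi: succeed only if the whole input was consumed. *)
let parse_eoi s =
  match parse_digits s with
  | Some (v, stop) when stop = String.length s -> Some v
  | _ -> None
```

Here parse_prefix "12abc" returns Some 12 and the trailing garbage goes unnoticed, while parse_eoi "12abc" returns None.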

Then we register quotation expanders for the <:json< >> quotation in the expr, patt, and str_item contexts (str_item is useful because that is the context at the top level prompt), using Syntax.Quotation.add. All the expanders do is call the parser, then run the result through the appropriate lifting function.

Finally we set json as the default quotation, so we can just say << >> for JSON quotations. This is perhaps a bit cheeky, since the user may want something else as the default quotation; whichever module is loaded last wins.

It is worth reflecting on how the quotation mechanism works in the OCaml parser: There is a lexer token for quotations, but no node in the OCaml AST, so everything must happen in the parser. When a quotation is lexed, its entire contents are returned as a string. (Nested quotations are matched in the lexer, without considering the embedded syntax; see quotation and antiquot in camlp4/Camlp4/Struct/Lexer.mll. This makes the << and >> tokens unusable in the embedded syntax.) The string is then expanded according to the table of registered expanders; expanders return a fragment of OCaml AST which is inserted into the parse tree.
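
As a rough model of that table (not Camlp4's actual implementation, which also keys expanders by the context tag), here is a plain OCaml sketch. All names are hypothetical, and the "AST" an expander returns is just a string.

```ocaml
(* Hypothetical model: an expander maps quotation text to an "AST",
   represented here as a plain string. *)
let expanders : (string, string -> string) Hashtbl.t = Hashtbl.create 8

(* Name of the expander used for an unnamed quotation. *)
let default = ref ""

(* Register an expander; a later registration under the same name wins. *)
let add name f = Hashtbl.replace expanders name f

(* Expand a quotation.  An empty name stands for << ... >>. *)
let expand name contents =
  let name = if name = "" then !default else name in
  match Hashtbl.find_opt expanders name with
  | Some f -> f contents
  | None -> failwith ("no expander for quotation " ^ name)

let () =
  add "json" (fun s -> "json(" ^ String.trim s ^ ")");
  default := "json"
```

Because default is a single mutable reference, whichever module assigns it last decides what << >> means, which is the "last loaded wins" behaviour mentioned above.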

You might have thought (as I did) that something fancy happens with quotations, e.g. Camlp4 switches to a different parser on the fly, then back to the original parser for antiquotations. But it is much simpler than that. At the same time, it is much more complicated than that, as we will see next time when we cover antiquotations (and in particular how nested antiquotations/quotations are handled).

(You can find the complete code here, including a pretty-printer and integration with the top level; after building and installing you can say e.g.

  # << [ 1, "foo", true ] >>;; 
  - : Jq_ast.t = [ 1, "foo", true ] 

although without antiquotations it is not very useful.)
