Sunday, January 4, 2009

Reading Camlp4, part 2: quotations

Last time we saw how Camlp4 represents OCaml abstract syntax trees. We also saw some small examples of using Camlp4 quotations to get the AST corresponding to a piece of OCaml code. In this post we'll build a simple code-generating tool that uses quotations in more complicated ways.

The tool takes constructor names on stdin, one per line. It prints an OCaml module on stdout containing a type definition with the given constructors, a function to_string that converts a constructor of the type to the string of its name, and a function of_string that does the reverse. (Not the most useful program ever written.) We'll go through it line by line:

open Camlp4.PreCast

let _loc = Loc.ghost in
We start out with some boilerplate. Recall that Camlp4.PreCast gets us the Ast module containing the AST datatypes (along with some helper functions we'll see below); and that we need a location bound to _loc since the expanded quotations mention it.
let cons =
  let rec loop () =
    try
      match read_line () with
        | "" -> []
        | c -> c :: loop ()
    with End_of_file -> [] in
  loop () in
This part just reads lines from stdin until EOF and puts them in a list, nothing to do with Camlp4.
Printers.OCaml.print_implem
<:str_item<
Now we're into the meat. We've begun a str_item quotation (recall that this is a module body), and we're passing it to a function that will pretty-print it to stdout. The module Printers.OCaml comes from camlp4/Camlp4/Printers/OCaml.ml. (This module is called to pretty-print OCaml code when you run Camlp4; toward the bottom you can see some useful command-line options.)
type t =
    $Ast.TySum (_loc,
               Ast.tyOr_of_list
                 (List.map
                     (fun c -> <:ctyp< $uid:c$ >>)
                     cons))$
It's a bit confusing with the interjected explanations, but we are still in the quotation (up to the >> at the end of the program). This part generates the variant type definition. If the input is the lines "Foo", "Bar", "Baz", we want to generate type t = Foo | Bar | Baz.

This might be a good time to run the above type definition through Camlp4 to see what the AST looks like. You'll see that the body of the definition begins with Ast.TySum, then contains the branches of the variant collected by Ast.TyOr's. Each branch is an Ast.TyId with an Ast.IdUid inside containing the constructor name. We want to come up with the same thing for an arbitrary list of constructor names.

Reading from the inside out, we take the list of constructor names and map a ctyp quotation over it. In this quotation we see an antiquotation. Antiquotations let you fill in code so you can use a quotation as a template. Here the tag uid: turns the string c into an Ast.IdUid inside an Ast.TyId. So we have a list of Ast.TyId (Ast.IdUid "Foo")'s.

Now we call a helper function Ast.tyOr_of_list. There are a bunch of these in camlp4/Camlp4/Struct/Camlp4Ast.mlast, which is where the Ast module comes from. This one collects the elements of the list with Ast.TyOr. (As we saw with Ast.StSem/Ast.StNil, the AST doesn't use regular OCaml lists, but builds list-like things out of data constructors.)

Finally we wrap a Ast.TySum around the whole thing, and use an antiquotation to insert it as the body of the type definition. (Here no tag is needed on the antiquotation; this is usually the case if you're inserting an AST rather than a string.)

let to_string = function
    $Ast.mcOr_of_list
      (List.map
          (fun c -> <:match_case< $uid:c$ -> $`str:c$ >>)
          cons)$

let of_string = function
    $let ors =
       Ast.mcOr_of_list
         (List.map
             (fun c -> <:match_case< $`str:c$ -> $uid:c$ >>)
             cons) in
     Ast.McOr(_loc,
             ors,
             <:match_case< _ -> invalid_arg "bad string" >>)$

>>
Now that we've seen how things work, the functions are pretty easy to understand. There are a few new elements: A pattern match (in either a match or function expression) consists of an Ast.McArr for each arrow (which we make with the match_case quotation), collected by Ast.McOr's (which we make with the Ast.mcOr_of_list helper).

The `str antiquotation turns a string into a quoted OCaml string, escaping any special characters.

If you run a sample match through Camlp4 you'll see constructors like Ast.PaId and Ast.PaStr. These are for patterns on the left side of match cases. When we use the match_case quotation we don't have to worry about it; we can use the same antiquotations on the left and right, and the appropriate constructor is parsed out.

That's the whole program. Here's a Makefile for building it:

variant: variant.ml
        ocamlc \
          -pp camlp4of -I +camlp4 \
          -o variant camlp4lib.cma variant.ml
#       ocamlfind ocamlc -syntax camlp4o \
          -package camlp4.quotations.o -package camlp4.lib \
          -linkpkg -o variant variant.ml

clean:
        rm -f variant *.cm* *~
Quotations and antiquotations are a slick way to generate OCaml code, but it can be pretty hard to figure out how to use them. The first thing to realize is that you don't have to: anything you can do with Camlp4 you can do by working directly with the AST constructors (although you will find it tedious). As you learn your way around you can replace explicit AST code with quotations.

To access the full power of quotations we'll need to dive into the OCaml parser. But that's going to have to wait until the next post.

7 comments:

  1. Hi and thanks for these nice posts,

    I have two remarks: the first is about quotations in original syntax, there not necessary simpler than those in the revised syntax, some node will be
    inaccessible due to ambiguities.

    The second is a refactoring suggestions, you can improve the of_string generator by pulling out the
    catch-all case.

    Here is the answer http://bit.ly/sK3d

    ReplyDelete
  2. Thanks Nicolas. I will say something about both these issues in my next post.

    ReplyDelete
  3. Hi Jake,

    I needed to change the build rule to

    variant: variant.ml
    ocamlc \
    -pp camlp4of -I +camlp4 \
    -o variant dynlink.cma camlp4lib.cma variant.ml

    The dynlink module must be new.

    ReplyDelete
  4. Thanks, yes, that's since 3.11.x I think; dynlink.cma used to be packaged with camlp4lib.cma.

    ReplyDelete
  5. Hi Jake.

    Thanks for posting this article. I could not find any other tutorial that provides the detail that you have. I currently get the following error when running the example.

    File "variant.ml", line 16, characters 4-158:
    Error: This expression has type Camlp4.PreCast.Ast.ctyp
    but an expression was expected of type Camlp4.PreCast.Ast.str_item


    I am currently using 3.12.1. Was something new introduced to break your example?

    ReplyDelete
    Replies
    1. Hi Frank,

      I use OCaML 3.12.1 on Fedora 17 i686 and build with dynlink.cma as suggested by Shawnessy, and I get no error.

      Jake, could you attach the file variant.ml to make it easier for readers of this excellent introduction to camlp4 to build it?

      Delete
  6. Hi Jake -- Thank you for writing this series of articles. Far better than anything else out there!
    Tim

    ReplyDelete