Last time we saw how Camlp4 represents OCaml abstract syntax trees. We also saw some small examples of using Camlp4 quotations to get the AST corresponding to a piece of OCaml code. In this post we'll build a simple code-generating tool that uses quotations in more complicated ways.
The tool takes constructor names on stdin
, one per line. It prints an OCaml module on stdout
containing a type definition with the given constructors, a function to_string
that converts a constructor of the type to the string of its name, and a function of_string
that does the reverse. (Not the most useful program ever written.) We'll go through it line by line:
open Camlp4.PreCast let _loc = Loc.ghost inWe start out with some boilerplate. Recall that
Camlp4.PreCast
gets us the Ast
module containing the AST datatypes (along with some helper functions we'll see below); and that we need a location bound to _loc
since the expanded quotations mention it. let cons = let rec loop () = try match read_line () with | "" -> [] | c -> c :: loop () with End_of_file -> [] in loop () inThis part just reads lines from
stdin
until EOF and puts them in a list, nothing to do with Camlp4. Printers.OCaml.print_implem <:str_item<Now we're into the meat. We've begun a
str_item
quotation (recall that this is a module body), and we're passing it to a function that will pretty-print it to stdout
. The module Printers.OCaml
comes from camlp4/Camlp4/Printers/OCaml.ml
. (This module is called to pretty-print OCaml code when you run Camlp4; toward the bottom you can see some useful command-line options.) type t = $Ast.TySum (_loc, Ast.tyOr_of_list (List.map (fun c -> <:ctyp< $uid:c$ >>) cons))$It's a bit confusing with the interjected explanations, but we are still in the quotation (up to the
>>
at the end of the program). This part generates the variant type definition. If the input is the lines "Foo", "Bar", "Baz", we want to generate type t = Foo | Bar | Baz
.This might be a good time to run the above type definition through Camlp4 to see what the AST looks like. You'll see that the body of the definition begins with Ast.TySum
, then contains the branches of the variant collected by Ast.TyOr
's. Each branch is an Ast.TyId
with an Ast.IdUid
inside containing the constructor name. We want to come up with the same thing for an arbitrary list of constructor names.
Reading from the inside out, we take the list of constructor names and map a ctyp
quotation over it. In this quotation we see an antiquotation. Antiquotations let you fill in code so you can use a quotation as a template. Here the tag uid:
turns the string c
into an Ast.IdUid
inside an Ast.TyId
. So we have a list of Ast.TyId (Ast.IdUid "Foo")
's.
Now we call a helper function Ast.tyOr_of_list
. There are a bunch of these in camlp4/Camlp4/Struct/Camlp4Ast.mlast
, which is where the Ast
module comes from. This one collects the elements of the list with Ast.TyOr
. (As we saw with Ast.StSem
/Ast.StNil
, the AST doesn't use regular OCaml lists, but builds list-like things out of data constructors.)
Finally we wrap a Ast.TySum
around the whole thing, and use an antiquotation to insert it as the body of the type definition. (Here no tag is needed on the antiquotation; this is usually the case if you're inserting an AST rather than a string.)
let to_string = function $Ast.mcOr_of_list (List.map (fun c -> <:match_case< $uid:c$ -> $`str:c$ >>) cons)$ let of_string = function $let ors = Ast.mcOr_of_list (List.map (fun c -> <:match_case< $`str:c$ -> $uid:c$ >>) cons) in Ast.McOr(_loc, ors, <:match_case< _ -> invalid_arg "bad string" >>)$ >>Now that we've seen how things work, the functions are pretty easy to understand. There are a few new elements: A pattern match (in either a
match
or function
expression) consists of an Ast.McArr
for each arrow (which we make with the match_case
quotation), collected by Ast.McOr
's (which we make with the Ast.mcOr_of_list
helper).The `str
antiquotation turns a string into a quoted OCaml string, escaping any special characters.
If you run a sample match through Camlp4 you'll see constructors like Ast.PaId
and Ast.PaStr
. These are for patterns on the left side of match cases. When we use the match_case
quotation we don't have to worry about it; we can use the same antiquotations on the left and right, and the appropriate constructor is parsed out.
That's the whole program. Here's a Makefile
for building it:
variant: variant.ml ocamlc \ -pp camlp4of -I +camlp4 \ -o variant camlp4lib.cma variant.ml # ocamlfind ocamlc -syntax camlp4o \ -package camlp4.quotations.o -package camlp4.lib \ -linkpkg -o variant variant.ml clean: rm -f variant *.cm* *~Quotations and antiquotations are a slick way to generate OCaml code, but it can be pretty hard to figure out how to use them. The first thing to realize is that you don't have to: anything you can do with Camlp4 you can do by working directly with the AST constructors (although you will find it tedious). As you learn your way around you can replace explicit AST code with quotations.
To access the full power of quotations we'll need to dive into the OCaml parser. But that's going to have to wait until the next post.
Hi and thanks for these nice posts,
ReplyDeleteI have two remarks: the first is about quotations in original syntax, there not necessary simpler than those in the revised syntax, some node will be
inaccessible due to ambiguities.
The second is a refactoring suggestions, you can improve the of_string generator by pulling out the
catch-all case.
Here is the answer http://bit.ly/sK3d
Thanks Nicolas. I will say something about both these issues in my next post.
ReplyDeleteHi Jake,
ReplyDeleteI needed to change the build rule to
variant: variant.ml
ocamlc \
-pp camlp4of -I +camlp4 \
-o variant dynlink.cma camlp4lib.cma variant.ml
The dynlink module must be new.
Thanks, yes, that's since 3.11.x I think; dynlink.cma used to be packaged with camlp4lib.cma.
ReplyDeleteHi Jake.
ReplyDeleteThanks for posting this article. I could not find any other tutorial that provides the detail that you have. I currently get the following error when running the example.
File "variant.ml", line 16, characters 4-158:
Error: This expression has type Camlp4.PreCast.Ast.ctyp
but an expression was expected of type Camlp4.PreCast.Ast.str_item
I am currently using 3.12.1. Was something new introduced to break your example?
Hi Frank,
DeleteI use OCaML 3.12.1 on Fedora 17 i686 and build with dynlink.cma as suggested by Shawnessy, and I get no error.
Jake, could you attach the file variant.ml to make it easier for readers of this excellent introduction to camlp4 to build it?
Hi Jake -- Thank you for writing this series of articles. Far better than anything else out there!
ReplyDeleteTim