Hey, long time no see!
It is high time to get back to Camlp4, so I would like to pick up the thread by covering Camlp4 filters. We have previously considered the parsing and pretty-printing facilities of Camlp4 separately. But of course the most common way to use Camlp4 is as a front-end to ocamlc
, where it processes files by parsing them into an AST and pretty-printing them back to text (well, not quite—we will see below how the AST is passed to ocamlc
). In between we can insert filters to transform the AST.
So let’s dive into an example: a filter for type definitions that generates t_to_string
and t_of_string
functions for a type t
, a little like Haskell’s deriving Show, Read
. To keep it simple we handle only variant types, and only those where all the arms have no data. Here goes:
module Make (AstFilters : Camlp4.Sig.AstFilters) =
struct
open AstFilters
In order to hook into Camlp4’s plugin mechanism we define the filter as a functor. By opening AstFilters
we get an Ast
module in scope. Unfortunately this is not the same Ast
we got previously from Camlp4.PreCast
(although it has the same signature) so all our code that uses Ast
(including all OCaml syntax quotations) needs to go inside the functor body.
let rec filter si =
match wrap_str_item si with
| <:str_item< type $lid:tid$ = $Ast.TySum (_, ors)$ >> ->
begin
try
let cons =
List.map
(function
| <:ctyp< $uid: c$ >> -> c
| _ -> raise Exit)
(Ast.list_of_ctyp ors []) in
to_of_string si tid cons
with Exit -> si
end
| _ -> si
The filter
function filters Ast.str_item
s. (It is not actually recursive but we say let rec
so we can define helper functions afterward). If a str_item
has the right form we transform it by calling to_of_string
, otherwise we return it unchanged. We match a sum type definition, then extract the constructor names (provided that they have no data) into a string list. (Recall that a TySum
contains arms separated by TyOr
; the call to list_of_ctyp
converts that to a list of arms.)
and wrap_str_item si =
let _loc = Ast.loc_of_str_item si in
<:str_item< $si$ >>
For some reason, <:str_item< $si$ >>
wraps an extra StSem
/ StNil
around si
, so in order to use the quotation syntax on the left-hand side of a pattern match we need to do the same wrapping.
and to_of_string si tid cons =
let _loc = Ast.loc_of_str_item si in
<:str_item<
$si$;;
$to_string _loc tid cons$;;
$of_string _loc tid cons$;;
>>
This str_item
replaces the original one in the output, so we include the original one in additional to new ones containing the t_to_string
and t_of_string
functions.
and to_string _loc tid cons =
<:str_item<
let $lid: tid ^ "_to_string"$ = function
$list:
List.map
(fun c -> <:match_case< $uid: c$ -> $`str: c$ >>)
cons$
>>
To convert a variant to a string, we match over its constructors and return the corresponding string.
and of_string _loc tid cons =
<:str_item<
let $lid: tid ^ "_of_string"$ = function
$list:
List.map
(fun c -> <:match_case<
$tup: <:patt< $`str: c$ >>$ -> $uid: c$
>>)
cons$
| _ -> invalid_arg "bad string"
>>
To convert a string to a variant, we match over the corresponding string for each constructor and return the constructor; we also need a catchall for strings that match no constructor. (What is this tup
and patt
business? A contrived bug which we will fix below.)
;;
AstFilters.register_str_item_filter begin fun si ->
let _loc = Ast.loc_of_str_item si in
<:str_item<
$list: List.map filter (Ast.list_of_str_item si [])$
>>
end
Now we register our filter function with Camlp4. The input str_item
may contain many str_items
s separated by StSem
, so we call list_of_str_item
to get a list of individuals.
end
module Id =
struct
let name = "to_of_string"
let version = "0.1"
end
;;
let module M = Camlp4.Register.AstFilter(Id)(Make) in ()
Finally we register the plugin with Camlp4. The functor application is just for its side effect, so the plugin is registered when its .cmo
is loaded. We can compile the plugin with
ocamlfind ocamlc -package camlp4.quotations.o -syntax camlp4o \
-c to_of_string.ml
and run it on a file (containing type t = Foo | Bar | Baz
or something) with
camlp4o to_of_string.cmo test.ml
Looks pretty good, right? But something goes wrong when we try to use our plugin as a frontend for ocamlc
:
ocamlc -pp 'camlp4o ./to_of_string.cmo' test.ml
We get a preprocessor error, “singleton tuple pattern”. It turns out that Camlp4 passes the processed AST to ocamlc
not by pretty-printing it to text, but by converting it to the AST type that ocamlc
uses and marshalling it. This saves the time of reparsing it, and also passes along correct file locations (compare to cpp
’s #line
directives). However, as we have seen, the Camlp4 AST is pretty loose. When converting to an ocamlc
AST, Camlp4 does some validity checks on the tree. What can be confusing is that an AST that fails these checks may look fine when pretty-printed.
Here the culprit is the line
$tup: <:patt< $`str: c$ >>$ -> $uid: c$
which produces an invalid pattern consisting of a one-item tuple. When pretty-printed, though, the tup
just turns into an extra set of parentheses, which ocamlc
doesn’t mind. What we wanted was
$`str: c$ -> $uid: c$
This is a contrived example, but this kind of error is easy to make, and can be hard to debug, because looking at the pretty-printed output doesn’t tell you what’s wrong. One tactic is to run your code in the toplevel, which will print the constructors of the AST as usual. Another is to use a filter that comes with Camlp4 to “lift” the AST—that is, to generate the AST representing the original AST! Maybe it is easier to try it than to explain it:
camlp4o to_of_string.cmo -filter Camlp4AstLifter test.ml
Now compare the result to the tree you get back from Camlp4’s parser for the code you meant to write, and you can probably spot your mistake.
(If you tried to redirect the camlp4o
command to a file or pipe it through less
you got some line noise—this is the marshalled ocamlc
AST. By default Camlp4 checks whether its output is a TTY; if so it calls the pretty-printer, if not the ocamlc
AST marshaller. To override this use the -printer o
option, or -printer r
for revised syntax.)
This Camlp4AstLifter
is pretty useful. What else comes with Camlp4? There are several other filters in camlp4/Camlp4Filters
which you can call with -filter
:
-
Camlp4FoldGenerator
generates visitor classes from datatypes. Try puttingclass x = Camlp4MapGenerator.generated
after a type definition. The idea is that you can override methods of the visitor so you can do some transformation on a tree without having to write the boilerplate to walk the parts you don’t care about. In fact, this filter is used as part of the Camlp4 bootstrap to generate vistors for the AST; you can see themap
andfold
classes incamlp4/Camlp4/Sig.ml
. -
Camlp4MetaGenerator
generates lifting functions from a type definition—these functions are whatCamlp4AstLifter
uses to lift the AST, and it’s also how quotations are implemented. I’m planning to cover how to implement quotations / antiquotations (for a different language) in a future post, andCamlp4MetaGenerator
will be crucial. -
Camlp4LocationStripper
replaces all the locations in an AST withLoc.ghost
. I don’t know what this is for, but it might be useful if you wanted to compare two ASTs and be insensitive to their locations. -
Camlp4Profiler
inserts profiling code, in the form of function call counts. I haven’t tried it, and I’m not sure when you would want it in preference to gprof. -
Camlp4TrashRemover
just filters out a module calledCamlp4Trash
. Such a module may be found incamlp4/Camlp4/Struct/Camlp4Ast.mlast
; I think the idea is that the module is there in order to generate some stuff, but the module itself is not needed. -
Camlp4MapGenerator
has been subsumed byCamlp4FoldGenerator
. -
Camlp4ExceptionTracer
seems to be a special-purpose tool to help debug Camlp4.
OK, maybe not too much useful stuff here, but it is interesting to work out how Camlp4 is bootstrapped.
I think next time I will get into Camlp4’s extensible parsers, on the way toward syntax extensions.
ColophonI wrote my previous posts in raw HTML, with highlighted code generated from a hightlighted Emacs buffer by htmlize.el. Iterating on this setup was unutterably painful. This post was written using jekyll with a simple template to approximate the Blogspot formatting, mostly so I can check that lines of code aren’t too long. Jekyll is very nice: you can write text with Markdown, and highlight code with Pygments.
Very nice series of posts. I hope to read some other ;-)
ReplyDeleteHi Jake,
ReplyDeleteI generally don't like to use automated things like ocamlfind I don't understand. I can't figure out the regular ocamlc call to make though. I tried ocamlfind ocamlc -verbose and it gave me a horribly long command with a bunch of paths, etc. Can you write out the command I need to compile to_of_string.ml?
Thanks!
Sean
Yikes!
ReplyDeleteocamlc.opt -c -I /loc/of/camlp4 -pp "camlp4 -parser op -printer p -parser Camlp4QuotationCommon -parser Camlp4OCamlOriginalQuotationExpander" pt5.ml
There are 3 parsers required to compile this. Is there a good reason it's this complex?
Thanks!
Sean
You beat me to it :). I just checked in a revised example without ocamlfind at
ReplyDeletehttp://github.com/jaked/ambassadortothecomputers.blogspot.com/tree/master/_code/camlp4-filters/
I changed to revised syntax quotations since the code above doesn't quite work in 3.12.x.
The -I to ocamlc.opt is for the Camlp4 module reference in the code, but you can say -I +camlp4 instead. You can omit the various parser flags and call camlp4oof/camlp4orf instead of camlp4 (which have the appropriate modules loaded for original and revised syntax quotations, respectively). You can discover what a flag adds to the loaded modules with -loaded-modules, although for some reason camlp4oof/camlp4orf don't show the quotation modules.
As to why you need all these parsers, it's just because the parser is built up incrementally: the original syntax parser extends the revised syntax parser, then quotations extend the base parser with a common quotation module and one specific to the flavor of quotations (original or revised).