Tuesday, January 27, 2009

Reading Camlp4, part 4: consuming OCaml ASTs

It's easy to think of Camlp4 as just "defmacro on steroids"; that is, just a tool for syntax extension, but it is really a box of independently-useful tools. As we've seen, Camlp4 can be used purely for code generation; in this post I'll describe a tool that uses it purely for code consumption: a (minimal, broken) version of otags:
open Camlp4.PreCast
module M = Camlp4OCamlRevisedParser.Make(Syntax)
module N = Camlp4OCamlParser.Make(Syntax)
We're going to call the OCaml parser directly. These functor applications are used only for their effect (which is to fill in an empty grammer with OCaml cases); ordinarily they would be called as part of Camlp4's dynamic loading process. Recall that the original syntax parser is an extension of the revised parser, so we need both, in this order.
let files = ref []

let rec do_fn fn =
  let st = Stream.of_channel (open_in fn) in
  let str_item = Syntax.parse_implem (Loc.mk fn) st in
  let str_items = Ast.list_of_str_item str_item [] in
  let tags = List.fold_right do_str_item str_items [] in
  files := (fn, tags)::!files
We'll call do_fn for each filename on the command line. The Syntax.parse_implem function takes a Loc.t and a stream, and parses the stream into a str_item. (The initial Loc.t just provides the filename so later locations can refer to it, for error messages etc.) Now, recall that even though we got back a single str_item, it can contain several definitions (collected with StSem). We use Ast.list_of_str_item to get an ordinary list, then accumulate tags into files.
and do_str_item si tags =
  match si with
 (* | <:str_item< let $rec:_$ $bindings$ >> -> *)
    | Ast.StVal (_, _, bindings) ->
        let bindings = Ast.list_of_binding bindings [] in
        List.fold_right do_binding bindings tags
    | _ -> tags
We'll only consider value bindings. The commented-out str_item quotation doesn't work (run it through Camlp4 to see why--I'm not sure where the extra StSem/StNil come from), so we fall back to an explicit constructor. (The rec antiquotation matches a flag controlling whether an StVal is a let rec or just a let; here we don't care.) Now we have an Ast.binding, which again can contain several bindings (collected with BiAnd) so we call Ast.list_of_bindings.
and do_binding bi tags =
  match bi with
    | <:binding@loc< $lid:lid$ = $_$ >> ->
      let line = Loc.start_line loc in
      let off = Loc.start_off loc in
      let pre = "let " ^ lid in
      (pre, lid, line, off)::tags
    | _ -> tags
We're going to generate an etags-format file, where each definition consists of a prefix of the line in the source, the tag itself, the line number, and the character offset. If you look in the parser you'll see that the left side of a binding can be any pattern (as you'd expect), but we only handle the case where it's a single identifier; the lid antiquotation extracts it as a string. The line number and character offset are easy to find from the location of the binding (see camlp4/Camlp4/Sig.ml for the Loc functions), which we get with @loc. The prefix is problematic: the location of the binding does not include the let or and part, and anyway what we really want is everything from the beginning of the line. Doable but not so instructive of Camlp4, so we just tack on a "let " prefix (so this doesn't work for and or if there is whitespace).
let print_tags files =
  let ch = open_out "TAGS" in
  ListLabels.iter files ~f:(fun (fn, tags) ->
    Printf.fprintf ch "\012\n%s,%d\n" fn 0;
    ListLabels.iter tags ~f:(fun (pre, tag, line, off) ->
      Printf.fprintf ch "%s\127%s\001%d,%d\n" pre tag line off))
Generating the tags file is straightforward, following the description at the bottom of the otags README. (The 0 is supposed to be the length of the tag data, but my Emacs doesn't seem to care.) We put the pieces together with Arg:
Arg.parse [] do_fn "otags: fn1 [fn2 ...]";
print_tags !files
and finally, a Makefile:
otags: otags.ml
        ocamlc \
          -pp camlp4of \
          -o otags \
          -I +camlp4 -I +camlp4/Camlp4Parsers \
          dynlink.cma camlp4fulllib.cma otags.ml
We could improve this in many ways (error-handling, patterns, types, etc.); clearly we can't replicate otags in a few dozen lines. But Camlp4 takes care of a lot of the hard work. Next time, maybe, an actual syntax extension.

1 comment:

  1. Great series. I'm just getting around to reading it - very helpful for getting started with camlpr4.

    I compiled your example code and tried running otags on otags.ml and I get:
    Fatal error: exception Loc.Exc_located(_, _)

    I'm running OCaml 3.11.2