## Wednesday, November 24, 2010

### Three uses for a binary heap

Lately I have been interviewing for jobs, so doing a lot of whiteboard programming, and binary heaps keep arising in the solutions to these interview problems. There is nothing new or remarkable about these applications (binary heaps and their uses are covered in any undergraduate algorithms class), but I thought I would write them down because they are cute, and in the hope that they might be useful to someone else who (like me) gets by most days as a working programmer with no algorithm fancier than quicksort or binary search.

#### Binary heaps

Here’s a signature for a binary heap module `Heap`:

``````
module type OrderedType =
sig
  type t
  val compare : t -> t -> int
end

module type S =
sig
  type elt
  type t
  val make : unit -> t
  val add : t -> elt -> unit
  val peek_min : t -> elt option
  val take_min : t -> elt
  val size : t -> int
end

module Make (O : OrderedType) : S with type elt = O.t
``````

We start with a signature for ordered types (following the `Set` and `Map` modules in the standard library), so we can provide a type-specific comparison function.

From an ordered type we can make a heap which supports adding elements, peeking the smallest element (`None` if there are no elements) without removing it, removing and returning the smallest element (raising `Not_found` if the heap is empty), and returning the number of elements.

We’ll work out the asymptotic running times of the algorithms below, so it will be useful to know that the worst-case running time of the `add` and `take_min` functions is `O(log n)` where `n` is the number of elements in the heap.
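To make the signature concrete before diving into the applications, here is a deliberately naive structure matching `S`, backed by a sorted list (so `add` is `O(n)` rather than `O(log n)`). The names `NaiveHeap` and `IntHeap` are invented here for illustration; this is just a sketch of how the `Make` functor is used, not the real implementation (which appears at the end of the post):

```ocaml
module type OrderedType = sig
  type t
  val compare : t -> t -> int
end

(* A naive stand-in for Heap.Make: a sorted list, so add is O(n).
   It satisfies the same signature S, which is all the examples need. *)
module NaiveHeap (O : OrderedType) = struct
  type elt = O.t
  type t = { mutable elts : elt list }
  let make () = { elts = [] }
  let add h e = h.elts <- List.merge O.compare [e] h.elts
  let peek_min h = match h.elts with [] -> None | e :: _ -> Some e
  let take_min h =
    match h.elts with
    | [] -> raise Not_found
    | e :: rest -> h.elts <- rest; e
  let size h = List.length h.elts
end

module IntHeap = NaiveHeap (struct type t = int let compare = compare end)

let () =
  let h = IntHeap.make () in
  List.iter (IntHeap.add h) [3; 1; 2];
  assert (IntHeap.peek_min h = Some 1);  (* smallest element first *)
  assert (IntHeap.take_min h = 1);
  assert (IntHeap.size h = 2)
```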

#### Finding the k smallest elements in a list

Here’s a simple one. To find the smallest element in a list, we could sort the list then take the first element in the sorted list, at a cost of `O(n log n)`. Or we could just take a pass over the list keeping a running minimum, at a cost of `O(n)`.

What if we want the `k` smallest elements? Again, we could sort the list, but if `k < n` we can do better by generalizing the single-pass solution. The idea is to keep the `k` smallest elements we’ve seen so far in a binary heap. For each element in the list we add it to the heap, then (if there were already `k` elements in the heap) remove the largest element in the heap, leaving the `k` smallest.

The running time is `O(n log k)` since we do an `add` and a `take_min` in a heap of size `k` for each of `n` elements in the list. Here’s the code:

``````
let kmin (type s) k l =
  let module OT = struct
    type t = s
    let compare e1 e2 = compare e2 e1
  end in
  let module H = Heap.Make(OT) in

  let h = H.make () in
  List.iter
    (fun e ->
       H.add h e;
       if H.size h > k
       then ignore (H.take_min h))
    l;
  let rec loop mins =
    match H.peek_min h with
    | None -> mins
    | _ -> loop (H.take_min h :: mins) in
  loop []
``````

Here we make good use of OCaml 3.12’s new feature for explicitly naming type variables in a polymorphic function to make a structure matching `OrderedType`. The heap has the same element type as the list, but we reverse the comparison since we want to remove the largest rather than smallest element from the heap in the loop. At the end of `kmin` we drain the heap to build a list of the `k` smallest elements.
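As a sanity check on the semantics, a sorting-based reference implementation computes the same ascending list of the `k` smallest elements, at `O(n log n)` cost rather than `O(n log k)`. The name `kmin_spec` is invented here as a testing oracle, not part of the heap-based code:

```ocaml
(* Reference implementation of kmin by sorting: an easy oracle
   for testing the heap-based version. *)
let kmin_spec k l =
  let rec take n = function
    | x :: xs when n > 0 -> x :: take (n - 1) xs
    | _ -> []
  in
  take k (List.sort compare l)

let () =
  assert (kmin_spec 3 [5; 1; 4; 2; 3] = [1; 2; 3]);
  assert (kmin_spec 0 [1; 2] = []);
  assert (kmin_spec 5 [2; 1] = [1; 2])  (* k larger than the list is fine *)
```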

#### Merging k lists

Suppose we want to merge `k` lists. We could merge them pairwise until there is only one list, but that would take `k - 1` passes, for a worst-case running time of `O(n * (k - 1))`. Instead we can merge them all in one pass, using a binary heap so we can find the next smallest element of the `k` lists in `O(log k)` time, for a running time of `O(n log k)`. Here’s the code:

``````
let merge (type s) ls =
  let module OT = struct
    type t = s list
    let compare e1 e2 =
      compare (List.hd e1) (List.hd e2)
  end in
  let module H = Heap.Make(OT) in

  let h = H.make () in
  List.iter
    (function
       | [] -> ()
       | l -> H.add h l)
    ls;
  let rec loop () =
    match H.peek_min h with
    | None -> []
    | _ ->
        match H.take_min h with
        | [] -> assert false
        | m :: t ->
            if t <> [] then H.add h t;
            m :: loop () in
  loop ()
``````

We store the lists in the heap, and compare them by comparing their head element (we’re careful not to put an empty list in the heap). When we take the smallest list from the heap, its head becomes the next element in the output list, and we return its tail (if it is not empty) to the heap.
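Here too a trivially correct reference is useful for testing: since the inputs are sorted, merging them must agree with sorting the concatenation. `merge_spec` is a name invented here as an oracle, at `O(n log n)` cost rather than the heap merge’s `O(n log k)`:

```ocaml
(* Oracle for merge: sort the concatenation of the (already sorted)
   input lists. Only for testing, not a replacement for the heap merge. *)
let merge_spec ls = List.sort compare (List.concat ls)

let () =
  assert (merge_spec [[1; 4]; [2; 3]; []; [0]] = [0; 1; 2; 3; 4]);
  assert (merge_spec [] = [])
```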

#### Computing a skyline

The next problem was told to me in terms of computing the skyline of a set of buildings. A building has a height and a starting and ending `x`-coordinate; buildings may overlap. The skyline of a set of buildings is a list of (`x`, `y`) pairs (in ascending `x` order), describing a sequence of horizontal line segments (each starting at (`x`, `y`) and ending at the subsequent `x`), such that at any `x` there is no space between the line segment and the tallest building. (Here’s another description with diagrams.)

I googled a bit to see what this is useful for, and didn’t find much. One application is to extract a monophonic line from polyphonic music, where `x` is time and height is some metric on notes, like pitch or volume. It might be useful for searching data which is only intermittently applicable—say, to compute a schedule over time of the nearest open restaurant.

The algorithm scans the building start and end points in ascending `x` order, keeping the “active” buildings (those which overlap the current `x`) in a binary heap. The height of the skyline can only change at a building start or end point. We can determine the tallest building at a point by calling `peek_min` on the heap.

When we hit a start point we add the building to the heap; for an end point we do nothing (the heap has no operation to remove an element). So we may have inactive buildings in the heap. We remove them lazily—before checking the height of the highest building, we call `take_min` to remove any higher inactive buildings.

The worst-case running time is `O(n log n)`, since we do some heap operations for each building, and we might end up with all the buildings in the heap.

Here’s the code:

``````
type building = int * int * int (* x0, x1, h *)

let skyline bs =
  let module OT = struct
    type t = int * building
    let compare (x1, _) (x2, _) = compare x1 x2
  end in
  let module Events = Heap.Make(OT) in
  let events = Events.make () in
  List.iter
    (fun ((x0,x1,_) as b) ->
       Events.add events (x0, b);
       Events.add events (x1, b))
    bs;

  let module OT = struct
    type t = building
    let compare (_,_,h1) (_,_,h2) = compare h2 h1
  end in
  let module Heights = Heap.Make(OT) in
  let heights = Heights.make () in

  let rec loop last =
    match Events.peek_min events with
    | None -> []
    | _ ->
        let (x, (x0,_,_ as b)) = Events.take_min events in
        if x = x0 then Heights.add heights b;
        while (match Heights.peek_min heights with
               | Some (_,x1,_) -> x1 <= x
               | _ -> false) do
          ignore (Heights.take_min heights)
        done;
        let h =
          match Heights.peek_min heights with
          | Some (_,_,h) -> h
          | None -> 0 in
        match last with
        | Some h' when h = h' -> loop last
        | _ -> (x, h) :: loop (Some h) in
  loop None
``````

We use a second heap `events` to store the “events” (the start and end points of all the buildings), in order to process them in ascending `x` order. (This use is not dynamic—we do not add new elements to the heap while processing them—so we could just as well use another means of sorting the points.) In this heap we store the `x` coordinate and the building (we can tell whether we have a start or end point by comparing the `x` coordinate to the building’s start point), and compare elements by comparing just the `x` coordinates.

The main heap `heights` stores buildings, and we compare them by comparing heights (reversed, so `peek_min` peeks the tallest building). While there are still events, we add the building to `heights` if the event is a start point, clear out inactive buildings, then return the pair (`x`, `y`) where `x` is the point we’re processing and `y` is the height of the tallest active building. Additionally we filter out adjacent pairs with the same height; these can arise when a shorter building starts or ends while a taller building is active.
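A brute-force version makes a convenient oracle for the heap-based algorithm: at every start or end point, scan all buildings for the tallest active one, then drop consecutive duplicate heights. `skyline_spec` is a name invented here for testing; it is `O(n^2)` but simple enough to trust:

```ocaml
type building = int * int * int  (* x0, x1, h *)

(* Brute-force skyline: heights can only change at start/end points,
   so evaluate the tallest active building at each such x. *)
let skyline_spec (bs : building list) =
  let xs =
    List.sort_uniq compare
      (List.concat_map (fun (x0, x1, _) -> [x0; x1]) bs) in
  let height x =
    List.fold_left
      (fun acc (x0, x1, h) -> if x0 <= x && x < x1 then max acc h else acc)
      0 bs in
  let rec dedup last = function
    | [] -> []
    | (x, h) :: rest ->
        if last = Some h then dedup last rest
        else (x, h) :: dedup (Some h) rest in
  dedup None (List.map (fun x -> (x, height x)) xs)

let () =
  (* a tall building overlapping a shorter one *)
  assert (skyline_spec [(0, 4, 10); (2, 6, 5)] = [(0, 10); (4, 5); (6, 0)])
```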

#### Implementing binary heaps

The following implementation is derived from the one in Daniel Bünzli’s React library (edited a little bit for readability). The Wikipedia article on binary heaps explains the standard technique well, so I won’t repeat it.

The only piece of trickiness is the use of `Obj.magic 0` for unused elements of the array, so we can grow it by doubling the size rather than adding a single element each time, and thereby amortize the cost of blitting the old array.
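The doubling trick can be seen in isolation with a growable `int` array, where a plain `0` serves as the dummy element so no `Obj.magic` is needed (this is a sketch for exposition; the type and function names are invented here). The point is that `n` pushes cost `O(n)` total, because each reallocation doubles the capacity:

```ocaml
type dyn = { mutable arr : int array; mutable len : int }

let make () = { arr = [||]; len = 0 }

(* Grow by doubling (capacities 1, 3, 7, 15, ... = 2^k - 1), so the
   cost of blitting the old array is amortized over the pushes. *)
let push d e =
  if d.len = Array.length d.arr then begin
    let arr' = Array.make (2 * d.len + 1) 0 in
    Array.blit d.arr 0 arr' 0 d.len;
    d.arr <- arr'
  end;
  d.arr.(d.len) <- e;
  d.len <- d.len + 1

let () =
  let d = make () in
  for i = 0 to 999 do push d i done;
  assert (d.len = 1000);
  assert (d.arr.(999) = 999);
  assert (Array.length d.arr = 1023)  (* last doubling: 511 -> 1023 *)
```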

``````
module Make (O : OrderedType) : S with type elt = O.t =
struct
  type elt = O.t
  type t = { mutable arr : elt array; mutable len : int }

  let make () = { arr = [||]; len = 0; }

  let compare h i1 i2 = O.compare h.arr.(i1) h.arr.(i2)

  let swap h i1 i2 =
    let t = h.arr.(i1) in
    h.arr.(i1) <- h.arr.(i2);
    h.arr.(i2) <- t

  let rec up h i =
    if i = 0 then ()
    else
      let p = (i - 1) / 2 in
      if compare h i p < 0 then begin
        swap h i p;
        up h p
      end

  let rec down h i =
    let l = 2 * i + 1 in
    let r = 2 * i + 2 in
    if l >= h.len then ()
    else
      let child =
        if r >= h.len then l
        else if compare h l r < 0 then l else r in
      if compare h i child > 0 then begin
        swap h i child;
        down h child
      end

  let add h e =
    if h.len = Array.length h.arr
    then begin
      let len = 2 * h.len + 1 in
      let arr' = Array.make len (Obj.magic 0) in
      Array.blit h.arr 0 arr' 0 h.len;
      h.arr <- arr'
    end;
    h.arr.(h.len) <- e;
    up h h.len;
    h.len <- h.len + 1

  let peek_min h =
    match h.len with
    | 0 -> None
    | _ -> Some h.arr.(0)

  let take_min h =
    match h.len with
    | 0 -> raise Not_found
    | 1 ->
        let m = h.arr.(0) in
        h.arr.(0) <- (Obj.magic 0);
        h.len <- 0;
        m
    | k ->
        let m = h.arr.(0) in
        let k = k - 1 in
        h.arr.(0) <- h.arr.(k);
        h.arr.(k) <- (Obj.magic 0);
        h.len <- k;
        down h 0;
        m

  let size h = h.len
end
``````

(Complete code is here.)

## Friday, September 10, 2010

### Reading Camlp4, part 11: syntax extensions

In this final (?) post in my series on Camlp4, I want at last to cover syntax extensions. A nontrivial syntax extension involves almost all the topics we have previously covered, so it seems fitting that we treat them last.

#### Extending grammars

In the post on parsing we covered Camlp4 grammars but stopped short of explaining how to extend them. Well, this is not completely true: we used the `EXTEND` form to extend an empty grammar, and we can also use it to extend non-empty grammars. We saw a small example of this when implementing quotations, where we extended the JSON grammar with a new `json_eoi` entry (which referred to an entry in the original grammar). Rules and levels may also be added to existing entries, and rules may be deleted.

Let’s look at a complete syntax extension, which demonstrates modifying Camlp4’s OCaml grammar. The purpose of the extension is to change the precedence of the method call operator `#` to make “method chaining” read better. For example, if the `foo` method returns an object, you can write

``````
  obj#foo "bar" #baz
``````

to call the `baz` method, rather than needing

``````
  (obj#foo "bar")#baz
``````

(I originally wrote this for use with the `jQuery` binding for `ocamljs`; method chaining is common with `jQuery`.)

Here is the extension:

``````
open Camlp4

module Id : Sig.Id =
struct
  let name = "pa_jquery"
  let version = "0.1"
end

module Make (Syntax : Sig.Camlp4Syntax) =
struct
  open Sig
  include Syntax

  DELETE_RULE Gram expr: SELF; "#"; label END;

  EXTEND Gram
    expr: BEFORE "apply"
    [ "#" LEFTA
      [ e = SELF; "#"; lab = label ->
          <:expr< $e$ # $lab$ >> ]
    ];
  END
end

module M = Register.OCamlSyntaxExtension(Id)(Make)
``````

To make sense of a syntax extension it’s helpful to refer to `Camlp4OCamlRevisedParser.ml` (which defines the revised syntax grammar) and `Camlp4OCamlParser.ml` (which defines the original syntax as an extension of the revised syntax). There we see that the `#` operator is parsed in the `expr` entry, in a level called “`.`” (which includes other dereferencing operators), and that this level appears below the `apply` level, which parses function application. Recall from the parsing post that operators in lower levels bind more tightly. So to get the effect we want, we need to move the `#` rule above the `apply` level in the grammar.

First we delete the rule from its original location: `DELETE_RULE` takes the grammar, the entry, and the symbols on the left-hand side of the rule, followed by `END`; you don’t have to say in what level it appears. Then we add the rule at a new location: we create a new level `#` containing the rule from the original grammar, and add it before the level named `apply`.

There are several ways to specify where a level is inserted: `BEFORE` level and `AFTER` level put it before or after some other level; `LEVEL` level adds rules to an existing level (you will be warned but not stopped from changing the label or associativity of the level); `FIRST` and `LAST` put the level before or after all other levels. If you don’t specify, rules are added to the topmost level in the entry. The resulting grammar works just as if you had given it all at once, making the insertions in the specified places. (However, it is not very clear from the code how ordering works when inserting rules into an existing level; it is perhaps best not to rely on the order of rules in a level anyway.)

Finally we register the extension. The `Make` argument to `OCamlSyntaxExtension` returns a `Sig.Camlp4Syntax` for some reason (in `Register.ml` it is just ignored) so we `include Syntax` to provide it.

(The complete code for this example is here.)

#### Transforming the AST

Let’s do a slightly more complicated example involving some transformation of the parsed AST. It often comes up that we want to `let`-bind the value of an expression to a name, trapping exceptions, then evaluate the body of the `let` outside the scope of the exception handler. This is a bit painful to write in stock OCaml; we can only straightforwardly express trapping exceptions in the whole `let` expression:

``````
  try let x = e1 in e2
  with e -> h
``````

A nice alternative is to use thunks to delay the evaluation of the body, doing it outside the scope of the `try`/`with`:

``````
  (try let x = e1 in fun () -> e2
   with e -> fun () -> h)()
``````

(We must thunkify the exception handler to make the types work out.) This is simple enough to do by hand, but let’s give it some syntactic sugar:

``````
  let try x = e1 in e2
  with e -> h
``````

which should expand to the thunkified version above. (The idea and syntax are taken from Martin Jambon’s micmatch extension.)
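The point of the thunkification can be checked directly in plain OCaml: with an ordinary `try let ... in ...`, the handler also traps exceptions raised by the body, whereas in the thunkified form the body runs after the handler’s scope has been exited. A small self-contained sketch (the names `naive` and `thunkified_escapes` are invented here):

```ocaml
exception E

(* Naive version: the handler traps E even when the body raises it. *)
let naive =
  try let _x = 1 in raise E
  with E -> "handler"

(* Thunkified version: the body is delayed, so E raised in the body
   escapes the handler. *)
let thunkified_escapes =
  try
    ignore ((try let _x = 1 in fun () -> raise E
             with E -> fun () -> "handler") ());
    false
  with E -> true

let () =
  assert (naive = "handler");
  assert thunkified_escapes
```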

Let’s look at the existing rules in `Camlp4OCamlRevisedParser.ml` for `let` and `try` to get an idea of how to parse the `let`/`try` form:

``````
  [ "let"; r = opt_rec; bi = binding; "in"; x = SELF ->
      <:expr< let $rec:r$ $bi$ in $x$ >>
  ...
  | "try"; e = sequence; "with"; a = match_case ->
      <:expr< try $mksequence' _loc e$ with [ $a$ ] >>
``````

For `let`, the `opt_rec` entry parses an optional `rec` keyword (we see there is a special antiquotation for interpolating `rec`). The `binding` entry parses a group of bindings separated by `and`. `SELF` is just `expr`. For `try`, `sequence` is a sequence of expressions separated by `;`, and `match_case` is a group of match cases separated by `|`. (These entries are both a little different in the original syntax, to account for the different semicolon rules and the `[]` delimiters around the match cases.) Recall that `Camlp4OCamlRevisedParser.ml` uses the revised syntax quotations, so we have `[]` around the match cases. The call to `mksequence'` just wraps a `do {}` around a sequence if necessary; more on this below.

The parsing rule we want is a combination of these. Here is the extension:

``````
  EXTEND Gram
    expr: LEVEL "top" [
      [ "let"; "try"; r = opt_rec; bi = binding; "in";
        e = sequence; "with"; a = match_case ->
          let a =
            List.map
              (function
                 | <:match_case< $p$ when $w$ -> $e$ >> ->
                     <:match_case<
                       $p$ when $w$ -> fun () -> $e$
                     >>
                 | mc -> mc)
              (Ast.list_of_match_case a []) in
          <:expr<
            (try let $rec:r$ $bi$ in fun () -> do { $e$ }
             with [ $list:a$ ])()
          >>
      ]
    ];
  END
``````

We put `rec` after `try` (following `micmatch`), which is a little weird, but if we put it before we would need to look ahead to disambiguate `let` from `let try`: once we parse `opt_rec` we are committed to one rule or the other. (Alternatively we could start the rule with `"let"; r = opt_rec; "try"`, which has no ambiguity with the ordinary `let` rule because the `"let"; opt_rec` prefix is factored out; the parser doesn’t choose between the rules until it tries to parse `try`.) After `in` we parse `sequence` rather than `SELF`; this seems like a good choice because there is a `with` to end the sequence.

Now, to transform the AST, we map over the match cases. The `match_case` entry returns a list of cases separated by `Ast.McOr`; we call `list_of_match_case` to get an ordinary list. For each case, we match the pattern, `when` clause, and expression on the right-hand side (these are packaged in an `Ast.McArr`, where the `when` clause field is `Ast.ExNil` if there is no `when` clause), and return it with the expression thunkified. Then we return the whole `let` inside `try`, with the body sequence thunkified.

We have to add a `do {}` around the body, creating an `Ast.ExSeq` node, because that’s what is expected by `Camlp4Ast2OCamlAst.ml`—recall from the filters post that the Camlp4 AST is translated to an OCaml AST and marshalled to the compiler. If we forget this (and “we” often forget these idiosyncrasies) then we get the error “`expr; expr: not allowed here, use do {...} or [|...|] to surround them`”, which is pretty helpful as these errors go.

(The complete code for this example is here.)

#### Extending pattern matching

As a final example, let’s extend OCaml’s pattern syntax. In the quotations post we noted that JSON quotations in a pattern are not very useful, because we would usually like a pattern to match even if the fields of an object come in a different order or there are extra fields. To keep the code short let’s abstract the problem a little and consider matching association lists: if we write a match case

``````
  | alist [ "foo", x; "bar", y ] -> e
``````

we would like it to match association lists with `"foo"` and `"bar"` keys, in any order, with any extra pairs in the list. Our translation looks like this:

``````
  | __pa_alist_patt_1 when
(match ((try Some (List.assoc "foo" __pa_alist_patt_1)
with | Not_found -> None),
(try Some (List.assoc "bar" __pa_alist_patt_1)
with | Not_found -> None))
with
| (Some x, Some y) -> true
| _ -> false)
->
(match ((try Some (List.assoc "foo" __pa_alist_patt_1)
with | Not_found -> None),
(try Some (List.assoc "bar" __pa_alist_patt_1)
with | Not_found -> None))
with
| (Some x, Some y) -> e
| _ -> assert false)
``````

This might seem overcomplicated, and it is true that we could simplify it for this case. But the built-in pattern syntax is complicated, and it is tricky handling all the cases to make things work smoothly; the strategy that produces the code above will handle some (but not all) of the complications. (We’ll consider some improvements below.)

The basic idea is that when we come to an `alist` we replace it with a new fresh name, then do further matching in a `when` clause, so if it fails we can continue to the next case by returning `false`. In the `when` clause we look up the keys, putting them in `option`s, then match on the `option`s; we handle nested patterns (to the right of a key) by embedding them in the `when` clause match. The `when` clause match also binds variables appearing in the original pattern, so they are available to the `when` clause of the original case (if there is one). Finally, we do the whole thing over again in the match case body to provide bindings to the original body.
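Stripped of the fresh names and the duplication between `when` clause and body, the generated code boils down to a plain-OCaml idiom: look each key up into an `option`, then match on the tuple of options. The helpers `lookup` and `classify` are names invented here for illustration, not part of the extension:

```ocaml
(* Look a key up in an association list, as an option. *)
let lookup k l = try Some (List.assoc k l) with Not_found -> None

(* Succeeds when both "foo" and "bar" are present, in any order,
   ignoring extra pairs -- the behavior the alist pattern provides. *)
let classify l =
  match lookup "foo" l, lookup "bar" l with
  | Some x, Some y -> Some (x, y)
  | _ -> None

let () =
  assert (classify [("bar", 2); ("foo", 1); ("baz", 3)] = Some (1, 2));
  assert (classify [("foo", 1)] = None)
```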

In order to implement this we’ll use both a syntax extension and a filter. The reason is that we’d like to extend the `patt` entry, but to do the AST transformation we sketched above we need to transform `match_case`s. We could replace the `match_case` part of the parser as well but that seems needlessly hairy, and generally when writing a syntax extension we’d like to touch as little of the parser as possible so it interoperates well with other extensions.

First, here is the syntax extension:

``````
  EXTEND Gram
    patt: LEVEL "simple"
    [[
      "alist"; "[";
      l =
        LIST0
          [ e = expr LEVEL "simple"; ",";
            p = patt LEVEL "simple" ->
              Ast.PaOlbi (_loc, "", p, e) ]
          SEP ";";
      "]" ->
        <:patt< $uid:"alist"$ $Ast.paSem_of_list l$ >>
    ]];
  END
``````

We extend the `simple` level of the `patt` entry, which parses primitive patterns. Inside `alist []` we parse a list of `expr` / `patt` pairs; we parse `expr` at the `simple` level or else it would parse the whole pair as an `expr`, and the same for `patt` just in case. Then we return the pair of expression and pattern in an `Ast.PaOlbi` (ordinarily used for optional argument defaults in function definitions). Why? Well, we need to return something of type `patt`, but we need somehow to get the `expr` to our filter, and this is the only `patt` constructor that holds an `expr`. (As an alternative we could parse a `patt` instead of an `expr`, but then we’d need to translate it to an `expr` at the point we use it.) Finally we return the list wrapped in a data constructor so we can recognize it easily in the filter; because it is lower-case, we can be sure that “alist” is not the identifier of a real data constructor.

Now let’s look at the filter. First, some helper functions:

``````
  let fresh =
    let id = ref 0 in
    fun () ->
      incr id;
      "__pa_alist_patt_" ^ string_of_int !id

  let expr_tup_of_list _loc = function
    | [] -> <:expr< () >>
    | [ v ] -> v
    | vs -> <:expr< $tup:Ast.exCom_of_list vs$ >>

  let patt_tup_of_list _loc = function
    | [] -> <:patt< () >>
    | [ p ] -> p
    | ps -> <:patt< $tup:Ast.paCom_of_list ps$ >>
``````

We have a function to generate fresh names, a function to turn a list of expressions into a tuple, and a similar function for patterns. The reason we need these latter two is that a tuple with 0 or 1 elements is not allowed by `Camlp4Ast2OCamlAst.ml` (the empty “tuple” is actually a special identifier in the Camlp4 AST). Next, the main rewrite function:

``````
  let rewrite _loc p w e =
    let k = ref (fun s f -> s) in
``````

The function takes the parts of an `Ast.McArr` (that is, a match case). We’re going to map over the pattern `p`, building up a function `k` as we encounter nested `alist` forms. We want to generate the same matching code in the `when` clause and the body, so `k` is parameterized with an expression in case of success (the original `when` clause or the body) and in case of failure (`false` or `assert false`). We will build `k` from the inside out, starting with a function that just returns the success expression.

``````
    let map =
      object
        inherit Ast.map as super

        method patt p =
          match super#patt p with
          | <:patt< $uid:"alist"$ $l$ >> ->
              let id = fresh () in
              let l =
                List.map
                  (function
                     | Ast.PaOlbi (_, _, p, e) -> p, e
                     | _ -> assert false)
                  (Ast.list_of_patt l []) in
              let vs =
                List.map
                  (fun (_, e) ->
                     <:expr<
                       try Some (List.assoc $e$ $lid:id$)
                       with Not_found -> None
                     >>)
                  l in
              let ps =
                List.map
                  (fun (p, _) -> <:patt< Some $p$ >>)
                  l in
              let k' = !k in
              k :=
                (fun s f ->
                   <:expr<
                     match $expr_tup_of_list _loc vs$ with
                     | $patt_tup_of_list _loc ps$ -> $k' s f$
                     | _ -> $f$
                   >>);
              <:patt< $lid:id$ >>
          | p -> p
      end in
``````

The `Ast.map` object provides methods to map each syntactic class of the AST, along with default implementations which return the node unchanged. We extend it to walk over the pattern, leaving it unchanged except when we come to our special `alist` constructor. In that case we generate a fresh name, which we return as the value of the function. Then we extract the `expr` / `patt` pairs and map them to `try Some (List.assoc ...` expressions and `Some` patterns. Finally we extend `k` by matching all the expressions against all the patterns; if the match succeeds we call the previous `k`, otherwise the failure expression. Since we build `k` from the inside out, we transform subpatterns first (by matching over `super#patt p`).

``````
    let p = map#patt p in
    let w = match w with <:expr< >> -> <:expr< true >> | _ -> w in
    let w = !k w <:expr< false >> in
    let e = !k e <:expr< assert false >> in
    <:match_case< $p$ when $w$ -> $e$ >>
``````

We call `map#patt` on `p` to replace special `alist` constructor nodes with fresh names and build up `k`, then call the resulting `k` on the `when` clause (if there is no `when` clause we replace it with `true`) and body, and finally return the result as a `match_case`, completing the rewrite function.

``````
  let filter =
    let map =
      object
        inherit Ast.map as super

        method match_case mc =
          match super#match_case mc with
          | <:match_case@_loc< $p$ when $w$ -> $e$ >> ->
              rewrite _loc p w e
          | e -> e
      end in
    map#str_item

  let _ = AstFilters.register_str_item_filter filter
``````

We extend `Ast.map` again to call the rewrite function on each `match_case`, then register the resulting filter.

The code above handles `when` clauses and nested `alist` patterns, and interacts properly with ordinary OCaml patterns. However, it completely falls down on nested pattern alternatives. If we write

``````
match x with
| alist [ "foo", f ]
| alist [ "fooz", f ] -> e
``````

we get this mess:

``````
  | __pa_alist_patt_1 | __pa_alist_patt_2 when
(match try Some (List.assoc "fooz" __pa_alist_patt_2)
with | Not_found -> None
with
| Some f ->
(match try Some (List.assoc "foo" __pa_alist_patt_1)
with | Not_found -> None
with
| Some f -> true
| _ -> false)
| _ -> false)
->
... (* the same mess for the body *)
``````

The first problem is that both arms of an alternative must bind the same names, but we have replaced them with two different fresh names. The second problem is that we have blindly treated alternative `alist` patterns as being nested one inside the other. A solution to both these problems is to split nested alternatives into separate cases, at the cost of duplicating the `when` clause and body in each.

Jeremy Yallop’s patterns framework (see here for an update that works with OCaml 3.12.0) allows multiple pattern extensions to coexist, and provides some common facilities to make them easier to write. In particular it splits nested alternatives into separate cases. Another deficiency in the code above is that it duplicates the match expression (built in `k`) in the `when` clause and body. This can be avoided by computing the body within the `when` clause, setting a reference, and dereferencing it in the body. However, the reference must be bound outside the `match_case` to be visible both in the `when` clause and the body, so this approach must transform each AST node that contains `match_case`s in order to bind the refs in the right place. The `patterns` framework handles this as well.

(The complete code for this example is here. A version using the `patterns` framework is here.)

## Thursday, August 26, 2010

### ocamljs 0.3

I am happy to announce version 0.3 of ocamljs. Ocamljs is a system for compiling OCaml to Javascript. It includes a Javascript back-end for the OCaml compiler, as well as several support libraries, such as bindings to the browser DOM. Ocamljs also works with orpc for RPC over HTTP, and froc for functional reactive browser programming.

Changes since version 0.2 include:

- support for OCaml 3.11.x and 3.12.0
- jQuery binding (contributed by Dave Benjamin)
- full support for OCaml objects (interoperable with Javascript objects)
- Lwt 2.x support
- ocamllex and ocamlyacc support
- better interoperability with Javascript
- many small fixes and improvements

Development of ocamljs has moved from Google Code to Github.

#### Comparison to js_of_ocaml

Since I last did an `ocamljs` release, a new OCaml-to-Javascript system has arrived, `js_of_ocaml`. I want to say a little about how the two systems compare:

`Ocamljs` is a back-end to the existing OCaml compiler; it translates the “lambda” intermediate language to Javascript. (This is also where the bytecode and native code back-ends connect to the common front-end.) `Js_of_ocaml` post-processes ordinary OCaml bytecode (compiled and linked with the ordinary OCaml bytecode compiler) into Javascript. With `ocamljs` you need a special installation of the compiler (and special support for `ocamlbuild` and `ocamlfind`), you need to recompile libraries, and you need the OCaml source to build it. With `js_of_ocaml` you don’t need any of this.

Since `ocamljs` recompiles libraries, it’s possible to special-case code for the Javascript build to take advantage of Javascript facilities. For example, `ocamljs` implements the `Buffer` module on top of Javascript arrays instead of strings, for better performance. Similarly, it implements `CamlinternalOO` to use Javascript method dispatch directly instead of layering OCaml method dispatch on top. `Js_of_ocaml` can’t do this (or at least it would be necessary to recognize the compiled bytecode and replace it with the special case).

Because `js_of_ocaml` works from bytecode, it can’t always know the type of values (at the bytecode level, `int`s, `bool`s, and `char`s all have the same representation, for example). This makes interoperating with native Javascript more difficult: you usually need conversion functions between the OCaml and Javascript representation of values when you call a Javascript function from OCaml. `Ocamljs` has more information to work with, and can represent OCaml bools as Javascript bools, for example, so you can usually call a Javascript function from OCaml without conversions.

`Ocamljs` has a mixed representation of strings: literal strings and the result of `^`, `Buffer.contents`, and `Printf.sprintf` are all immutable Javascript strings; strings created with `String.create` are mutable strings implemented by Javascript arrays (with a `toString` method which returns the represented string). This is good for interoperability—you can usually pass a string directly to Javascript—but it doesn’t match regular OCaml’s semantics, and it can cause runtime failures (e.g. if you try to mutate an immutable string). `Js_of_ocaml` implements only mutable strings, so you need conversions when calling Javascript, but the semantics match regular OCaml.

With `ocamljs`, Javascript objects can be called from OCaml using the ordinary OCaml method-call syntax, and objects written in OCaml can be called using the ordinary Javascript syntax. With `js_of_ocaml`, a special syntax is needed to call Javascript objects, and OCaml objects can’t easily be called from Javascript. However, there is an advantage to having a special call syntax: it is not possible to partially apply calls to native Javascript methods, but with `ocamljs` this is not caught by the compiler, so there can be a runtime failure.

`Ocamljs` supports inline Javascript, while `js_of_ocaml` does not. I think it might be possible for `js_of_ocaml` to do so using the same approach that `ocamljs` takes: use Camlp4 quotations to embed a syntax tree, then convert the syntax tree from its OCaml representation (as lambda code or bytecode) into Javascript. However, you would still need conversion functions between OCaml and Javascript values.

I haven’t compared the performance of the two systems. It seems like there must be a speed penalty to translating from bytecode compared to translating from lambda code. On the other hand, while `ocamljs` is very naive in its translation, `js_of_ocaml` makes several optimization passes. With many programs it doesn’t matter, since most of the time is spent in browser code. (For example, the `planet` example seems to run at the same speed in `ocamljs` and `js_of_ocaml`.) It would be interesting to compare them on something computationally intensive like Andrej Bauer’s random-art.org.

`Js_of_ocaml` is more complete and careful in its implementation of OCaml (e.g. it supports `int64`s), and it generates much more compact code than `ocamljs`. I hope to close the gap in these areas, possibly by borrowing some code and good ideas from `js_of_ocaml`.

## Friday, August 20, 2010

### Mixing monadic and direct-style code with delimited continuations

The Lwt library is a really nice way to write concurrent programs. A big downside, however, is that you can’t use direct-style libraries with it. Suppose we’re writing an XMPP server, and we want to parse XML as it arrives over a network connection, using Daniel Bünzli’s nice `xmlm` library. `Xmlm` can read from a `string`, or from a `Pervasives.in_channel`, or you can give it a function of type `(unit -> int)` to return the next character of input. But there is no way to have it read from an Lwt thread; that is, we can’t give it a function of type `(unit -> int Lwt.t)`, since it doesn’t know what to do with an `Lwt.t`. To keep track of the parser state at the point the input blocks, the whole library would need to be rewritten in Lwt style (i.e. monadic style).

Now, Lwt does provide the `Lwt_preemptive` module, which gives you a way to spin off a preemptive thread (implemented as an ordinary OCaml thread) and wait for its result in the usual Lwt way with `bind`. This is useful, but has two drawbacks: preemptive threads are preemptive, so you’re back to traditional locking if you want to operate on shared data; and preemptive threads are threads, so they are much heavier than Lwt threads, and (continuing the XMPP hypothetical) it may not be feasible to use one per open connection.

Fibers

What we would really like is to be able to spin off a cooperative, direct-style thread. The thread needs a way to block on Lwt threads, but when it blocks we need to be able to schedule another Lwt thread. As a cooperative thread it of course has exclusive access to the process state while it is running. A cooperative, direct-style thread is sometimes called a coroutine (although to me that word connotes a particular style of inter-thread communication as well, where values are yielded between coroutines), or a fiber.

Here’s an API for mixing Lwt threads with fibers:

``````  val start : (unit -> 'a) -> 'a Lwt.t
  val await : 'a Lwt.t -> 'a
``````

The `start` function spins off a fiber, returning an Lwt thread which is woken with the result of the fiber once it completes. The `await` function (which may be called only from within a fiber) blocks on the result of an Lwt thread, allowing another Lwt thread to be scheduled while it is waiting.

With this API we could implement our XMPP server by calling `xmlm` from within a fiber, and passing it a function that `await`s the next character available on the network connection. But how do we implement it?
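Concretely, the hypothetical XMPP use might look like the sketch below. This is not from a real codebase: `Fiber` stands for a module implementing the API above, and `read_byte` is an assumed `unit -> int Lwt.t` that reads the next byte from the connection.

```ocaml
(* Sketch only: [Fiber.start]/[Fiber.await] implement the API above;
   [read_byte] is a hypothetical function returning the next byte of
   network input as an Lwt thread. *)
let parse_document (read_byte : unit -> int Lwt.t) : Xmlm.signal list Lwt.t =
  Fiber.start begin fun () ->
    (* xmlm pulls input with a direct-style [unit -> int] function;
       inside the fiber we can block on Lwt with [await]. *)
    let input = Xmlm.make_input (`Fun (fun () -> Fiber.await (read_byte ()))) in
    let rec loop acc =
      if Xmlm.eoi input then List.rev acc
      else loop (Xmlm.input input :: acc) in
    loop []
  end
```

The point is that `xmlm` itself is untouched: the blocking is hidden inside the `` `Fun `` source it already supports.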

Delimited continuations

Oleg Kiselyov’s recent announcement of a native-code version of his `Delimcc` library for delimited continuations in OCaml reminded me of two things:

1. I should find out what delimited continuations are.
2. They sound useful for implementing fibers.

The paper describing the library, Delimited Control in OCaml, Abstractly and Concretely, has a pretty good overview of delimited continuations, and section 2 of A Monadic Framework for Delimited Continuations is helpful too.

The core API is small:

``````  type 'a prompt
  type ('a,'b) subcont

  val new_prompt   : unit -> 'a prompt

  val push_prompt  : 'a prompt -> (unit -> 'a) -> 'a
  val take_subcont :
    'b prompt -> (('a,'b) subcont -> unit -> 'b) -> 'a
  val push_subcont : ('a,'b) subcont -> (unit -> 'a) -> 'b
``````

I find it easiest to think about these functions as operations on the stack. A prompt is an identifier used to mark a point on the stack (the stack can be marked more than once with the same prompt). The function `new_prompt` makes a new prompt which is not equal to any other prompt.

The call `push_prompt p f` marks the stack with `p` then runs `f`, so the stack, growing to the right, looks like

```
ABCDpEFGH
```

where `ABCD` are stack frames in the continuation of the call to `push_prompt`, and `EFGH` are frames created while running `f`. If `f` returns normally (that is, without calling `take_subcont`) then its return value is returned by `push_prompt`, and we are back to the original stack `ABCD`.

If `take_subcont p g` is called while running `f`, the stack fragment `EFGH` is packaged up as an `('a,'b) subcont` and passed to `g`. You can think of an `('a,'b) subcont` as a function of type `'a -> 'b`, where `'a` is the return type of the call to `take_subcont` and `'b` is the return type of the call to `push_prompt`. `Take_subcont` removes the fragment `pEFGH` from the stack, and there are some new frames `IJKL` from running `g`, so we have

```
ABCDIJKL
```

Now `g` can make use of the passed-in `subcont` using `push_subcont`. (Thinking of a `subcont` as a function, `push_subcont` is just a weird function application operator, which takes the argument as a thunk). Then the stack becomes

```
ABCDIJKLEFGH
```

Of course `g` can call the `subcont` as many times as it likes.
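For intuition, suppose the captured fragment `EFGH` is just the frame `3 + _`. Reading the `subcont` as a function, `g` receives something that behaves like the plain OCaml below (a pure illustration of the semantics, not real Delimcc code):

```ocaml
(* If [push_prompt p (fun () -> 3 + take_subcont p g)] captures the
   frame [3 + _], the subcont passed to [g] behaves like: *)
let subcont v = 3 + v

(* [g] may apply it any number of times, e.g. resuming twice and
   summing the results: *)
let result = subcont 10 + subcont 20
(* 13 + 23 = 36 *)
```

Each application replays the captured frames with a different value plugged into the hole.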

A common pattern is to re-mark the stack with `push_prompt` before calling `push_subcont` (so `take_subcont` may be called again). There is an optimized version of this combination called `push_delim_subcont`, which produces the stack

```
ABCDIJKLpEFGH
```

The idea that a `subcont` is a kind of function is realized by `shift0`, which is like `take_subcont` except that instead of passing a `subcont` to `g` it passes an ordinary function. The passed function just wraps a call to `push_delim_subcont`. (It is `push_delim_subcont` rather than `push_subcont` for historical reasons I think—see the Monadic Framework paper for a comparison of various delimited continuation primitives.)

Implementing fibers

To implement fibers, we want `start f` to mark the stack, then run `f`; and `await t` to unwind the stack back to the mark, wait for `t` to complete, then restore the stack. Here is `start`:

``````  let active_prompt = ref None

  let start f =
    let t, u = Lwt.wait () in
    let p = Delimcc.new_prompt () in
    active_prompt := Some p;

    Delimcc.push_prompt p begin fun () ->
      let r =
        try Lwt.Return (f ())
        with e -> Lwt.Fail e in
      active_prompt := None;
      match r with
        | Lwt.Return v -> Lwt.wakeup u v
        | Lwt.Fail e -> Lwt.wakeup_exn u e
        | Lwt.Sleep -> assert false
    end;
    t
``````

We make a sleeping Lwt thread, and store a new prompt in a global (this is OK because we won’t yield control to another Lwt thread before using it; of course this is not safe with OCaml threads). Then we mark the stack with `push_prompt` and run the fiber. (The `let r = ... match r with ...` is to avoid calling `Lwt.wakeup{,_exn}` in the scope of the `try`; we use `Lwt.state` as a handy type to store either a result or an exception.) If the fiber completes without calling `await` then all we do is wake up the Lwt thread with the returned value or exception.

Here is `await`:

``````  let await t =
    let p =
      match !active_prompt with
        | None -> failwith "await called outside start"
        | Some p -> p in
    active_prompt := None;

    match Lwt.poll t with
      | Some v -> v
      | None ->
          Delimcc.shift0 p begin fun k ->
            let ready _ =
              active_prompt := Some p;
              k ();
              Lwt.return () in
            ignore (Lwt.try_bind (fun () -> t) ready ready)
          end;
          match Lwt.poll t with
            | Some v -> v
            | None -> assert false
``````

We first check to be sure that we are in the scope of `start`, and that `t` isn’t already completed (in which case we just return its result). If we actually need to wait for `t`, we call `shift0`, which captures the stack fragment back to the `push_prompt` call in `start` (this continuation includes the subsequent `match Lwt.poll t` and everything after the call to `await`); the function passed to `shift0` then calls `Lwt.try_bind` so we can restore the stack fragment when `t` completes (whether by success or failure). When `t` completes, the `ready` function restores the global `active_prompt`, in case the fiber calls `await` again, then restores the stack by calling `k` (recall that this also re-marks the stack with `p`, which is needed if the fiber calls `await` again).

It’s pretty difficult to follow what’s going on here, so let’s try it with stacks. After calling `start` we have

```
ABCDpEFGH
```

where `ABCD` is the continuation of `push_prompt` in `start` (just the return of `t`) and `EFGH` are frames created by the thunk passed to `start`. Now, a call to `await` (on an uncompleted thread) calls `shift0`, which packs up `EFGH` as `k` and unwinds the stack to `p`. The function passed to `shift0` stores `k` in `ready` but doesn’t call it, and control returns to `start` (since the stack has been unwound).

The program continues normally until `t` completes. Now control is in `Lwt.run_waiters` running threads that were waiting on `t`; one of them is our `ready` function. When it is called, the stack is re-marked and `EFGH` is restored, so we have

```
QRSTpEFGH
```

where `QRST` is wherever we happen to be in the main program, ending in `Lwt.run_waiters`. Now, `EFGH` ends with the second call to `match Lwt.poll` in `await`, which returns the value of `t` and continues the thunk passed to `start`. The stack is now marked with `p` inside `Lwt.run_waiters`, so when `await` is called again control returns there.

We have seen that we can use fibers to write Lwt threads in direct style. Should we abandon Lwt’s monadic style entirely, and use Lwt only for its event handling?

First, how does each style perform? Every time a fiber blocks and resumes, we have to copy, unwind, and restore its entire stack. With Lwt threads, the “stack” is a bunch of linked closures in the heap, so we don’t need to do anything to block or resume. On the other hand, building and garbage-collecting the closures is more expensive than pushing and popping the stack. We can imagine that which style performs better depends on the thread: if it blocks infrequently enough, the amortized cost of copying and restoring the stack might be lower than the cost of building and garbage-collecting the closures. (We can also imagine that a different implementation of delimited continuations might change this tradeoff.)

Second, how does the code look? The paper Cooperative Task Management without Manual Stack Management considers this question in the context of the “events vs. threads” debate. Many of its points lose their force when translated to OCaml and Lwt—closures, the `>>=` operator, and Lwt’s syntax extension go a long way toward making Lwt code look like direct style—but some are still germane. In favor of fibers is that existing direct-style code need not be rewritten to work with Lwt (what motivated us in the first place). In favor of monadic style is that the type of a function reflects the possibility that it might block, yield control to another thread, and disturb state invariants.
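To make the stylistic comparison concrete, here is the same echo loop written both ways. This is a sketch: it assumes the `Fiber` module above and `Lwt_io`, and `echo_monadic`/`echo_direct` are invented names.

```ocaml
(* Monadic style: the [unit Lwt.t] result advertises that this blocks. *)
let rec echo_monadic ic oc =
  Lwt.bind (Lwt_io.read_line ic) (fun line ->
    Lwt.bind (Lwt_io.write_line oc line) (fun () ->
      echo_monadic ic oc))

(* Direct style with fibers: ordinary control flow, but nothing in the
   types warns that each [await] may yield to another thread. *)
let echo_direct ic oc =
  Fiber.start (fun () ->
    while true do
      let line = Fiber.await (Lwt_io.read_line ic) in
      Fiber.await (Lwt_io.write_line oc line)
    done)
```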

Direct-style FRP

We could apply this idea, of replacing monadic style with direct style using delimited continuations, to other monads—in particular to the `froc` library for functional reactive programming. (The Scala.React FRP library also uses delimited continuations to implement direct style; see Deprecating the Observer Pattern for details.)

Here’s the API:

``````  val direct : (unit -> 'a) -> 'a Froc.behavior
val read : 'a Froc.behavior -> 'a
``````

Not surprisingly, it’s just the same as for Lwt, but with a different monad and different names (I don’t know if `direct` is quite right but it is better than `start`). There is already a function `Froc.sample` with the same type as `read`, but it has a different meaning: `sample` takes a snapshot of a behavior but creates no dependency on it.
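As a usage sketch (with `b` and `c` standing for some existing `int Froc.behavior` values):

```ocaml
(* With [direct]/[read], a derived behavior is written in direct style;
   [sum] updates whenever [b] or [c] changes, because [read] (unlike
   [Froc.sample]) creates a dependency on the behaviors it reads. *)
let sum : int Froc.behavior = direct (fun () -> read b + read c)
```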

The implementation is very similar as well:

``````  let active_prompt = ref None

  let direct f =
    let t, u = Froc_ddg.make_changeable () in
    let p = Delimcc.new_prompt () in
    active_prompt := Some p;

    Delimcc.push_prompt p begin fun () ->
      let r =
        try Froc_ddg.Value (f ())
        with e -> Froc_ddg.Fail e in
      active_prompt := None;
      Froc_ddg.write_result u r
    end;
    (Obj.magic t : _ Froc.behavior)
``````

This is essentially the same code as `start`, modulo the change of monad. However, some of the functions we need aren’t exported from `Froc`, so we need to use the underlying `Froc_ddg` module and magic the result at the end. `Froc_ddg.make_changeable` is the equivalent of `Lwt.wait`: it returns an “uninitialized” monadic value along with a writer for that value. We use `Froc_ddg.result` instead of `Lwt.state` to store a value or exception, and `Froc_ddg.write_result` instead of the pattern match and `Lwt.wakeup{,_exn}`.

``````  let read t =
    let p =
      match !active_prompt with
        | None -> failwith "read called outside direct"
        | Some p -> p in
    active_prompt := None;

    Delimcc.shift0 p begin fun k ->
      Froc.notify_result_b t begin fun _ ->
        active_prompt := Some p;
        k ()
      end
    end;
    Froc.sample t
``````

And this is essentially the same code as `await`. A `Froc.behavior` always has a value, so we don’t poll it as we did with `Lwt.t`, but go straight to `shift0`. We have `Froc.try_bind` but it’s a little more compact to use `notify_result_b`, which passes a `result`.

The similarity between these implementations suggests that we could use the same code to get a direct style version of any monad; we only need a way to create an uninitialized monadic value, then set it. The call to `Lwt.poll` in `await` is an optimization which we would have to forgo. (In both these examples we have a monad with failure, and `try_bind`, but we could do without it.)
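The recipe could be captured in a functor. The sketch below is purely illustrative: the `DEFERRABLE` signature is invented here to name the two operations we relied on (create an uninitialized monadic value, set it later), plus a callback hook and a way to read a completed value; error handling is omitted.

```ocaml
(* Hypothetical signature: a monad whose values can be deferred. *)
module type DEFERRABLE = sig
  type 'a t
  val defer : unit -> 'a t * ('a -> unit)       (* uninitialized value + setter *)
  val on_result : 'a t -> ('a -> unit) -> unit  (* call back once the value is set *)
  val get : 'a t -> 'a                          (* read a value that has been set *)
end

module Direct (M : DEFERRABLE) = struct
  let active_prompt = ref None

  (* reify: run [f] in direct style, returning its result in the monad. *)
  let reify f =
    let t, set = M.defer () in
    let p = Delimcc.new_prompt () in
    active_prompt := Some p;
    Delimcc.push_prompt p (fun () ->
      let v = f () in
      active_prompt := None;
      set v);
    t

  (* reflect: block the current reified computation on [t]. *)
  let reflect t =
    match !active_prompt with
    | None -> failwith "reflect called outside reify"
    | Some p ->
        active_prompt := None;
        Delimcc.shift0 p (fun k ->
          M.on_result t (fun _ -> active_prompt := Some p; k ()));
        M.get t
end
```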

A little googling turns up Andrzej Filinski’s paper Representing Monads, which reaches the same conclusion, with a lot more rigor. In that work `start`/`direct` are called `reify`, and `await`/`read` are called `reflect`. `Reflect` is close to the implementations above, but in `reify` the paper marks the stack inside a function passed to `bind` rather than creating an uninitialized monadic value and later setting it.

This makes sense—inside `bind` an uninitialized monadic value is created, then set from the result of the function passed to `bind`. So we are partially duplicating `bind` in the code above. If we mark the stack in the right place we should be able to use `bind` directly. It is hard to see how to make the details work out, however, since `Lwt.bind` and `Froc.bind` each have some cases where uninitialized values are not created.

(You can find the complete code for Lwt fibers here and direct-style `froc` here.)