Wednesday, November 24, 2010

Three uses for a binary heap

Lately I have been interviewing for jobs, so doing a lot of whiteboard programming, and binary heaps keep arising in the solutions to these interview problems. There is nothing new or remarkable about these applications (binary heaps and their uses are covered in any undergraduate algorithms class), but I thought I would write them down because they are cute, and in the hope that they might be useful to someone else who (like me) gets by most days as a working programmer with no algorithm fancier than quicksort or binary search.

Binary heaps

Here’s a signature for a binary heap module Heap:

module type OrderedType = sig
  type t
  val compare : t -> t -> int
end

module type S = sig
  type elt
  type t
  val make : unit -> t
  val add : t -> elt -> unit
  val peek_min : t -> elt option
  val take_min : t -> elt
  val size : t -> int
end

module Make (O : OrderedType) : S with type elt = O.t

We start with a signature for ordered types (following the Set and Map modules in the standard library), so we can provide a type-specific comparison function.

From an ordered type we can make a heap which supports adding elements, peeking the smallest element (None if there are no elements) without removing it, removing and returning the smallest element (raising Not_found if the heap is empty), and returning the number of elements.

We’ll work out the asymptotic running times of the algorithms below, so it will be useful to know that the worst-case running time of the add and take_min functions is O(log n) where n is the number of elements in the heap.

Finding the k smallest elements in a list

Here’s a simple one. To find the smallest element in a list, we could sort the list then take the first element of the sorted list, at a cost of O(n log n). Or we could just take a pass over the list keeping a running minimum, at a cost of O(n).
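
The running-minimum pass is just a fold. A minimal sketch (list_min is a name introduced here, not part of the heap code below):

```ocaml
(* Single-pass minimum of a non-empty list, O(n):
   fold min over the tail, starting from the head. *)
let list_min = function
  | [] -> invalid_arg "list_min: empty list"
  | x :: xs -> List.fold_left min x xs
```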

What if we want the k smallest elements? Again, we could sort the list, but if k < n we can do better by generalizing the single-pass solution. The idea is to keep the k smallest elements we’ve seen so far in a binary heap. For each element in the list we add it to the heap, then (if there were already k elements in the heap) remove the largest element in the heap, leaving the k smallest.

The running time is O(n log k) since we do an add and a take_min in a heap of size k for each of n elements in the list. Here’s the code:

let kmin (type s) k l = 
  let module OT = struct 
    type t = s 
    let compare e1 e2 = compare e2 e1 
  end in 
  let module H = Heap.Make(OT) in 
  let h = H.make () in 
  List.iter
    (fun e ->
       H.add h e;
       if H.size h > k
       then ignore (H.take_min h))
    l;
  let rec loop mins = 
    match H.peek_min h with 
      | None -> mins 
      | _ -> loop (H.take_min h :: mins) in 
  loop [] 

Here we make good use of OCaml 3.12’s new feature for explicitly naming type variables in a polymorphic function to make a structure matching OrderedType. The heap has the same element type as the list, but we reverse the comparison since we want to remove the largest rather than smallest element from the heap in the loop. At the end of kmin we drain the heap to build a list of the k smallest elements.
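
A sorting-based kmin makes a handy test oracle for the heap version; a sketch (kmin_slow is a hypothetical name, not from the code above):

```ocaml
(* Reference implementation: sort the whole list, O(n log n),
   then take the first k elements. *)
let kmin_slow k l =
  let rec take k = function
    | x :: xs when k > 0 -> x :: take (k - 1) xs
    | _ -> []
  in
  take k (List.sort compare l)
```

Checking the heap-based kmin against this on random inputs is a quick way to catch mistakes in the reversed comparison or the drain loop.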

Merging k lists

Suppose we want to merge k lists. We could merge them pairwise until there is only one list, but that would take k - 1 passes, for a worst-case running time of O(n * (k - 1)). Instead we can merge them all in one pass, using a binary heap so we can find the next smallest element of k lists in O(log k) time, for a running time of O(n log k). Here’s the code:

let merge (type s) ls = 
  let module OT = struct 
    type t = s list 
    let compare e1 e2 = 
      compare (List.hd e1) (List.hd e2) 
  end in 
  let module H = Heap.Make(OT) in 
  let h = H.make () in 
  let add = function 
    | [] -> () 
    | l -> H.add h l in 
  List.iter add ls; 
  let rec loop () = 
    match H.peek_min h with 
      | None -> [] 
      | _ -> 
          match H.take_min h with 
            | [] -> assert false 
            | m :: t -> 
                add t; 
                m :: loop () in 
  loop () 

We store the lists in the heap, and compare them by comparing their head element (we’re careful not to put an empty list in the heap). When we take the smallest list from the heap, its head becomes the next element in the output list, and we return its tail (if it is not empty) to the heap.
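
For comparison, the pairwise baseline mentioned at the start of this section rests on the usual two-way merge of sorted lists; a sketch (merge2 is a hypothetical helper, not part of the code above):

```ocaml
(* One pass over two sorted lists; k - 1 rounds of this
   give the O(n * (k - 1)) bound for the pairwise approach. *)
let rec merge2 l1 l2 =
  match l1, l2 with
  | [], l | l, [] -> l
  | x :: xs, y :: ys ->
      if compare x y <= 0 then x :: merge2 xs l2
      else y :: merge2 l1 ys
```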

Computing a skyline

The next problem was told to me in terms of computing the skyline of a set of buildings. A building has a height and a starting and ending x-coordinate; buildings may overlap. The skyline of a set of buildings is a list of (x, y) pairs (in ascending x order), describing a sequence of horizontal line segments (each starting at (x, y) and ending at the subsequent x), such that at any x there is no space between the line segment and the tallest building. (Here’s another description with diagrams.)

I googled a bit to see what this is useful for, and didn’t find much. One application is to extract a monophonic line from polyphonic music, where x is time and height is some metric on notes, like pitch or volume. It might be useful for searching data which is only intermittently applicable—say, to compute a schedule over time of the nearest open restaurant.

The algorithm scans the building start and end points in ascending x order, keeping the “active” buildings (those which overlap the current x) in a binary heap. The height of the skyline can only change at a building start or end point. We can determine the tallest building at a point by calling peek_min on the heap.

When we hit a start point we add the building to the heap; for an end point we do nothing (the heap has no operation to remove an arbitrary element), so the heap may contain inactive buildings. We remove them lazily—before checking the height of the tallest building, we call take_min to pop inactive buildings off the top of the heap until the tallest remaining building is active.

The worst-case running time is O(n log n), since we do some heap operations for each building, and we might end up with all the buildings in the heap.

Here’s the code:

type building = int * int * int (* x0, x1, h *) 
let skyline bs = 
  let module OT = struct 
    type t = int * building 
    let compare (x1, _) (x2, _) = compare x1 x2 
  end in 
  let module Events = Heap.Make(OT) in 
  let events = Events.make () in 
  List.iter
    (fun ((x0,x1,_) as b) ->
       Events.add events (x0, b);
       Events.add events (x1, b))
    bs;
  let module OT = struct 
    type t = building 
    let compare (_,_,h1) (_,_,h2) = compare h2 h1 
  end in 
  let module Heights = Heap.Make(OT) in 
  let heights = Heights.make () in 
  let rec loop last = 
    match Events.peek_min events with 
      | None -> [] 
      | _ -> 
          let (x, (x0,_,h as b)) = Events.take_min events in 
          if x = x0 then Heights.add heights b; 
          while (match Heights.peek_min heights with 
                   | Some (_,x1,_) -> x1 <= x 
                   | _ -> false) do 
            ignore (Heights.take_min heights)
          done;
          let h = 
            match Heights.peek_min heights with 
              | Some (_,_,h) -> h 
              | None -> 0 in 
          match last with 
            | Some h' when h = h' -> loop last 
            | _ -> (x, h) :: loop (Some h) in 
  loop None 

We use a second heap events to store the “events” (the start and end points of all the buildings), in order to process them in ascending x order. (This use is not dynamic—we do not add new elements to the heap while processing them—so we could just as well use another means of sorting the points.) In this heap we store the x coordinate and the building (we can tell whether we have a start or end point by comparing the x coordinate to the building’s start point), and compare elements by comparing just the x coordinates.

The main heap heights stores buildings, and we compare them by comparing heights (reversed, so peek_min peeks the tallest building). While there are still events, we add the building to heights if the event is a start point, clear out inactive buildings, then return the pair (x, y) where x is the point we’re processing and y is the height of the tallest active building. Additionally we filter out adjacent pairs with the same height; these can arise when a shorter building starts or ends while a taller building is active.
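
Since the scan is easy to get subtly wrong, a brute-force reference is useful for testing: the skyline height at any x is just the maximum over buildings active at x. A sketch (height_at is a name introduced here, not part of the code above):

```ocaml
(* Brute-force skyline height at a point: O(n) per query, O(n^2)
   overall, but handy for checking the heap-based version. A building
   is active for x0 <= x < x1; with no active building the height is 0. *)
type building = int * int * int (* x0, x1, h *)

let height_at (bs : building list) x =
  List.fold_left
    (fun acc (x0, x1, h) -> if x0 <= x && x < x1 then max acc h else acc)
    0 bs
```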

Implementing binary heaps

The following implementation is derived from the one in Daniel Bünzli’s React library (edited a little bit for readability). The Wikipedia article on binary heaps explains the standard technique well, so I won’t repeat it.

The only piece of trickiness is the use of Obj.magic 0 for unused elements of the array, so we can grow it by doubling the size rather than adding a single element each time, and thereby amortize the cost of blitting the old array.

module Make (O : OrderedType) : S with type elt = O.t =
struct
  type elt = O.t
  type t = { mutable arr : elt array; mutable len : int }
  let make () = { arr = [||]; len = 0; }
  let compare h i1 i2 = O.compare h.arr.(i1) h.arr.(i2)
  let swap h i1 i2 =
    let t = h.arr.(i1) in
    h.arr.(i1) <- h.arr.(i2);
    h.arr.(i2) <- t
  let rec up h i =
    if i = 0 then ()
    else
      let p = (i - 1) / 2 in
      if compare h i p < 0 then begin
        swap h i p;
        up h p
      end
  let rec down h i =
    let l = 2 * i + 1 in
    let r = 2 * i + 2 in
    if l >= h.len then ()
    else
      let child =
        if r >= h.len then l
        else if compare h l r < 0 then l else r in
      if compare h i child > 0 then begin
        swap h i child;
        down h child
      end
  let add h e =
    if h.len = Array.length h.arr
    then begin
      let len = 2 * h.len + 1 in
      let arr' = Array.make len (Obj.magic 0) in
      Array.blit h.arr 0 arr' 0 h.len;
      h.arr <- arr'
    end;
    h.arr.(h.len) <- e;
    up h h.len;
    h.len <- h.len + 1
  let peek_min h =
    match h.len with
      | 0 -> None
      | _ -> Some h.arr.(0)
  let take_min h =
    match h.len with
      | 0 -> raise Not_found
      | 1 ->
          let m = h.arr.(0) in
          h.arr.(0) <- (Obj.magic 0);
          h.len <- 0;
          m
      | k ->
          let m = h.arr.(0) in
          let k = k - 1 in
          h.arr.(0) <- h.arr.(k);
          h.arr.(k) <- (Obj.magic 0);
          h.len <- k;
          down h 0;
          m
  let size h = h.len
end

(Complete code is here.)

Friday, September 10, 2010

Reading Camlp4, part 11: syntax extensions

In this final (?) post in my series on Camlp4, I want at last to cover syntax extensions. A nontrivial syntax extension involves almost all the topics we have previously covered, so it seems fitting that we treat them last.

Extending grammars

In the post on parsing we covered Camlp4 grammars but stopped short of explaining how to extend them. Well, this is not completely true: we used the EXTEND form to extend an empty grammar, and we can also use it to extend non-empty grammars. We saw a small example of this when implementing quotations, where we extended the JSON grammar with a new json_eoi entry (which referred to an entry in the original grammar). Rules and levels may also be added to existing entries, and rules may be deleted.

Let’s look at a complete syntax extension, which demonstrates modifying Camlp4’s OCaml grammar. The purpose of the extension is to change the precedence of the method call operator # to make “method chaining” read better. For example, if the foo method returns an object, you can write

  obj#foo "bar" #baz 

to call the baz method, rather than needing

  (obj#foo "bar")#baz 

(I originally wrote this for use with the jQuery binding for ocamljs; method chaining is common with jQuery.)

Here is the extension:

  open Camlp4

  module Id : Sig.Id =
  struct
    let name = "pa_jquery"
    let version = "0.1"
  end

  module Make (Syntax : Sig.Camlp4Syntax) =
  struct
    open Sig
    include Syntax

    DELETE_RULE Gram expr: SELF; "#"; label END;

    EXTEND Gram
      expr: BEFORE "apply"
        [ "#" LEFTA
          [ e = SELF; "#"; lab = label ->
              <:expr< $e$ # $lab$ >> ]];
    END
  end

  module M = Register.OCamlSyntaxExtension(Id)(Make)

To make sense of a syntax extension it’s helpful to refer to Camlp4OCamlRevisedParser.ml (which defines the revised syntax grammar) and Camlp4OCamlParser.ml (which defines the original syntax as an extension of the revised syntax). There we see that the # operator is parsed in the expr entry, in a level called "." (which includes other dereferencing operators), and that this level appears below the apply level, which parses function application. Recall from the parsing post that operators in lower levels bind more tightly. So to get the effect we want, we need to move the # rule above the apply level in the grammar.

First we delete the rule from its original location: DELETE_RULE takes the grammar, the entry, and the symbols on the left-hand side of the rule, followed by END; you don’t have to say in what level it appears. Then we add the rule at a new location: we create a new level # containing the rule from the original grammar, and add it before the level named apply.

There are several ways to specify where a level is inserted: BEFORE level and AFTER level put it before or after some other level; LEVEL level adds rules to an existing level (you will be warned but not stopped from changing the label or associativity of the level); FIRST and LAST put the level before or after all other levels. If you don’t specify, rules are added to the topmost level in the entry. The resulting grammar works just as if you had given it all at once, making the insertions in the specified places. (However, it is not very clear from the code how ordering works when inserting rules into an existing level; it is perhaps best not to rely on the order of rules in a level anyway.)

Finally we register the extension. The Make functor passed to OCamlSyntaxExtension must return a Sig.Camlp4Syntax (although the result is just ignored when the extension is registered), so we include Syntax to provide it.

(The complete code for this example is here.)

Transforming the AST

Let’s do a slightly more complicated example involving some transformation of the parsed AST. It often comes up that we want to let-bind the value of an expression to a name, trapping exceptions, then evaluate the body of the let outside the scope of the exception handler. This is a bit painful to write in stock OCaml; we can only straightforwardly express trapping exceptions in the whole let expression:

  try let x = e1 in e2 
  with e -> h 

A nice alternative is to use thunks to delay the evaluation of the body, doing it outside the scope of the try/with:

  (try let x = e1 in fun () -> e2 
   with e -> fun () -> h)() 

(We must thunkify the exception handler to make the types work out.) This is simple enough to do by hand, but let’s give it some syntactic sugar:

  let try x = e1 in e2 
  with e -> h 

which should expand to the thunkified version above. (The idea and syntax are taken from Martin Jambon’s micmatch extension.)

Let’s look at the existing rules for let and try in the revised-syntax grammar, to get an idea of how to parse the let/try form:

  [ "let"; r = opt_rec; bi = binding; "in"; x = SELF -> 
      <:expr< let $rec:r$ $bi$ in $x$ >> 
  | "try"; e = sequence; "with"; a = match_case -> 
      <:expr< try $mksequence' _loc e$ with [ $a$ ] >> 

For let, the opt_rec entry parses an optional rec keyword (we see there is a special antiquotation for interpolating rec), and the binding entry parses a group of bindings separated by and. SELF is just expr. For try, sequence is a sequence of expressions separated by ;, and match_case is a group of match cases separated by |. (These entries are both a little different in the original syntax, to account for the different semicolon rules and the [] delimiters around the match cases.) Recall that the grammar is written with revised-syntax quotations, so we have [] around the match cases. The call to mksequence' just wraps a do {} around a sequence if necessary; more on this below.

The parsing rule we want is a combination of these. Here is the extension:

  EXTEND Gram
    expr: LEVEL "top" [
      [ "let"; "try"; r = opt_rec; bi = binding; "in";
        e = sequence; "with"; a = match_case ->
          let a =
            List.map
              (function
                 | <:match_case< $p$ when $w$ -> $e$ >> ->
                     <:match_case< $p$ when $w$ -> fun () -> $e$ >>
                 | mc -> mc)
              (Ast.list_of_match_case a []) in
          <:expr<
            (try let $rec:r$ $bi$ in fun () -> do { $e$ }
             with [ $list:a$ ])()
          >> ]
    ];
  END

We put rec after try (following micmatch), which is a little weird; but if we put it before, we would need lookahead to disambiguate let from let try, since once we have parsed opt_rec we are committed to one rule or the other. Alternatively, we could start the rule "let"; r = opt_rec; "try", which has no ambiguity with the ordinary let rule because the "let"; opt_rec prefix is factored out; the parser doesn’t choose between the rules until it tries to parse try. After in we parse sequence rather than SELF; this seems like a good choice because there is a with to end the sequence.

Now, to transform the AST, we map over the match cases. The match_case entry returns a list of cases separated by Ast.McOr; we call list_of_match_case to get an ordinary list. For each case, we match the pattern, when clause, and expression on the right-hand side (these are packaged in an Ast.McArr, where the when clause field is Ast.ExNil if there is no when clause), and return it with the expression thunkified. Then we return the whole let inside try, with the body sequence thunkified.

We have to add a do {} around the body, creating an Ast.ExSeq node, because that’s what the translation to the OCaml AST expects—recall from the filters post that the Camlp4 AST is translated to an OCaml AST and marshalled to the compiler. If we forget this (and "we" often forget these idiosyncrasies) then we get the error "expr; expr: not allowed here, use do {...} or [|...|] to surround them", which is pretty helpful as these errors go.

(The complete code for this example is here.)

Extending pattern matching

As a final example, let’s extend OCaml’s pattern syntax. In the quotations post we noted that JSON quotations in a pattern are not very useful, because we would usually like a pattern to match even if the fields of an object come in a different order or there are extra fields. To keep the code short let’s abstract the problem a little and consider matching association lists: if we write a match case

  | alist [ "foo", x; "bar", y ] -> e 

we would like it to match association lists with "foo" and "bar" keys, in any order, with any extra pairs in the list. Our translation looks like this:

  | __pa_alist_patt_1 when
      (match ((try Some (List.assoc "foo" __pa_alist_patt_1)
               with | Not_found -> None),
              (try Some (List.assoc "bar" __pa_alist_patt_1)
               with | Not_found -> None))
       with
       | (Some x, Some y) -> true
       | _ -> false) ->
      (match ((try Some (List.assoc "foo" __pa_alist_patt_1)
               with | Not_found -> None),
              (try Some (List.assoc "bar" __pa_alist_patt_1)
               with | Not_found -> None))
       with
       | (Some x, Some y) -> e
       | _ -> assert false)

This might seem overcomplicated, and it is true that we could simplify it for this case. But the built-in pattern syntax is complicated, and it is tricky handling all the cases to make things work smoothly; the strategy that produces the code above will handle some (but not all) of the complications. (We’ll consider some improvements below.)

The basic idea is that when we come to an alist we replace it with a new fresh name, then do further matching in a when clause, so if it fails we can continue to the next case by returning false. In the when clause we look up the keys, putting them in options, then match on the options; we handle nested patterns (to the right of a key) by embedding them in the when clause match. The when clause match also binds variables appearing in the original pattern, so they are available to the when clause of the original case (if there is one). Finally, we do the whole thing over again in the match case body to provide bindings to the original body.
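
Written out by hand for a single case, the strategy amounts to something like this (f and its key set are hypothetical; the real extension generates the when-clause and body variants separately, as shown above):

```ocaml
(* Look up both keys into options, then match on the pair, so order
   and extra pairs in the association list don't matter. *)
let f al =
  match
    (try Some (List.assoc "foo" al) with Not_found -> None),
    (try Some (List.assoc "bar" al) with Not_found -> None)
  with
  | Some x, Some y -> x + y
  | _ -> 0
```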

In order to implement this we’ll use both a syntax extension and a filter. The reason is that we’d like to extend the patt entry, but to do the AST transformation we sketched above we need to transform match_cases. We could replace the match_case part of the parser as well but that seems needlessly hairy, and generally when writing a syntax extension we’d like to touch as little of the parser as possible so it interoperates well with other extensions.

First, here is the syntax extension:

  EXTEND Gram
    patt: LEVEL "simple"
      [[ "alist"; "[";
         l =
           LIST0
             [ e = expr LEVEL "simple"; ",";
               p = patt LEVEL "simple" ->
                 Ast.PaOlbi (_loc, "", p, e) ]
             SEP ";";
         "]" ->
           <:patt< $uid:"alist"$ $Ast.paSem_of_list l$ >> ]];
  END

We extend the simple level of the patt entry, which parses primitive patterns. Inside alist [] we parse a list of expr / patt pairs; we parse expr at the simple level or else it would parse the whole pair as an expr, and the same for patt just in case. Then we return the pair of expression and pattern in an Ast.PaOlbi (ordinarily used for optional argument defaults in function definitions). Why? Well, we need to return something of type patt, but we need somehow to get the expr to our filter, and this is the only patt constructor that holds an expr. (As an alternative we could parse a patt instead of an expr, but then we’d need to translate it to an expr at the point we use it.) Finally we return the list wrapped in a data constructor so we can recognize it easily in the filter; because it is lower-case, we can be sure that “alist” is not the identifier of a real data constructor.

Now let’s look at the filter. First, some helper functions:

  let fresh = 
    let id = ref 0 in 
    fun () -> 
      incr id; 
      "__pa_alist_patt_"  ^ string_of_int !id 
  let expr_tup_of_list _loc = function 
    | [] -> <:expr< () >> 
    | [ v ] -> v 
    | vs -> <:expr< $tup:Ast.exCom_of_list vs$ >> 
  let patt_tup_of_list _loc = function 
    | [] -> <:patt< () >> 
    | [ p ] -> p 
    | ps -> <:patt< $tup:Ast.paCom_of_list ps$ >> 

We have a function to generate fresh names, a function to turn a list of expressions into a tuple, and a similar function for patterns. The reason we need these latter two is that a tuple with 0 or 1 elements is not allowed in the AST (the empty “tuple” is actually a special identifier in the Camlp4 AST). Next, the main rewrite function:

  let rewrite _loc p w e = 
    let k = ref (fun s f -> s) in 

The function takes the parts of an Ast.McArr (that is, a match case). We’re going to map over the pattern p, building up a function k as we encounter nested alist forms. We want to generate the same matching code in the when clause and the body, so k is parameterized with an expression in case of success (the original when clause or the body) and in case of failure (false or assert false). We will build k from the inside out, starting with a function that just returns the success expression.

    let map =
      object
        inherit Ast.map as super
        method patt p =
          match super#patt p with
            | <:patt< $uid:"alist"$ $l$ >> ->
                let id = fresh () in
                let l =
                  List.map
                    (function
                       | Ast.PaOlbi (_, _, p, e) -> p, e
                       | _ -> assert false)
                    (Ast.list_of_patt l []) in
                let vs =
                  List.map
                    (fun (_, e) ->
                       <:expr<
                         try Some (List.assoc $e$ $lid:id$)
                         with Not_found -> None
                       >>)
                    l in
                let ps =
                  List.map
                    (fun (p, _) -> <:patt< Some $p$ >>)
                    l in
                let k' = !k in
                k :=
                  (fun s f ->
                     <:expr<
                       match $expr_tup_of_list _loc vs$ with
                         | $patt_tup_of_list _loc ps$ -> $k' s f$
                         | _ -> $f$
                     >>);
                <:patt< $lid:id$ >>
            | p -> p
      end in

The Ast.map object provides methods to map each syntactic class of the AST, along with default implementations which return the node unchanged. We extend it to walk over the pattern, leaving it unchanged except when we come to our special alist constructor. In that case we generate a fresh name, which we return as the value of the function. Then we extract the expr / patt pairs and map them to try Some (List.assoc ... expressions and Some patterns. Finally we extend k by matching all the expressions against all the patterns; if the match succeeds we call the previous k, otherwise the failure expression. Since we build k from the inside out, we transform subpatterns first (by matching over super#patt p).

    let p = map#patt p in 
    let w = match w with <:expr< >> -> <:expr< true >> | _ -> w in 
    let w = !k w <:expr< false >> in 
    let e = !k e <:expr< assert false >> in 
    <:match_case< $p$ when $w$ -> $e$ >> 

We call map#patt on p to replace special alist constructor nodes with fresh names and build up k, then call the resulting k on the when clause (if there is no when clause we replace it with true) and body, and finally return the result as a match_case, completing the rewrite function.

  let filter =
    let map =
      object
        inherit Ast.map as super
        method match_case mc =
          match super#match_case mc with
            | <:match_case@_loc< $p$ when $w$ -> $e$ >> ->
                rewrite _loc p w e
            | e -> e
      end in
    map#str_item

  let _ = AstFilters.register_str_item_filter filter

We extend Ast.map again to call the rewrite function on each match_case, then register the resulting filter.

The code above handles when clauses and nested alist patterns, and interacts properly with ordinary OCaml patterns. However, it completely falls down on nested pattern alternatives. If we write

match x with 
  | alist [ "foo", f ] 
  | alist [ "fooz", f ] -> e 

we get this mess:

  | __pa_alist_patt_1 | __pa_alist_patt_2 when
      (match (try Some (List.assoc "fooz" __pa_alist_patt_2)
              with | Not_found -> None)
       with
       | Some f ->
           (match (try Some (List.assoc "foo" __pa_alist_patt_1)
                   with | Not_found -> None)
            with
            | Some f -> true
            | _ -> false)
       | _ -> false) ->
      ... (* the same mess for the body *)

The first problem is that both arms of an alternative must bind the same names, but we have replaced them with two different fresh names. The second problem is that we have blindly treated alternative alist patterns as being nested one inside the other. A solution to both these problems is to split nested alternatives into separate cases, at the cost of duplicating the when clause and body in each.

Jeremy Yallop’s patterns framework (see here for an update that works with OCaml 3.12.0) allows multiple pattern extensions to coexist, and provides some common facilities to make them easier to write. In particular it splits nested alternatives into separate cases. Another deficiency in the code above is that it duplicates the match expression (built in k) in the when clause and body. This can be avoided by computing the body within the when clause, setting a reference, and dereferencing it in the body. However, the reference must be bound outside the match_case to be visible both in the when clause and the body, so this approach must transform each AST node that contains match_cases in order to bind the refs in the right place. The patterns framework handles this as well.
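
The ref trick just mentioned can be sketched in plain OCaml for a single match case, ignoring the binding-placement problem (classify is a hypothetical example introduced here):

```ocaml
(* Do the match work once in the when clause, stash the body's result
   in a ref, and merely dereference it in the body. The ref must be
   bound outside the match for both the guard and the body to see it. *)
let classify al =
  let r = ref 0 in
  match al with
  | _ when (match (try Some (List.assoc "foo" al) with Not_found -> None) with
            | Some x -> r := x; true
            | None -> false) -> !r
  | _ -> -1
```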

(The complete code for this example is here. A version using the patterns framework is here.)

Thursday, August 26, 2010

ocamljs 0.3

I am happy to announce version 0.3 of ocamljs. Ocamljs is a system for compiling OCaml to Javascript. It includes a Javascript back-end for the OCaml compiler, as well as several support libraries, such as bindings to the browser DOM. Ocamljs also works with orpc for RPC over HTTP, and froc for functional reactive browser programming.

Changes since version 0.2 include:

  • support for OCaml 3.11.x and 3.12.0
  • jQuery binding (contributed by Dave Benjamin)
  • full support for OCaml objects (interoperable with Javascript objects)
  • Lwt 2.x support
  • ocamllex and ocamlyacc support
  • better interoperability with Javascript
  • many small fixes and improvements

Development of ocamljs has moved from Google Code to Github.

Comparison to js_of_ocaml

Since I last did an ocamljs release, a new OCaml-to-Javascript system has arrived, js_of_ocaml. I want to say a little about how the two systems compare:

Ocamljs is a back-end to the existing OCaml compiler; it translates the “lambda” intermediate language to Javascript. (This is also where the bytecode and native code back-ends connect to the common front-end.) Js_of_ocaml post-processes ordinary OCaml bytecode (compiled and linked with the ordinary OCaml bytecode compiler) into Javascript. With ocamljs you need a special installation of the compiler (and special support for ocamlbuild and ocamlfind), you need to recompile libraries, and you need the OCaml source to build it. With js_of_ocaml you don’t need any of this.

Since ocamljs recompiles libraries, it’s possible to special-case code for the Javascript build to take advantage of Javascript facilities. For example, ocamljs implements the Buffer module on top of Javascript arrays instead of strings, for better performance. Similarly, it implements CamlinternalOO to use Javascript method dispatch directly instead of layering OCaml method dispatch on top. Js_of_ocaml can’t do this (or at least it would be necessary to recognize the compiled bytecode and replace it with the special case).

Because js_of_ocaml works from bytecode, it can’t always know the type of values (at the bytecode level, ints, bools, and chars all have the same representation, for example). This makes interoperating with native Javascript more difficult: you usually need conversion functions between the OCaml and Javascript representation of values when you call a Javascript function from OCaml. Ocamljs has more information to work with, and can represent OCaml bools as Javascript bools, for example, so you can usually call a Javascript function from OCaml without conversions.

Ocamljs has a mixed representation of strings: literal strings and the result of ^, Buffer.contents, and Printf.sprintf are all immutable Javascript strings; strings created with String.create are mutable strings implemented by Javascript arrays (with a toString method which returns the represented string). This is good for interoperability—you can usually pass a string directly to Javascript—but it doesn’t match regular OCaml’s semantics, and it can cause runtime failures (e.g. if you try to mutate an immutable string). Js_of_ocaml implements only mutable strings, so you need conversions when calling Javascript, but the semantics match regular OCaml.

With ocamljs, Javascript objects can be called from OCaml using the ordinary OCaml method-call syntax, and objects written in OCaml can be called using the ordinary Javascript syntax. With js_of_ocaml, a special syntax is needed to call Javascript objects, and OCaml objects can’t easily be called from Javascript. However, there is an advantage to a special call syntax: with ocamljs it is not possible to partially apply calls to native Javascript methods, but the compiler does not catch the mistake, so it shows up only as a runtime failure.

Ocamljs supports inline Javascript, while js_of_ocaml does not. I think it might be possible for js_of_ocaml to do so using the same approach that ocamljs takes: use Camlp4 quotations to embed a syntax tree, then convert the syntax tree from its OCaml representation (as lambda code or bytecode) into Javascript. However, you would still need conversion functions between OCaml and Javascript values.

I haven’t compared the performance of the two systems. It seems like there must be a speed penalty to translating from bytecode compared to translating from lambda code. On the other hand, while ocamljs is very naive in its translation, js_of_ocaml makes several optimization passes. With many programs it doesn’t matter, since most of the time is spent in browser code. (For example, the planet example seems to run at the same speed in ocamljs and js_of_ocaml.) It would be interesting to compare them on something computationally intensive like Andrej Bauer’s

Js_of_ocaml is more complete and careful in its implementation of OCaml (e.g. it supports int64s), and it generates much more compact code than ocamljs. I hope to close the gap in these areas, possibly by borrowing some code and good ideas from js_of_ocaml.

Friday, August 20, 2010

Mixing monadic and direct-style code with delimited continuations

The Lwt library is a really nice way to write concurrent programs. A big downside, however, is that you can’t use direct-style libraries with it. Suppose we’re writing an XMPP server, and we want to parse XML as it arrives over a network connection, using Daniel Bünzli’s nice xmlm library. Xmlm can read from a string, or from a Pervasives.in_channel, or you can give it a function of type (unit -> int) to return the next character of input. But there is no way to have it read from an Lwt thread; that is, we can’t give it a function of type (unit -> int Lwt.t), since it doesn’t know what to do with an Lwt.t. To keep track of the parser state at the point the input blocks, the whole library would need to be rewritten in Lwt style (i.e. monadic style).

Now, Lwt does provide the Lwt_preemptive module, which gives you a way to spin off a preemptive thread (implemented as an ordinary OCaml thread) and wait for its result in the usual Lwt way with bind. This is useful, but has two drawbacks: preemptive threads are preemptive, so you’re back to traditional locking if you want to operate on shared data; and preemptive threads are threads, so they are much heavier than Lwt threads, and (continuing the XMPP hypothetical) it may not be feasible to use one per open connection.


What we would really like is to be able to spin off a cooperative, direct-style thread. The thread needs a way to block on Lwt threads, but when it blocks we need to be able to schedule another Lwt thread. As a cooperative thread it of course has exclusive access to the process state while it is running. A cooperative, direct-style thread is sometimes called a coroutine (although to me that word connotes a particular style of inter-thread communication as well, where values are yielded between coroutines), or a fiber.

Here’s an API for mixing Lwt threads with fibers:

  val start : (unit -> 'a) -> 'a Lwt.t 
  val await : 'a Lwt.t -> 'a 

The start function spins off a fiber, returning an Lwt thread which is woken with the result of the fiber once it completes. The await function (which may be called only from within a fiber) blocks on the result of an Lwt thread, allowing another Lwt thread to be scheduled while it is waiting.
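As a usage sketch (hypothetical: the module name Fiber and the read_char function are assumptions invented for illustration; the Xmlm calls follow its `Fun input source), the xmlm scenario might look like:

  (* Hypothetical sketch: drive xmlm from a fiber. [read_char] is an
     assumed function returning the next input byte as an Lwt thread;
     [Fiber.start] and [Fiber.await] are the API above. *)
  let parse (read_char : unit -> int Lwt.t) : Xmlm.signal list Lwt.t =
    Fiber.start begin fun () ->
      (* xmlm pulls input through a plain (unit -> int) function;
         awaiting inside it suspends only this fiber *)
      let input =
        Xmlm.make_input (`Fun (fun () -> Fiber.await (read_char ()))) in
      let rec loop acc =
        if Xmlm.eoi input then List.rev acc
        else loop (Xmlm.input input :: acc) in
      loop []
    end

The point is that xmlm itself is unchanged: from its perspective it is calling an ordinary blocking (unit -> int) function.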

With this API we could implement our XMPP server by calling xmlm from within a fiber, and passing it a function that awaits the next character available on the network connection. But how do we implement it?

Delimited continuations

Oleg Kiselyov’s recent announcement of a native-code version of his Delimcc library for delimited continuations in OCaml reminded me of two things:

  1. I should find out what delimited continuations are.
  2. They sound useful for implementing fibers.

The paper describing the library, Delimited Control in OCaml, Abstractly and Concretely, has a pretty good overview of delimited continuations, and section 2 of A Monadic Framework for Delimited Continuations is helpful too.

The core API is small:

  type 'a prompt 
  type ('a,'b) subcont 
  val new_prompt   : unit -> 'a prompt 
  val push_prompt  : 'a prompt -> (unit -> 'a) -> 'a 
  val take_subcont : 
    'b prompt -> (('a,'b) subcont -> unit -> 'b) -> 'a 
  val push_subcont : ('a,'b) subcont -> (unit -> 'a) -> 'b 

I find it easiest to think about these functions as operations on the stack. A prompt is an identifier used to mark a point on the stack (the stack can be marked more than once with the same prompt). The function new_prompt makes a new prompt which is not equal to any other prompt.

The call push_prompt p f marks the stack with p then runs f, so the stack, growing to the right, looks like

  ABCDpEFGH
where ABCD are stack frames in the continuation of the call to push_prompt, and EFGH are frames created while running f. If f returns normally (that is, without calling take_subcont) then its return value is returned by push_prompt, and we are back to the original stack ABCD.

If take_subcont p g is called while running f, the stack fragment EFGH is packaged up as an ('a,'b) subcont and passed to g. You can think of an ('a,'b) subcont as a function of type 'a -> 'b, where 'a is the return type of the call to take_subcont and 'b is the return type of the call to push_prompt. Take_subcont removes the fragment pEFGH from the stack, and there are some new frames IJKL from running g, so we have

  ABCDIJKL
Now g can make use of the passed-in subcont using push_subcont. (Thinking of a subcont as a function, push_subcont is just a weird function application operator, which takes the argument as a thunk). Then the stack becomes

  ABCDIJKLEFGH
Of course g can call the subcont as many times as it likes.

A common pattern is to re-mark the stack with push_prompt before calling push_subcont (so take_subcont may be called again). There is an optimized version of this combination called push_delim_subcont, which produces the stack

  ABCDIJKLpEFGH
The idea that a subcont is a kind of function is realized by shift0, which is like take_subcont except that instead of passing a subcont to g it passes an ordinary function. The passed function just wraps a call to push_delim_subcont. (It is push_delim_subcont rather than push_subcont for historical reasons I think—see the Monadic Framework paper for a comparison of various delimited continuation primitives.)
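For instance, here is a small sketch (not from the original post) of shift0 treating the captured continuation as an ordinary function:

  (* Sketch: the prompt delimits the context 1 + [.]; shift0 captures it
     as an ordinary function k, which re-marks the stack when applied. *)
  let () =
    let p = Delimcc.new_prompt () in
    let r =
      Delimcc.push_prompt p begin fun () ->
        1 + Delimcc.shift0 p (fun k -> k 10 + k 100)
      end in
    (* k acts like fun x -> 1 + x, so r should be 11 + 101 = 112 *)
    assert (r = 112)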

Implementing fibers

To implement fibers, we want start f to mark the stack, then run f; and await t to unwind the stack back to the mark, wait for t to complete, then restore the stack. Here is start:

  let active_prompt = ref None 
  let start f = 
    let t, u = Lwt.wait () in 
    let p = Delimcc.new_prompt () in 
    active_prompt := Some p; 
    Delimcc.push_prompt p begin fun () -> 
      let r = 
        try Lwt.Return (f ()) 
        with e -> Lwt.Fail e in 
      active_prompt := None; 
      match r with 
        | Lwt.Return v -> Lwt.wakeup u v 
        | Lwt.Fail e -> Lwt.wakeup_exn u e 
        | Lwt.Sleep -> assert false 
    end; 
    t 

We make a sleeping Lwt thread, and store a new prompt in a global (this is OK because we won’t yield control to another Lwt thread before using it; of course this is not safe with OCaml threads). Then we mark the stack with push_prompt and run the fiber. (The let r = ... match r with ... is to avoid calling Lwt.wakeup{,_exn} in the scope of the try; we use Lwt.state as a handy type to store either a result or an exception.) If the fiber completes without calling await then all we do is wake up the Lwt thread with the returned value or exception.

Here is await:

  let await t = 
    let p = 
      match !active_prompt with 
        | None -> failwith "await called outside start" 
        | Some p -> p in 
    active_prompt := None; 
    match Lwt.poll t with 
      | Some v -> v 
      | None -> 
          Delimcc.shift0 p begin fun k -> 
            let ready _ = 
              active_prompt := Some p; 
              k (); 
              Lwt.return () in 
            ignore (Lwt.try_bind (fun () -> t) ready ready) 
          end; 
          match Lwt.poll t with 
            | Some v -> v 
            | None -> assert false 

We first check to be sure that we are in the scope of start, and that t isn’t already completed (in which case we just return its result). If we actually need to wait for t, we call shift0, which captures the stack fragment back to the push_prompt call in start (this continuation includes the subsequent match Lwt.poll t and everything after the call to await), then calls Lwt.try_bind so we can restore the stack fragment when t completes (whether by success or failure). When t completes, the ready function restores the global active_prompt, in case the fiber calls await again, then restores the stack by calling k (recall that this also re-marks the stack with p, which is needed if the fiber calls await again).

It’s pretty difficult to follow what’s going on here, so let’s try it with stacks. After calling start we have

  ABCDpEFGH
where ABCD is the continuation of push_prompt in start (just the return of t) and EFGH are frames created by the thunk passed to start. Now, a call to await (on an uncompleted thread) calls shift0, which packs up EFGH as k and unwinds the stack to p. The function passed to shift0 stores k in ready but doesn’t call it, and control returns to start (since the stack has been unwound).

The program continues normally until t completes. Now control is in Lwt.run_waiters running threads that were waiting on t; one of them is our ready function. When it is called, the stack is re-marked and EFGH is restored, so we have

  QRSTpEFGH
where QRST is wherever we happen to be in the main program, ending in Lwt.run_waiters. Now, EFGH ends with the second call to match Lwt.poll in await, which returns the value of t and continues the thunk passed to start. The stack is now marked with p inside Lwt.run_waiters, so when await is called again control returns there.

Events vs. threads

We have seen that we can use fibers to write Lwt threads in direct style. Should we abandon Lwt’s monadic style entirely, and use Lwt only for its event handling?

First, how does each style perform? Every time a fiber blocks and resumes, we have to copy, unwind, and restore its entire stack. With Lwt threads, the “stack” is a bunch of linked closures in the heap, so we don’t need to do anything to block or resume. On the other hand, building and garbage-collecting the closures is more expensive than pushing and popping the stack. We can imagine that which style performs better depends on the thread: if it blocks infrequently enough, the amortized cost of copying and restoring the stack might be lower than the cost of building and garbage-collecting the closures. (We can also imagine that a different implementation of delimited continuations might change this tradeoff.)

Second, how does the code look? The paper Cooperative Task Management without Manual Stack Management considers this question in the context of the “events vs. threads” debate. Many of its points lose their force when translated to OCaml and Lwt—closures, the >>= operator, and Lwt’s syntax extension go a long way toward making Lwt code look like direct style—but some are still germane. In favor of fibers is that existing direct-style code need not be rewritten to work with Lwt (what motivated us in the first place). In favor of monadic style is that the type of a function reflects the possibility that it might block, yield control to another thread, and disturb state invariants.
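To make the stylistic difference concrete, here is a hedged comparison sketch (invented for illustration, assuming the start/await API above): the same countdown loop in Lwt’s monadic style and in direct style.

  (* Monadic style: the loop is expressed with Lwt.bind. *)
  let rec count_monadic n =
    if n = 0 then Lwt.return ()
    else Lwt.bind (Lwt_unix.sleep 1.) (fun () -> count_monadic (n - 1))

  (* Direct style: an ordinary for loop; await suspends only this fiber. *)
  let count_direct n : unit Lwt.t =
    start begin fun () ->
      for _i = 1 to n do
        await (Lwt_unix.sleep 1.)
      done
    end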

Direct-style FRP

We could apply this idea, of replacing monadic style with direct style using delimited continuations, to other monads—in particular to the froc library for functional reactive programming. (The Scala.React FRP library also uses delimited continuations to implement direct style; see Deprecating the Observer Pattern for details.)

Here’s the API:

  val direct : (unit -> 'a) -> 'a Froc.behavior 
  val read : 'a Froc.behavior -> 'a 

Not surprisingly, it’s just the same as for Lwt, but with a different monad and different names (I don’t know if direct is quite right but it is better than start). There is already a function Froc.sample with the same type as read, but it has a different meaning: sample takes a snapshot of a behavior but creates no dependency on it.
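As a hypothetical usage sketch (sum is invented for illustration; x and y are assumed to be existing behaviors):

  (* Hypothetical sketch: a derived behavior in direct style. Unlike
     Froc.sample, [read] makes the result depend on [x] and [y], so it
     is recomputed when either changes. *)
  let sum (x : int Froc.behavior) (y : int Froc.behavior) : int Froc.behavior =
    direct (fun () -> read x + read y)

The monadic equivalent would be written with Froc’s bind and return.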

The implementation is very similar as well:

  let active_prompt = ref None 
  let direct f = 
    let t, u = Froc_ddg.make_changeable () in 
    let p = Delimcc.new_prompt () in 
    active_prompt := Some p; 
    Delimcc.push_prompt p begin fun () -> 
      let r = 
        try Froc_ddg.Value (f ()) 
        with e -> Froc_ddg.Fail e in 
      active_prompt := None; 
      Froc_ddg.write_result u r 
    end; 
    (Obj.magic t : _ Froc.behavior) 

This is essentially the same code as start, modulo the change of monad. However, some of the functions we need aren’t exported from Froc, so we need to use the underlying Froc_ddg module and magic the result at the end. Froc_ddg.make_changeable is the equivalent of Lwt.wait: it returns an “uninitialized” monadic value along with a writer for that value. We use Froc_ddg.result instead of Lwt.state to store a value or exception, and Froc_ddg.write_result instead of the pattern match and Lwt.wakeup{,_exn}.

  let read t = 
    let p = 
      match !active_prompt with 
        | None -> failwith "read called outside direct" 
        | Some p -> p in 
    active_prompt := None; 
    Delimcc.shift0 p begin fun k -> 
      Froc.notify_result_b t begin fun _ -> 
        active_prompt := Some p; 
        k () 
      end 
    end; 
    Froc.sample t 

And this is essentially the same code as await. A Froc.behavior always has a value, so we don’t poll it as we did with Lwt.t, but go straight to shift0. We have Froc.try_bind but it’s a little more compact to use notify_result_b, which passes a result.

Monadic reflection

The similarity between these implementations suggests that we could use the same code to get a direct style version of any monad; we only need a way to create an uninitialized monadic value, then set it. The call to Lwt.poll in await is an optimization which we would have to forgo. (In both these examples we have a monad with failure, and try_bind, but we could do without it.)
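As a hedged sketch of that generalization (the Monad signature below is an assumption invented for illustration, not from any library; failure handling is omitted):

  (* Sketch: any monad that can create an "uninitialized" value and
     later set it, plus a way to be notified when it is set, supports
     the same construction. All names here are hypothetical. *)
  module type Monad = sig
    type 'a t
    val make : unit -> 'a t * ('a -> unit)    (* unset value, setter *)
    val notify : 'a t -> ('a -> unit) -> unit (* run callback once set *)
  end

  module Reflect (M : Monad) = struct
    let active_prompt = ref None

    let reify f =                    (* cf. start / direct *)
      let t, set = M.make () in
      let p = Delimcc.new_prompt () in
      active_prompt := Some p;
      Delimcc.push_prompt p begin fun () ->
        let v = f () in
        active_prompt := None;
        set v
      end;
      t

    let reflect t =                  (* cf. await / read *)
      let p =
        match !active_prompt with
          | None -> failwith "reflect called outside reify"
          | Some p -> p in
      active_prompt := None;
      Delimcc.shift0 p begin fun k ->
        M.notify t (fun v -> active_prompt := Some p; k v)
      end
  end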

A little googling turns up Andrzej Filinski’s paper Representing Monads, which reaches the same conclusion, with a lot more rigor. In that work start/direct are called reify, and await/read are called reflect. Reflect is close to the implementations above, but in reify the paper marks the stack inside a function passed to bind rather than creating an uninitialized monadic value and later setting it.

This makes sense—inside bind an uninitialized monadic value is created, then set from the result of the function passed to bind. So we are partially duplicating bind in the code above. If we mark the stack in the right place we should be able to use bind directly. It is hard to see how to make the details work out, however, since Lwt.bind and Froc.bind each have some cases where uninitialized values are not created.

(You can find the complete code for Lwt fibers here and direct-style froc here.)

(revised 10/22)