Ambassador to the Computers

Logic programming in Scala, part 3: unification and state

2011-06-08T20:41:00.000-07:00

In this post I want to build on the backtracking logic monad we covered last time by adding unification, yielding an embedded DSL for Prolog-style logic programming.

Prolog

Here is a small Prolog example, the rough equivalent of List.contains in Scala:

  member(X, [X|T]). 
  member(X, [_|T]) :- member(X, T).

Member doesn’t return a boolean; instead it succeeds or fails (in the same way as the logic monad). The goal member(1, [1,2,3]) succeeds; the goal member(4, [1,2,3]) fails. (What happens for member(1, [1,1,3])?)

A Prolog predicate is defined by one or more clauses (each ending in a period), made up of a head (the predicate and arguments before the :-) and zero or more subgoals (goals after the :-, separated by commas; if there are no subgoals the :- is omitted). To solve a goal, we unify it (match it) with each clause head, then solve each subgoal in the clause. If a subgoal fails we backtrack and try the next matching head; if there is no matching head the goal fails. A goal may succeed more than once.

For member we have two clauses: the first says that member succeeds if X is the head of the list ([X|T] is the same as x::t in Scala); the second says that member succeeds if X is a member of the tail of the list, regardless of the head. There is no clause where the list is empty (written []); a goal with an empty list fails since there is no matching clause head.

Prolog unification is more expressive than pattern matching as found in Scala, OCaml, etc. Both sides of a unification may contain variables; unification attempts to instantiate them so that the two sides are equal. Variables are instantiated by terms, which themselves may contain variables; unification finds the most general instantiation which makes the sides equal.
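To make two-sided unification concrete, here is a minimal sketch over untyped terms. The names Tm, V, F, walk, and unify are illustrative assumptions and are not the typed Term[A] machinery built later in this post; the substitution is a plain Map that walk resolves on demand.

```scala
object UnifySketch {
  // Hypothetical untyped terms, simpler than the typed terms used in this post.
  sealed trait Tm
  case class V(name: String) extends Tm                  // a variable
  case class F(name: String, args: List[Tm]) extends Tm  // a constructor applied to arguments

  type Subst = Map[String, Tm]

  // Follow bindings until we reach an unbound variable or a constructor.
  def walk(t: Tm, s: Subst): Tm = t match {
    case V(n) if s.contains(n) => walk(s(n), s)
    case _ => t
  }

  // Does variable n occur anywhere inside t (under the substitution s)?
  def occurs(n: String, t: Tm, s: Subst): Boolean = walk(t, s) match {
    case V(m) => m == n
    case F(_, args) => args.exists(occurs(n, _, s))
  }

  // Either side may contain variables; the result extends s so that both sides become equal.
  def unify(a: Tm, b: Tm, s: Subst): Option[Subst] = (walk(a, s), walk(b, s)) match {
    case (V(x), V(y)) if x == y => Some(s)
    case (V(x), t) => if (occurs(x, t, s)) None else Some(s + (x -> t))
    case (t, V(y)) => if (occurs(y, t, s)) None else Some(s + (y -> t))
    case (F(f, as), F(g, bs)) =>
      if (f != g || as.length != bs.length) None
      else as.zip(bs).foldLeft(Option(s)) {
        case (acc, (x, y)) => acc.flatMap(unify(x, y, _))
      }
  }
}
```

With these definitions, unifying pair(X, b) with pair(a, Y) binds variables on both sides and yields Some(Map("X" -> F("a", Nil), "Y" -> F("b", Nil))), which plain pattern matching cannot express. Unifying X with cons(X) returns None because X occurs in the term it would be bound to.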

As a small example of this expressivity, we can run member “backwards”: the goal member(X, [1,2,3]) succeeds once for each element of the list, with X bound to the element.

There is much more on Prolog and logic programming in Frank Pfenning’s course notes, which I recommend highly.

Unification

For each type we want to use in unification we’ll define a corresponding type of terms, which have the same structure as the underlying type but can also contain variables. These aren’t Scala variables (which of course can’t be stored in a data structure) but “existential variables”, or evars. Evars are just tags; computations will carry an environment mapping evars to terms, which may be updated after a successful unification.

import scala.collection.immutable.{Map,HashMap} 
 
class Evar[A](val name: String) 
object Evar { def apply[A](name: String) = new Evar[A](name) } 
 
trait Term[A] { 
  // invariant: on call to unify, this and t have e substituted 
  def unify(e: Env, t: Term[A]): Option[Env] 
 
  def occurs[B](v: Evar[B]): Boolean 
  def subst(e: Env): Term[A] 
  def ground: A 
}

The important property of an evar is that it is distinct from every other evar; the name attached to it is just a label. An evar is indexed by a phantom type indicating the underlying type of terms which may be bound to it.

A term is indexed by its underlying type. So Int becomes Term[Int], String becomes Term[String], and so on; an evar of type Evar[A] may only be bound to a term of type Term[A]. (Prolog is dynamically typed, but this statically-typed treatment of evars and terms fits better with Scala.)

The unify method unifies a term with another term of the same type, taking an environment and returning an updated environment (or None if the unification fails). The occurs method checks whether an evar occurs in a term (as we will see, this is used to prevent circular bindings). The subst method substitutes the variables in a term with their bindings in an environment, and ground returns the underlying Scala value represented by the term (provided the term contains no evars).

class Env(m: Map[Evar[Any],Term[Any]]) { 
  def apply[A](v: Evar[A]) = 
    m(v.asInstanceOf[Evar[Any]]).asInstanceOf[Term[A]] 
  def get[A](v: Evar[A]): Option[Term[A]] = 
    m.get(v.asInstanceOf[Evar[Any]]).asInstanceOf[Option[Term[A]]] 
  def updated[A](v: Evar[A], t: Term[A]): Env = { 
    val v2 = v.asInstanceOf[Evar[Any]] 
    val t2 = t.asInstanceOf[Term[Any]] 
    val e2 = Env(Map(v2 -> t2)) 
    val m2 = m.mapValues(_.subst(e2)) 
    Env(m2.updated(v2, t2)) 
  } 
} 
object Env { 
  def apply(m: Map[Evar[Any],Term[Any]]) = new Env(m) 
  def empty = new Env(HashMap()) 
}

An environment is just a map from evars to terms. Because we need to store evars and terms of different types in the same environment, we cast them to and from Any; this is safe because of the phantom type on Evar. For simplicity we maintain the invariant that the term bound to each evar is already substituted by the rest of the environment.
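The invariant is easier to see on a small example. The following is only a sketch with hypothetical untyped terms (Tm, V, Lit, Cons, Empty) and string-named variables, not the Env class above: adding a binding first substitutes it into every existing binding, so no stored term ever mentions a bound variable.

```scala
object EnvInvariantSketch {
  // Hypothetical untyped terms, used only for illustration.
  sealed trait Tm
  case class V(name: String) extends Tm
  case class Lit(value: Int) extends Tm
  case class Cons(hd: Tm, tl: Tm) extends Tm
  case object Empty extends Tm

  type Bindings = Map[String, Tm]

  // Replace each bound variable in t by its binding (one pass is enough
  // when the bindings themselves are already fully substituted).
  def subst(t: Tm, b: Bindings): Tm = t match {
    case V(n) => b.getOrElse(n, t)
    case Cons(h, tl) => Cons(subst(h, b), subst(tl, b))
    case _ => t
  }

  // Add name := t, substituting the new binding into the existing ones first.
  def updated(b: Bindings, name: String, t: Tm): Bindings = {
    val single: Bindings = Map(name -> t)
    b.map { case (k, v) => k -> subst(v, single) } + (name -> t)
  }
}
```

Starting from Map("x" -> Cons(V("y"), Empty)) and adding y := Lit(1) gives Map("x" -> Cons(Lit(1), Empty), "y" -> Lit(1)): the binding for x is rewritten so that a single lookup returns a fully substituted term.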

case class VarTerm[A](v: Evar[A]) extends Term[A] { 
  def unify(e: Env, t: Term[A]) = 
    t match { 
      case VarTerm(v2) if (v2 == v) => Some(e) 
      case _ => 
        if (t.occurs(v)) None 
        else Some(e.updated(v, t)) 
    } 
 
  def occurs[B](v2: Evar[B]) = v2 == v 
 
  def subst(e: Env) = 
    e.get(v) match { 
      case Some(t) => t 
      case None => this 
    } 
 
  def ground = 
    throw new IllegalArgumentException("not ground") 
 
  override def toString = { v.name  } 
}

The VarTerm class represents terms consisting of an evar. To unify a VarTerm with another VarTerm containing the same evar, we just return the environment unchanged (since there is no new information). Otherwise we check that the evar doesn’t appear in the term (since a unification x =:= List(x) would create a circular term) then return the updated environment.

To substitute a VarTerm we return the term bound to the evar in the environment if one exists, otherwise the unsubstituted VarTerm. A VarTerm is never ground (we assume ground is called only on terms which are already substituted by the environment).

case class LitTerm[A](a: A) extends Term[A] { 
  def unify(e: Env, t: Term[A]) = 
    t match { 
      case LitTerm(a2) => if (a == a2) Some(e) else None 
      case _: VarTerm[_] => t.unify(e, this) 
      case _ => None 
    } 
 
  def occurs[B](v: Evar[B]) = false 
  def subst(e: Env) = this 
  def ground = a 
 
  override def toString = { a.toString } 
}

LitTerm represents terms of literal Scala values. A LitTerm unifies with another LitTerm containing an equal value, but that adds nothing to the environment. Then we have two cases which we need for every term type—to unify with a VarTerm call unify back on it; otherwise fail.

case class NilTerm[A]() extends Term[List[A]] { 
  def unify(e: Env, t: Term[List[A]]) = 
    t match { 
      case NilTerm() => Some(e) 
      case _: VarTerm[_] => t.unify(e, this) 
      case _ => None 
    } 
 
  def occurs[B](v: Evar[B]) = false 
  def subst(e: Env) = this 
  def ground = Nil 
 
  override def toString = { Nil.toString } 
} 
 
case class ConsTerm[A](hd: Term[A], tl: Term[List[A]]) 
  extends Term[List[A]] 
{ 
  def unify(e: Env, t: Term[List[A]]) = 
    t match { 
      case ConsTerm(hd2, tl2) => 
        for { 
          e1 <- hd.unify(e, hd2) 
          e2 <- tl.subst(e1).unify(e1, tl2.subst(e1)) 
        } yield e2 
      case _: VarTerm[_] => t.unify(e, this) 
      case _ => None 
    } 
 
  def occurs[C](v: Evar[C]) = hd.occurs(v) || tl.occurs(v) 
  def subst(e: Env) = ConsTerm(hd.subst(e), tl.subst(e)) 
  def ground = hd.ground :: tl.ground 
 
  override def toString = { hd.toString + " :: " + tl.toString } 
}

NilTerm and ConsTerm represent the Nil and :: constructors for lists. Nil is sort of like a literal, so the methods for NilTerm are similar to those for LitTerm. For ConsTerm we unify by unifying the heads and tails, calling subst on the tails since unifying the heads may have added bindings to the environment. (Here it’s convenient to use a for-comprehension on the Option[Env] type since either unification may fail.) Similarly we implement occurs, subst, and ground by calling them on the head and tail.

object Term { 
  implicit def var2Term[A](v: Evar[A]): Term[A] = VarTerm(v) 
  //implicit def lit2term[A](a: A): Term[A] = LitTerm(a) 
  implicit def int2Term(a: Int): Term[Int] = LitTerm(a) 
  implicit def list2Term[A](l: List[Term[A]]): Term[List[A]] = 
    l match { 
      case Nil => NilTerm[A] 
      case hd :: tl => ConsTerm(hd, list2Term(tl)) 
    } 
}

Finally we have some implicit conversions to make it a little easier to build Term values. The lit2term conversion turned out to be a bad idea; in particular you don’t want a LitTerm[List[A]] since it doesn’t unify with a ConsTerm[A] or NilTerm[A].

State

In order to combine unification with backtracking, we need to keep track of the environment along each branch of the tree of choices. We don’t want the environments from different branches to interfere, so it’s convenient to use a purely functional environment representation; we pass the current environment down the tree as computation proceeds. However, we can hide this state passing in the monad interface:

trait LogicState { L => 
  type T[S,A] 
  // as before 
  def split[S,A](s: S, t: T[S,A]): Option[(S,A,T[S,A])] 
 
  def get[S]: T[S,S] 
  def set[S](s: S): T[S, Unit] 
 
  case class Syntax[S,A](t: T[S,A]) { 
    // as before 
    def &[B](t2: => T[S,B]): T[S,B] = L.bind(t, { _: A => t2 }) 
  } 
}

LogicState is mostly the same as Logic, except that the type of choices has an extra parameter for the type of the state. The get and set functions get and set the current state. To split we need an initial state to get things started, and each result includes an updated state. Finally we add the syntax & to sequence two computations, ignoring the value of the first. We’ll use this to sequence goals, since we care only about the updated environment.

The simplest implementation of LogicState builds on Logic:

trait LogicStateT extends LogicState { 
  val Logic: Logic 
 
  type T[S,A] = S => Logic.T[(S, A)]

We embed state-passing in a Logic.T as a function from an initial state to a choice of alternatives, where each alternative includes an updated state along with its value.

  def fail[S,A] = { s: S => Logic.fail } 
  def unit[S,A](a: A) = { s: S => Logic.unit((s, a)) } 
 
  def or[S,A](t1: T[S,A], t2: => T[S,A]) = 
    { s: S => Logic.or(t1(s), t2(s)) } 
 
  def bind[S,A,B](t: T[S,A], f: A => T[S,B]) = { 
    val f2: ((S,A)) => Logic.T[(S,B)] = { case (s, a) => f(a)(s) } 
    { s: S => Logic.bind(t(s), f2) } 
  } 
 
  def apply[S,A,B](t: T[S,A], f: A => B) = { 
    val f2: ((S,A)) => ((S,B)) = { case (s, a) => (s, f(a)) } 
    { s: S => Logic.apply(t(s), f2) } 
  } 
 
  def filter[S,A](t: T[S,A], p: A => Boolean) = { 
    val p2: ((S,A)) => Boolean = { case (_, a) => p(a) } 
    { s: S => Logic.filter(t(s), p2) } 
  }

These operations never modify the state themselves; they only thread it along, with bind handing the state left by each alternative of t on to f. Note that or passes the same state to both alternatives, so different branches of the tree cannot interfere with one another’s state.
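The non-interference of branches can be checked with a small stand-alone sketch. It assumes a plain List in place of the underlying Logic.T, and the incr and twoBranches definitions are hypothetical examples, not part of the post's code.

```scala
object StateListSketch {
  // Assumption: List stands in for Logic.T, so a choice is a list of (state, value) pairs.
  type T[S, A] = S => List[(S, A)]

  def unit[S, A](a: A): T[S, A] = s => List((s, a))
  def fail[S, A]: T[S, A] = _ => Nil
  def or[S, A](t1: T[S, A], t2: => T[S, A]): T[S, A] = s => t1(s) ++ t2(s)
  def bind[S, A, B](t: T[S, A], f: A => T[S, B]): T[S, B] =
    s => t(s).flatMap { case (s1, a) => f(a)(s1) }
  def get[S]: T[S, S] = s => List((s, s))
  def set[S](s: S): T[S, Unit] = _ => List((s, ()))

  // Increment an Int state by one.
  def incr: T[Int, Unit] = bind[Int, Int, Unit](get[Int], n => set(n + 1))

  // Left branch increments once, right branch increments twice.
  val twoBranches: T[Int, Unit] =
    or[Int, Unit](incr, bind[Int, Unit, Unit](incr, _ => incr))
}
```

Running twoBranches(0) gives List((1, ()), (2, ())). The right branch ends at 2 rather than 3, because it starts from the original state 0 and never sees the increment made in the left branch.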

  def split[S,A](s: S, t: T[S,A]) = { 
    Logic.split(t(s)) match { 
      case None => None 
      case Some(((s, a), t)) => Some((s, a, { _ => t })) 
    } 
  } 
 
  def get[S] = { s: S => Logic.unit((s,s)) } 
  def set[S](s: S) = { _: S => Logic.unit((s,())) } 
}

In split we pass the given state to the underlying Logic.T, and for each alternative we unpack the pair of state and value. The choice of remaining alternatives t encapsulates the current state, so when we return it we ignore the input state. In get and set we return and replace the current state.

Another approach is to pass state explicitly through LogicSFK:

object LogicStateSFK extends LogicState { 
  type FK[R] = () => R 
  type SK[S,A,R] = (S, A, FK[R]) => R 
 
  trait T[S,A] { def apply[R](s: S, sk: SK[S,A,R], fk: FK[R]): R }

This is not really any different from LogicStateT applied to LogicSFK—we have just uncurried the state argument. We can take the same path as last time and defunctionalize this into a tail-recursive implementation (see the full code) although LogicStateT applied to LogicSFKDefuncTailrec inherits tail-recursiveness from the underlying Logic monad.

Scrolog

Finally we can put the pieces together into a Prolog-like embedded DSL:

trait Scrolog { 
  val LogicState: LogicState 
  import LogicState._ 
 
  type G = T[Env,Unit]

From our point of view, a goal is a stateful choice among alternatives, where we don’t care about the value returned, only the environment.

  class TermSyntax[A](t: Term[A]) { 
    def =:=(t2: Term[A]): G = 
      for { 
        env <- get 
        env2 <- { 
          t.subst(env).unify(env, t2.subst(env)) match { 
            case None => fail[Env,Unit] 
            case Some(e) => set(e) 
          } 
        } 
      } yield env2 
  } 
 
  implicit def termSyntax[A](t: Term[A]) = new TermSyntax(t) 
  implicit def syntax[A](t: G) = LogicState.syntax(t)

We connect term unification to the stateful logic monad with a wrapper class defining a =:= operator. To unify terms in the monad, we get the current environment, substitute it into the two terms (to satisfy the invariant above), then call unify; if it fails we fail the computation, else we set the new state.

  def run[A](t: G, n: Int, tm: Term[A]): List[Term[A]] = 
    LogicState.run(Env.empty, t, n) 
      .map({ case (e, _) => tm.subst(e) }) 
}

The run function solves a goal, taking as arguments the goal, the maximum number of solutions to find, and a term to be evaluated in the environment of each solution.

Examples

First we need to set up Scrolog:

val Scrolog = 
  new Scrolog { val LogicState = 
    new LogicStateT { val Logic = LogicSFKDefuncTailrec } 
  } 
import Scrolog._

Here is a translation of the member predicate:

  def member[A](x: Term[A], l: Term[List[A]]): G = { 
    val hd = Evar[A]("hd"); val tl = Evar[List[A]]("tl") 
    ConsTerm(x, tl) =:= l | 
    (ConsTerm(hd, tl) =:= l & member(x, tl)) 
  }

We implement predicates by functions, and goals by function calls. To implement matching the clause head, we explicitly unify the input arguments against each clause head, and combine the clauses with |. Subgoals are sequenced with &. Finally, we must create local evars explicitly, since they are fresh for each call (just as local variables are in Scala).

Finally we can run the goal above:

scala> val x = Evar[Int]("x") 
scala> run(member(x, List[Term[Int]](1, 2, 3)), 3, x) 
res6: List[Term[Int]] = List(1, 2, 3)
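For readers who want to experiment without the full code, the following stripped-down sketch puts the same pieces in one place. It makes several simplifying assumptions that differ from the post: terms are untyped, a strict List replaces the Logic monad, a goal is simply a function from environment to environments, variables are compared by object identity, and the occurs check is omitted. All names are hypothetical.

```scala
object MiniScrolog {
  sealed trait Tm
  final class V(val name: String) extends Tm   // compared by identity, like an evar
  case class Lit(n: Int) extends Tm
  case object Empty extends Tm
  case class Cons(hd: Tm, tl: Tm) extends Tm

  type Env = Map[V, Tm]
  type G = Env => List[Env]   // a goal maps an environment to its solutions

  // Follow bindings until an unbound variable or a non-variable term.
  def walk(t: Tm, e: Env): Tm = t match {
    case v: V if e.contains(v) => walk(e(v), e)
    case _ => t
  }

  // Occurs check omitted for brevity.
  def unify(a: Tm, b: Tm, e: Env): Option[Env] = (walk(a, e), walk(b, e)) match {
    case (x: V, y: V) if x eq y => Some(e)
    case (x: V, t) => Some(e + (x -> t))
    case (t, y: V) => Some(e + (y -> t))
    case (Lit(m), Lit(n)) => if (m == n) Some(e) else None
    case (Empty, Empty) => Some(e)
    case (Cons(h1, t1), Cons(h2, t2)) => unify(h1, h2, e).flatMap(unify(t1, t2, _))
    case _ => None
  }

  def eql(a: Tm, b: Tm): G = e => unify(a, b, e).toList          // plays the role of =:=
  def or(g1: G, g2: => G): G = e => g1(e) ++ g2(e)               // plays the role of |
  def and(g1: G, g2: => G): G = e => g1(e).flatMap(e1 => g2(e1)) // plays the role of &

  def member(x: Tm, l: Tm): G = {
    val hd = new V("hd"); val tl = new V("tl")   // fresh for each call
    or(eql(Cons(x, tl), l), and(eql(Cons(hd, tl), l), member(x, tl)))
  }

  // Fully substitute a term in a solution environment.
  def resolve(t: Tm, e: Env): Tm = walk(t, e) match {
    case Cons(h, tl) => Cons(resolve(h, e), resolve(tl, e))
    case other => other
  }

  def list(ns: Int*): Tm = ns.foldRight(Empty: Tm)((n, acc) => Cons(Lit(n), acc))

  def run(g: G, t: Tm): List[Tm] = g(Map.empty).map(resolve(t, _))
}
```

With val x = new MiniScrolog.V("x"), the call run(member(x, list(1, 2, 3)), x) returns List(Lit(1), Lit(2), Lit(3)). Because the second argument of and is by-name, the recursive call to member is built only for environments in which the preceding unification succeeded, which is what makes the recursion stop at the end of the list.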

As another example, we can implement addition over unary natural numbers. In Prolog this would be

  sum(z, N, N). 
  sum(s(M), N, s(P)) :- sum(M, N, P).

In Prolog we can just invent symbols like s and z; in Scala we need first to define a type of natural numbers, then terms over that type:

  sealed trait Nat 
  case object Z extends Nat 
  case class S(n: Nat) extends Nat 
 
  case object ZTerm extends Term[Nat] { 
    // like NilTerm 
 
  case class STerm(n: Term[Nat]) extends Term[Nat] { 
    // like ConsTerm

Then we can define sum, again separating the clauses by | and explicitly unifying the clause heads:

  def sum(m: Term[Nat], n: Term[Nat], p: Term[Nat]): G = { 
    val m2 = Evar[Nat]("m"); val p2 = Evar[Nat]("p") 
    (m =:= Z & n =:= p) | 
    (m =:= STerm(m2) & p =:= STerm(p2) & sum(m2, n, p2)) 
  }

We can use sum to do addition:

scala> val x = Evar[Nat]("x"); val y = Evar[Nat]("y") 
scala> run(sum(S(Z), S(S(Z)), x), 1, x) 
res8: List[Term[Nat]] = List(S(S(S(Z))))

or subtraction:

scala> run(sum(x, S(S(Z)), S(S(S(Z)))), 1, x) 
res10: List[Term[Nat]] = List(S(Z)) 
 
scala> run(sum(S(Z), x, S(S(S(Z)))), 1, x) 
res11: List[Term[Nat]] = List(S(S(Z)))

or even to find all the pairs of naturals which sum to 3:

scala> run(sum(x, y, S(S(S(Z)))), 10, List[Term[Nat]](x, y)) 
res14: List[Term[List[Nat]]] = 
  List(Z :: S(S(S(Z))) :: List(), 
       S(Z) :: S(S(Z)) :: List(), 
       S(S(Z)) :: S(Z) :: List(), 
       S(S(S(Z))) :: Z :: List())

although the printing of Term[List] could be better.

This is only a small taste of the expressivity of Prolog-style logic programming. Again let me recommend Frank Pfenning’s course notes, which explore the semantics of Prolog in a “definitional interpreters” style, by gradually refining an interpreter to expose more of the machinery of the language.

See the full code.

Logic programming in Scala, part 2: backtracking

2011-04-29T22:07:00.000-07:00

In the previous post we saw how to write computations in a logic monad, where a “value” is a choice among alternatives, and operating on a value means operating on all the alternatives.

Our first implementation of the logic monad represents a choice among alternatives as a list, and operating on a value means running the operation for each alternative immediately (to produce a new list of alternatives). If we imagine alternatives as leaves of a tree (with | indicating branching), the first implementation explores the tree breadth-first.

This is OK for some problems, but we run into trouble when there are a large or infinite number of alternatives. For example, a choice among the natural numbers:

scala> import LogicList._ 
import LogicList._ 
 
scala> val nat: T[Int] = unit(0) | nat.map(_ + 1) 
java.lang.NullPointerException 
        ...

This goes wrong because even though the right-hand argument to | is by-name, we immediately try to use it, and fail because nat is not yet defined.

scala> def nat: T[Int] = unit(0) | nat.map(_ + 1) 
nat: LogicList.T[Int] 
scala> run(nat, 10) 
java.lang.StackOverflowError 
        ...

With def we can successfully define nat, because the right-hand side isn’t evaluated until nat is used in the call to run, but we overflow the stack trying to compute all the natural numbers.

Let’s repair this with a fancier implementation of the logic monad, translated from Kiselyov et al.’s Backtracking, Interleaving, and Terminating Monad Transformers. This implementation will explore the tree depth-first.

Success and failure continuations

The idea is to represent a choice of alternatives by a function, which takes as arguments two functions: a success continuation and a failure continuation. The success continuation is just a function indicating what to do next with each alternative; the failure continuation is what to do next when there are no more alternatives.

For success, what we do next is either return the alternative (when we have reached a leaf of the tree), or perform some operation on it (possibly forming new branches rooted at the alternative). For failure, what we do next is back up to the last branch point and succeed with the next alternative. If there are no more alternatives at the previous branch point we back up again, and so on until we can succeed or finally run out of alternatives. In other words, we do depth-first search on the tree, except that the tree isn’t a materialized data structure—it’s created on the fly.

(In the jargon of logic programming, a branch point is called a “choice point”, and going back to an earlier choice point is called “backtracking”.)

object LogicSFK extends Logic { 
  type FK[R] = () => R 
  type SK[A,R] = (A, FK[R]) => R 
 
  trait T[A] { def apply[R](sk: SK[A,R], fk: FK[R]): R }

The continuations can return a result of some arbitrary type R. This means that the function representing a choice has a “rank-2” polymorphic type—it takes functions which are themselves polymorphic—which is not directly representable in Scala. But we can encode it by making the representation function a method on a trait.
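The encoding is easier to see in isolation. In this sketch (the names ListFn, length, and both are made up for illustration) a value with a polymorphic method is passed as an ordinary argument, and the receiver instantiates the type parameter twice.

```scala
object Rank2Sketch {
  // A first-class value whose apply method is polymorphic in A.
  trait ListFn { def apply[A](xs: List[A]): Int }

  val length: ListFn = new ListFn {
    def apply[A](xs: List[A]): Int = xs.length
  }

  // The callee, not the caller, chooses A, and may choose it more than once.
  def both(f: ListFn): (Int, Int) = (f(List(1, 2, 3)), f(List("a", "b")))
}
```

Here Rank2Sketch.both(Rank2Sketch.length) evaluates to (3, 2). An ordinary function value of type List[A] => Int would fix A at the point where it is created, so it could not be applied to both a List[Int] and a List[String].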

The success continuation takes a value of the underlying type (i.e. an alternative), and also a failure continuation, to call in case this branch of the tree eventually fails (by calling fail, or filter when no alternative satisfies the predicate). The failure continuation is also called to succeed with the next alternative after returning a leaf (see split).

  def fail[A] = 
    new T[A] { 
      def apply[R](sk: SK[A,R], fk: FK[R]) = fk() 
    } 
 
  def unit[A](a: A) = 
    new T[A] { 
      def apply[R](sk: SK[A,R], fk: FK[R]) = sk(a, fk) 
    }

To fail, just call the failure continuation. To succeed with one alternative, call the success continuation with the single alternative and the passed-in failure continuation—there are no more alternatives to try, so if this branch fails the unit fails.

  def or[A](t1: T[A], t2: => T[A]) = 
    new T[A] { 
      def apply[R](sk: SK[A,R], fk: FK[R]) = 
        t1(sk, { () => t2(sk, fk) }) 
    }

Or creates a choice point. We want to explore the alternatives in both t1 and t2, so we pass the success continuation to t1 (which calls it on each alternative); when t1 is exhausted we pass the success continuation to t2; finally we fail with the caller’s failure continuation—that is, we backtrack.
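To watch the two continuations cooperate, here is a stand-alone sketch that repeats only unit and or and adds a hypothetical toList helper (not part of the post's Logic interface) that supplies the initial continuations.

```scala
object SfkSketch {
  type FK[R] = () => R
  type SK[A, R] = (A, FK[R]) => R

  trait T[A] { def apply[R](sk: SK[A, R], fk: FK[R]): R }

  def unit[A](a: A): T[A] = new T[A] {
    def apply[R](sk: SK[A, R], fk: FK[R]): R = sk(a, fk)
  }

  def or[A](t1: T[A], t2: => T[A]): T[A] = new T[A] {
    def apply[R](sk: SK[A, R], fk: FK[R]): R = t1(sk, () => t2(sk, fk))
  }

  // Hypothetical helper: on success, cons the alternative onto whatever
  // backtracking produces; on final failure, return the empty list.
  def toList[A](t: T[A]): List[A] =
    t.apply[List[A]]((a, fk) => a :: fk(), () => Nil)
}
```

Evaluating toList(or(unit(1), or(unit(2), unit(3)))) gives List(1, 2, 3). Each call to fk() inside the success continuation is a backtrack to the most recent choice point, so the alternatives come out in left-to-right, depth-first order.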

  def bind[A,B](t: T[A], f: A => T[B]) = 
    new T[B] { 
      def apply[R](sk: SK[B,R], fk: FK[R]) = 
        t(({ (a, fk) => f(a)(sk, fk) }: SK[A,R]), fk) 
    } 
 
  def apply[A,B](t: T[A], f: A => B) = 
    new T[B] { 
      def apply[R](sk: SK[B,R], fk: FK[R]) = 
        t(({ (a, fk) => sk(f(a), fk) }: SK[A,R]), fk) 
    }

For bind we extend each branch by calling f on the current leaf. To succeed we call f on the alternative a. Now f(a) returns a choice of alternatives, so we pass it the original success continuation (which says what to do next with alternatives resulting from the bind), and the failure continuation in force at the point a was generated (which, once the alternatives of f(a) are exhausted, backtracks to the next available alternative from t).

For apply things are simpler, since f(a) returns a single value rather than a choice of alternatives: we succeed immediately with the returned value.

  def filter[A](t: T[A], p: A => Boolean) = 
    new T[A] { 
      def apply[R](sk: SK[A,R], fk: FK[R]) = { 
        val sk2: SK[A,R] = 
          { (a, fk) => if (p(a)) sk(a, fk) else fk() } 
        t(sk2, fk) 
      } 
    }

To filter a choice of alternatives, each time we succeed with a value we see if it satisfies the predicate p; if it does, we succeed with that value (extending the branch), otherwise we fail (pruning the branch).

  def split[A](t: T[A]) = { 
    def unsplit(fk: FK[Option[(A,T[A])]]): T[A] = 
      fk() match { 
        case None => fail 
        case Some((a, t)) => or(unit(a), t) 
      } 
    def sk : SK[A,Option[(A,T[A])]] = 
      { (a, fk) => Some((a, bind(unit(fk), unsplit))) } 
    t(sk, { () => None }) 
  } 
}

The point of split is to pull a single alternative from a choice, returning along with it a choice of the remaining alternatives. In the list implementation we just returned the head and tail of the list. In this implementation, the alternatives are computed on demand; we want to be careful to do only as much computation as needed to pull the first alternative.

The failure continuation we pass to t just returns None when there are no more alternatives. The success continuation sk returns the first alternative and a choice of the remaining alternatives (wrapped in Some).

The tricky part is the choice of remaining alternatives. We’re given the failure continuation fk; calling it calls sk on the next alternative, which ultimately returns Some(a, t) where a is the next alternative, or None if there are no more alternatives. We repackage this Option as a choice of alternatives with unsplit. So that we don’t call fk too soon, we call unsplit via bind, which defers the call until the resulting choice of alternatives is actually used.

Now we can write infinite choices:

scala> import LogicSFK._ 
import LogicSFK._ 
 
scala> val nat: T[Int] = unit(0) | nat.map(_ + 1) 
nat: LogicSFK.T[Int] = LogicSFK$$anon$3@27aea0c1 
 
scala> run(nat, 10) 
res1: List[Int] = List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

Well, this is a pretty complicated way to generate the natural numbers up to 10…

While nat looks like a lazy stream (as you might write in Haskell), no results are memoized (as they are in Haskell). To compute each successive number all the previous ones must be recomputed, and the running time of run(nat, N) is O(N²).

Defunctionalization

The code above is a fairly direct translation of the Haskell code from the paper. But its use of continuation-passing style doesn’t map well to Scala, because Scala doesn’t implement general tail-call elimination (because the JVM doesn’t). Every call to a success or failure continuation adds a frame to the stack, even though all we ever do with the result is return it (i.e. the call is in tail position), so the stack frame could be eliminated.

Surprisingly, we run out of memory before we run out of stack:

scala> run(nat, 2000) 
java.lang.OutOfMemoryError: Java heap space 
 ...

A little heap profiling shows that we’re using quadratic space as well as quadratic time. It turns out that the implementation of Logic.run (from the previous post) has a space leak. The call to run is not tail-recursive, so the stack frame hangs around, and although t is dead after split(t), there’s still a reference to it on the stack.

We can rewrite run with an accumulator to be tail-recursive:

  def run[A](t: T[A], n: Int): List[A] = { 
    def runAcc(t: T[A], n: Int, acc: List[A]): List[A] = 
      if (n <= 0) acc.reverse else 
        split(t) match { 
          case None => acc.reverse 
          case Some((a, t)) => runAcc(t, n - 1, a :: acc) 
        } 
    runAcc(t, n, Nil) 
  }

Now scalac compiles runAcc as a loop, so there are no stack frames holding on to dead values of t, and we get the expected:

scala> run(nat, 9000) 
java.lang.StackOverflowError 
 ...

To address the stack overflow we turn to defunctionalization. The idea (from John Reynolds’s classic paper Definitional Interpreters for Higher-Order Programming Languages) is to replace functions and their applications with data constructors (we’ll use case classes) and an apply function, which matches the data constructor and does whatever the corresponding function body does. If a function captures variables, the data constructor must capture the same variables.
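As a warm-up on something much smaller than the logic monad, here is a sketch of the transformation applied to a few integer functions. The names Fn, AddN, Twice, Compose, and applyFn are invented for this illustration.

```scala
object DefuncSketch {
  // Higher-order original: closures built with function composition.
  val original: Int => Int = ((x: Int) => x + 1).compose((x: Int) => x * 2)

  // Defunctionalized form: one case per function body, holding its captured variables.
  sealed trait Fn
  case class AddN(n: Int) extends Fn           // stands for x => x + n, capturing n
  case object Twice extends Fn                 // stands for x => x * 2, capturing nothing
  case class Compose(f: Fn, g: Fn) extends Fn  // stands for x => f(g(x)), capturing f and g

  // The apply function dispatches on the constructor and runs the matching body.
  def applyFn(f: Fn, x: Int): Int = f match {
    case AddN(n) => x + n
    case Twice => x * 2
    case Compose(f1, g1) => applyFn(f1, applyFn(g1, x))
  }
}
```

Both original(5) and applyFn(Compose(AddN(1), Twice), 5) evaluate to 11. The second form carries the same information as the closure, but as plain data that can be inspected and matched on.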

After defunctionalization we’re left with three mutually recursive apply functions (one for each of T, FK, and SK) where each recursive call is in tail position. In theory the compiler could transform these into code that takes only constant stack space (since they are local functions private to split). But in fact it will do so only for single recursive functions, so we will need to do this transformation by hand.
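The by-hand transformation can be sketched on a toy pair of mutually recursive functions. The names here (isEven, isOdd, Which, parity) are illustrative only: the two functions are merged into one, with an extra tag argument saying which of the original bodies to run.

```scala
import scala.annotation.tailrec

object TailrecSketch {
  // Mutually recursive: every call is a tail call, but scalac does not
  // compile this pair into a loop, so deep recursion uses deep stack.
  def isEven(n: Int): Boolean = if (n == 0) true else isOdd(n - 1)
  def isOdd(n: Int): Boolean = if (n == 0) false else isEven(n - 1)

  // Tag identifying which original function a call stands for.
  sealed trait Which
  case object Even extends Which
  case object Odd extends Which

  // Single self-recursive function; @tailrec makes the compiler verify
  // that it really is compiled as a loop.
  @tailrec
  def parity(which: Which, n: Int): Boolean = (which, n) match {
    case (Even, 0) => true
    case (Odd, 0) => false
    case (Even, _) => parity(Odd, n - 1)
    case (Odd, _) => parity(Even, n - 1)
  }
}
```

A call such as parity(Even, 1000000) returns true using constant stack space, whereas the mutually recursive version needs one frame per step.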

There is one hitch: the original code is not completely tail-recursive, because of unsplit, which calls a failure continuation then matches on the result. To fix this we need to add yet another continuation, which represents what to do after returning a result from a success or failure continuation.

object LogicSFKDefunc extends Logic { 
  type O[A] = Option[(A,T[A])] 
 
  sealed trait T[A] 
  case class Fail[A]() extends T[A] 
  case class Unit[A](a: A) extends T[A] 
  case class Or[A](t1: T[A], t2: () => T[A]) extends T[A] 
  case class Bind[A,B](t: T[A], f: A => T[B]) extends T[B] 
  case class Apply[A,B](t: T[A], f: A => B) extends T[B] 
  case class Filter[A](t: T[A], p: A => Boolean) extends T[A] 
  case class Unsplit[A](fk: FK[O[A]]) extends T[A] 
 
  def fail[A] = Fail() 
  def unit[A](a: A) = Unit(a) 
  def or[A](t1: T[A], t2: => T[A]) = Or(t1, { () => t2 }) 
  def bind[A,B](t: T[A], f: A => T[B]) = Bind(t, f) 
  def apply[A,B](t: T[A], f: A => B) = Apply(t, f) 
  def filter[A](t: T[A], p: A => Boolean) = Filter(t, p)

A choice of alternatives T[A] is now represented symbolically by case classes, and the functions which operate on choices just return the corresponding case. The cases capture the same variables that were captured in the original functions.

We have an additional case Unsplit which represents the bind(unit(fk), unsplit) combination from split. And we use O[A] as a convenient abbreviation.

  sealed trait FK[R] 
  case class FKOr[A,R](t: () => T[A], sk: SK[A,R], fk: FK[R]) 
    extends FK[R] 
  case class FKSplit[R](r: R) extends FK[R] 
 
  sealed trait SK[A,R] 
  case class SKBind[A,B,R](f: A => T[B], sk: SK[B,R]) 
    extends SK[A,R] 
  case class SKApply[A,B,R](f: A => B, sk: SK[B,R]) 
    extends SK[A,R] 
  case class SKFilter[A,R](p: A => Boolean, sk: SK[A,R]) 
    extends SK[A,R] 
  case class SKSplit[A,R](r: (A, FK[R]) => R) extends SK[A,R] 
 
  sealed trait K[R,R2] 
  case class KReturn[R]() extends K[R,R] 
  case class KUnsplit[A,R,R2](sk: SK[A,R], fk:FK[R], k: K[R,R2]) 
    extends K[O[A],R2]

Each case for FK (respectively SK) corresponds to a failure (respectively success) continuation function in the original code; it is easy to match them up.

The K cases are for the new return continuation. They are defunctionalized from functions R => R2; we can either return a value directly, or match on whether it is Some or None as in unsplit. (If K is hard to understand you might try “refunctionalizing” it by replacing the cases with functions.)

We see that case classes are more powerful than variants in OCaml, without GADTs at least. Cases can have “input” type variables (appearing in arguments) which do not appear in the “output” (the type the case extends). When we match on the case these are treated as existentials. And the output type of a case can be more restrictive than the type it extends; when we match on the case we can make more restrictive assumptions about types in that branch of the match. More on this in Emir, Odersky, and Williams’ Matching Objects with Patterns.
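Both features show up in a small typed expression evaluator. This is a sketch with invented names (Expr, IntLit, Const, If, Fst), written on the assumption of a reasonably recent Scala 2.13 or Scala 3 compiler; older compilers may need casts, as the code in this post sometimes does.

```scala
object GadtSketch {
  sealed trait Expr[A]
  // Output type more restrictive than Expr[A]: an IntLit is always an Expr[Int].
  case class IntLit(n: Int) extends Expr[Int]
  case class Const[A](a: A) extends Expr[A]
  case class If[A](c: Expr[Boolean], t: Expr[A], e: Expr[A]) extends Expr[A]
  // B is an "input" type variable: it appears in the argument but not in Expr[A].
  case class Fst[A, B](p: Expr[(A, B)]) extends Expr[A]

  def eval[A](e: Expr[A]): A = e match {
    case IntLit(n) => n               // in this branch the compiler knows A is Int
    case Const(a) => a
    case If(c, t, f) => if (eval(c)) eval(t) else eval(f)
    case Fst(p) => eval(p)._1         // B is unknown here, treated as an existential
  }
}
```

For example, eval(Fst(Const((1, "x")))) is 1 and eval(If(Const(true), IntLit(1), IntLit(2))) is 1, with no casts in eval.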

  def split[A](t: T[A]) = { 
 
    def applyT[A,R,R2] 
      (t: T[A], sk: SK[A,R], fk: FK[R], k: K[R,R2]): R2 = 
      t match { 
        case Fail() => applyFK(fk, k) 
        case Unit(a) => applySK(sk, a, fk, k) 
        case Or(t1, t2) => applyT(t1, sk, FKOr(t2, sk, fk), k) 
        case Bind(t, f) => applyT(t, SKBind(f, sk), fk, k) 
        case Apply(t, f) => applyT(t, SKApply(f, sk), fk, k) 
        case Filter(t, p) => applyT(t, SKFilter(p, sk), fk, k) 
        case Unsplit(fk2) => applyFK(fk2, KUnsplit(sk, fk, k)) 
      } 
 
    def applyFK[R,R2](fk: FK[R], k: K[R,R2]): R2 = 
      fk match { 
        case FKOr(t, sk, fk) => applyT(t(), sk, fk, k) 
        case FKSplit(r) => applyK(k, r) 
      } 
 
    def applySK[A,R,R2] 
      (sk: SK[A,R], a: A, fk: FK[R], k: K[R,R2]): R2 = 
      sk match { 
        case SKBind(f, sk) => applyT(f(a), sk, fk, k) 
        case SKApply(f, sk) => applySK(sk, f(a), fk, k) 
        case SKFilter(p, sk) => 
          if (p(a)) applySK(sk, a, fk, k) else applyFK(fk, k) 
        case SKSplit(rf) => applyK(k, rf(a, fk)) 
      }

Again, each of these cases corresponds directly to a function in the original code, and again it is easy to match them up (modulo the extra return continuation argument) to see that all we have done is separated the data part of the function (i.e. the captured variables) from the code part.

The exception is Unsplit, which again corresponds to bind(unit(fk), unsplit). To apply it, we apply fk (which collapses unit(fk), bind, and the application of fk in unsplit) with KUnsplit as continuation, capturing sk, fk, and k (corresponding to their capture in the success continuation of bind).

    def applyK[R,R2](k: K[R,R2], r: R): R2 = 
      k match { 
        case KReturn() => r.asInstanceOf[R2] 
        case KUnsplit(sk, fk, k) => { 
          r match { 
            case None => applyFK(fk, k) 
            case Some((a, t)) => applyT(or(unit(a), t), sk, fk, k) 
          } 
        } 
      }

For KReturn we just return the result. Although KReturn extends K[R,R], Scala doesn’t deduce from this that R = R2, so we must coerce the result. For KUnsplit we do the same match as unsplit, then apply the resulting T (for the None case we call the failure continuation directly instead of applying fail). Here Scala deduces from the return type of KUnsplit that it is safe to treat r as an Option.

    applyT[A,O[A],O[A]]( 
      t, 
      SKSplit((a, fk) => Some((a, Unsplit(fk)))), 
      FKSplit(None), 
      KReturn()) 
  } 
}

Finally we apply the input T in correspondence to the original split.

Tail call elimination

(This section has been revised; you can see the original here.)

To eliminate the stack frames from tail calls, we next rewrite the four mutually-recursive functions into a single recursive function (which Scala compiles as a loop). To do this we have to abandon some type safety (but only in the implementation of the Logic monad; we’ll still present the same safe interface).

object LogicSFKDefuncTailrec extends Logic { 
  type O[A] = Option[(A,T[A])] 
 
  type T[A] = I 
 
  sealed trait I 
  case class Fail() extends I 
  case class Unit(a: Any) extends I 
  case class Or(t1: I, t2: () => I) extends I 
  case class Bind(t: I, f: Any => I) extends I 
  case class Apply(t: I, f: Any => Any) extends I 
  case class Filter(t: I, p: Any => Boolean) extends I 
  case class Unsplit(fk: I) extends I 
 
  case class FKOr(t: () => I, sk: I, fk: I) extends I 
  case class FKSplit(r: O[Any]) extends I 
 
  case class SKBind(f: Any => I, sk: I) extends I 
  case class SKApply(f: Any => Any, sk: I) extends I 
  case class SKFilter(p: Any => Boolean, sk: I) extends I 
  case class SKSplit(r: (Any, I) => O[Any]) extends I 
 
  case object KReturn extends I 
  case class KUnsplit(sk: I, fk: I, k: I) extends I

This is all pretty much as before except that we erase all the type parameters. Having done so we can combine the four defunctionalized types into a single type I (for “instruction” perhaps), which will allow us to write a single recursive apply function. The type parameter in T[A] is then a phantom type since it does not appear on the right-hand side of the definition; it is used only to enforce constraints outside the module.

  def fail[A]: T[A] = Fail() 
  def unit[A](a: A): T[A] = Unit(a) 
  def or[A](t1: T[A], t2: => T[A]): T[A] = Or(t1, { () => t2 }) 
  def bind[A,B](t: T[A], f: A => T[B]): T[B] = 
    Bind(t, f.asInstanceOf[Any => I]) 
  def apply[A,B](t: T[A], f: A => B): T[B] = 
    Apply(t, f.asInstanceOf[Any => Any]) 
  def filter[A](t: T[A], p: A => Boolean): T[A] = 
    Filter(t, p.asInstanceOf[Any => Boolean])

The functions for building T[A] values are mostly the same. We have to cast passed-in functions since Any is not a subtype of arbitrary A. The return type annotations don’t seem necessary but I saw some strange type errors without them (possibly related to the phantom type?) when using the Logic.Syntax wrapper.

def split[A](t: T[A]): O[A] = { 
  def apply(i: I, a: Any, r: O[Any], sk: I, fk: I, k: I): O[Any] = 
    i match { 
      case Fail() => apply(fk, null, null, null, null, k) 
      case Unit(a) => apply(sk, a, null, null, fk, k) 
      case Or(t1, t2) => 
        apply(t1, null, null, sk, FKOr(t2, sk, fk), k) 
      case Bind(t, f) => 
        apply(t, null, null, SKBind(f, sk), fk, k) 
      case Apply(t, f) => 
        apply(t, null, null, SKApply(f, sk), fk, k) 
      case Filter(t, p) => 
        apply(t, null, null, SKFilter(p, sk), fk, k) 
      case Unsplit(fk2) => 
        apply(fk2, null, null, null, null, KUnsplit(sk, fk, k)) 
 
      case FKOr(t, sk, fk) => apply(t(), null, null, sk, fk, k) 
      case FKSplit(r) => apply(k, null, r, null, null, null) 
 
      case SKBind(f, sk) => apply(f(a), null, null, sk, fk, k) 
      case SKApply(f, sk) => apply(sk, f(a), null, null, fk, k) 
      case SKFilter(p, sk) => 
        if (p(a)) 
          apply(sk, a, null, null, fk, k) 
        else 
          apply(fk, null, null, null, null, k) 
      case SKSplit(rf) => 
        apply(k, null, rf(a, fk), null, null, null) 
 
      case KReturn => r 
      case KUnsplit(sk, fk, k) => { 
        r match { 
          case None => apply(fk, null, null, null, null, k) 
          case Some((a, t)) => 
            apply(or(unit(a), t), null, null, sk, fk, k) 
        } 
      } 
    } 
 
  apply(t, 
        null, 
        null, 
        SKSplit((a, fk) => Some((a, Unsplit(fk)))), 
        FKSplit(None), 
        KReturn).asInstanceOf[O[A]] 
}

The original functions took varying arguments; the single function takes all the arguments which the original ones did. We pass null for unused arguments in each call, but otherwise the cases are the same as before.
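
To see the same transformation in miniature, here is a sketch that merges two mutually recursive functions (is-even and is-odd) into one self-recursive function by adding an instruction tag. It is not part of the Logic code; the names Instr, IsEven, IsOdd, and run are made up for illustration. Because every recursive call is a self-call in tail position, the @tailrec annotation checks that Scala compiles it as a loop:

```scala
import scala.annotation.tailrec

object EvenOdd {
  // One tag per original function (hypothetical names).
  sealed trait Instr
  case object IsEven extends Instr
  case object IsOdd extends Instr

  // The single function dispatches on the tag; what used to be a call to
  // the other function becomes a self-call with a different tag.
  @tailrec
  def run(i: Instr, n: Int): Boolean =
    i match {
      case IsEven => if (n == 0) true else run(IsOdd, n - 1)
      case IsOdd => if (n == 0) false else run(IsEven, n - 1)
    }

  def isEven(n: Int): Boolean = run(IsEven, n)
}
```

The apply function in split has the same shape, except that the tags carry captured variables and the function takes the union of the arguments of the four functions it replaces.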

Now we can evaluate nat to large N without running out of stack (but since the running time is quadratic it takes longer than I care to wait to complete):

scala> run(nat, 100000) 
^C

See the complete code here.

Next time we’ll thread state through this backtracking logic monad, and use it to implement unification.

Logic programming in Scala, part 1

2011-04-06T22:03:00.000-07:00

I got a new job where I am hacking some Scala. I thought I would learn something by translating some functional code into Scala, and a friend had recently pointed me to Kiselyov et al.’s Backtracking, Interleaving, and Terminating Monad Transformers, which provides a foundation for Prolog-style logic programming. Of course, a good translation should use the local idiom. So in this post (and the next) I want to explore an embedded domain-specific language for logic programming in Scala.

A search problem

Here is a problem I sometimes give in interviews:

Four people need to cross a rickety bridge, which can hold only two people at a time. It’s a moonless night, so they need a light to cross; they have one flashlight with a battery which lasts 60 minutes. Each person crosses the bridge at a different speed: Alice takes 5 minutes, Bob takes 10, Candace takes 20 minutes, and Dave 25. How do they get across?

I’m not interested in the answer—I’m interviewing programmers, not law school applicants—but rather in how to write a program to find the answer.

The basic shape of the solution is to represent the state of the world (where are the people, where is the flashlight, how much battery is left), write a function to compute from any particular state the set of possible next states, then search for an answer (a path from the start state to the final state) in the tree formed by applying the next state function transitively to the start state. (Here is a paper describing solutions in Prolog and Haskell.)

Here is a first solution in Scala:

object Bridge0 { 
  object Person extends Enumeration { 
    type Person = Value 
    val Alice, Bob, Candace, Dave = Value 
    val all = List(Alice, Bob, Candace, Dave) // values is broken 
  } 
  import Person._ 
 
  val times = Map(Alice -> 5, Bob -> 10, Candace -> 20, Dave -> 25) 
 
  case class State(left: List[Person], 
                   lightOnLeft: Boolean, 
                   timeRemaining: Int)

We define an enumeration of people (the Enumeration class is a bit broken in Scala 2.8.1), a map of the time each takes to cross, and a case class to store the state of the world: the list of people on the left side of the bridge (the right side is just the complement); whether the flashlight is on the left; and how much time remains in the flashlight.

  def chooseTwo(list: List[Person]): List[(Person,Person)] = { 
    val init: List[(Person, Person)] = Nil 
    list.foldLeft(init) { (pairs, p1) => 
      list.foldLeft(pairs) { (pairs, p2) => 
        if (p1 < p2) (p1, p2) :: pairs else pairs 
      } 
    } 
  }

This function returns the list of pairs of people from the input list. We use foldLeft to do a double loop over the input list, accumulating pairs (p1, p2) where p1 < p2; this avoids returning both (Alice, Bob) and (Bob, Alice), and also avoids pairing a person with themselves. The use of foldLeft is rather OCamlish, and if you know Scala you will complain that foldLeft is not idiomatic; we will repair this shortly.

In Scala, Nil doesn’t have type 'a list like in OCaml and Haskell, but rather List[Nothing]. The way local type inference works, the type variable in the type of foldLeft is instantiated with the type of the init argument, so you have to ascribe a type to init (or explicitly instantiate the type variable with foldLeft[List[(Person, Person)]]) or else you get a type clash between List[Nothing] and List[(Person, Person)].
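
Here is a small sketch of the two workarounds, using Int instead of Person to keep it self-contained (the object name FoldInit and the value names are just for illustration):

```scala
object FoldInit {
  val xs = List(1, 2, 3)

  // Workaround 1: ascribe a type to the initial value.
  val init: List[Int] = Nil
  val viaAscription = xs.foldLeft(init) { (acc, x) => x :: acc }

  // Workaround 2: instantiate the type variable of foldLeft explicitly.
  val viaTypeArg = xs.foldLeft[List[Int]](Nil) { (acc, x) => x :: acc }
}
```

Both give the reversed list; passing a bare Nil with no annotation is what provokes the clash described above.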

  def next(state: State): List[State] = { 
    if (state.lightOnLeft) { 
      val init: List[State] = Nil 
      chooseTwo(state.left).foldLeft(init) { 
        case (states, (p1, p2)) => 
          val timeRemaining = 
            state.timeRemaining - math.max(times(p1), times(p2)) 
          if (timeRemaining >= 0) { 
            val left = 
              state.left.filterNot { p => p == p1 || p == p2 } 
            State(left, false, timeRemaining) :: states 
          } 
          else 
            states 
      } 
    } else { 
      val right = Person.all.filterNot(state.left.contains) 
      val init: List[State] = Nil 
      right.foldLeft(init) { (states, p) => 
        val timeRemaining = state.timeRemaining - times(p) 
        if (timeRemaining >= 0) 
          State(p :: state.left, true, timeRemaining) :: states 
        else 
          states 
      } 
    } 
  }

Here we compute the set of successor states for a state. We make a heuristic simplification: when the flashlight is on the left (the side where everyone begins) we move two people from the left to the right; when it is on the right we move only one. I don’t have a proof that an answer must take this form, but I believe it, and it makes the code shorter.

So when the light is on the left we fold over all the pairs of people still on the left, compute the time remaining if they were to cross, and if it is not negative build a new state where they and the flashlight are moved to the right and the time remaining updated.

If the light is on the right we do the same in reverse, but choose only one person to move.

  def tree(path: List[State]): List[List[State]] = 
    next(path.head). 
      map(s => tree(s :: path)). 
        foldLeft(List(path)) { _ ++ _ } 
 
  def search: List[List[State]] = { 
    val start = List(State(Person.all, true, 60)) 
    tree(start).filter { _.head.left == Nil } 
  } 
}

A list of successive states is a path (with the starting state at the end and the most recent state at the beginning); the state tree is a set of paths. The tree rooted at a path is the set of paths with the input path as a suffix. To compute this tree, we find the successor states of the head of the path, augment the path with each state in turn, recursively find the tree rooted at each augmented path, then append them all (including the input path).

Then to find an answer, we generate the state tree rooted at the path consisting only of the start state (everybody and the flashlight on the left, 60 minutes remaining on the light), then keep only the paths which end in a final state (everybody on the right).

For-comprehensions

To make the code above more idiomatic Scala (and more readable), we would of course use for-comprehensions, for example:

  def chooseTwo(list: List[Person]): List[(Person,Person)] = 
    for { p1 <- list; p2 <- list; if p1 < p2 } yield (p1, p2)

Just as before, we do a double loop over the input list, returning pairs where p1 < p2. (However, under the hood the result list is constructed by appending to a ListBuffer rather than with ::, so the pairs are returned in the reverse order.)

The for-comprehension syntax isn’t specific to lists. It’s syntactic sugar which translates to method calls, so we can use it on any objects which implement the right methods. The methods we need are

  def filter(p: A => Boolean): T[A] 
  def map[B](f: A => B): T[B] 
  def flatMap[B](f: A => T[B]): T[B] 
  def withFilter(p: A => Boolean): T[A]

where T is some type constructor, like List. For List, filter and map have their ordinary meaning, and flatMap is a map (where the result type must be a list) which concatenates the resulting lists (that is, it flattens the list of lists).

WithFilter is like filter but should be implemented as a “virtual” filter for efficiency—for List it doesn’t build a new filtered list, but instead just keeps track of the filter function; this way multiple adjacent filters can be combined and the result produced with a single pass over the list.

The details of the translation are in the Scala reference manual, section 6.19. Roughly speaking, <- becomes flatMap, if becomes filter, and yield becomes map. So another way to write chooseTwo is:

  def chooseTwo(list: List[Person]): List[(Person,Person)] = 
    list.flatMap(p1 => 
      list.filter(p2 => p1 < p2).map(p2 => (p1, p2)))
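
As a quick check of the rough translation, here is a sketch comparing a for-comprehension with a hand-written chain of method calls. It uses Int rather than Person so it stands alone, and it uses withFilter for the guard (my understanding is that this is what the compiler emits when the method is available); the object and method names are made up for illustration:

```scala
object Desugar {
  // The for-comprehension form.
  def sugared(list: List[Int]): List[(Int, Int)] =
    for { p1 <- list; p2 <- list; if p1 < p2 } yield (p1, p2)

  // A hand-written equivalent: the outer generator becomes flatMap,
  // the guard becomes withFilter, and the yield becomes map.
  def desugared(list: List[Int]): List[(Int, Int)] =
    list.flatMap(p1 =>
      list.withFilter(p2 => p1 < p2).map(p2 => (p1, p2)))
}
```

For List(1, 2, 3) both return List((1,2), (1,3), (2,3)).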

The logic monad

So far we have taken a concrete view of the choices that arise in searching the state tree, by representing a choice among alternatives as a list. For example, in the chooseTwo function we returned a list of alternative pairs. I want now to take a more abstract view, and define an abstract type T[A] to represent a choice among alternatives of type A, along with operations on the type, packaged into a trait:

trait Logic { L => 
  type T[A] 
 
  def fail[A]: T[A] 
  def unit[A](a: A): T[A] 
  def or[A](t1: T[A], t2: => T[A]): T[A] 
  def apply[A,B](t: T[A], f: A => B): T[B] 
  def bind[A,B](t: T[A], f: A => T[B]): T[B] 
  def filter[A](t: T[A], p: A => Boolean): T[A] 
  def split[A](t: T[A]): Option[(A,T[A])]

A fail value is a choice among no alternatives. A unit(a) is a choice of a single alternative. The value or(t1, t2) is a choice among the alternatives represented by t1 together with the alternatives represented by t2.

The meaning of applying a function to a choice of alternatives is a choice among the results of applying the function to each alternative; that is, if t represents a choice among 1, 2, and 3, then apply(t, f) represents a choice among f(1), f(2), and f(3).

Bind is the same except the function returns a choice of alternatives, so we must combine all the alternatives in the result; that is, if t is a choice among 1, 3, and 5, and f is { x => or(unit(x), unit(x + 1)) }, then bind(t, f) is a choice among 1, 2, 3, 4, 5, and 6.

A filter of a choice of alternatives by a predicate is a choice among only the alternatives which pass the predicate.

Finally, split is a function which returns the first alternative in a choice of alternatives (if there is at least one) along with a choice among the remaining alternatives.

  def or[A](as: List[A]): T[A] = 
    as.foldRight(fail[A])((a, t) => or(unit(a), t)) 
 
  def run[A](t: T[A], n: Int): List[A] = 
    if (n <= 0) Nil else 
      split(t) match { 
        case None => Nil 
        case Some((a, t)) => a :: run(t, n - 1) 
      }

As a convenience, or(as: List[A]) means a choice among the elements of as. And run returns a list of the first n alternatives in a choice, picking them off one by one with split; this is how we get answers out of a T[A].

  case class Syntax[A](t: T[A]) { 
    def map[B](f: A => B): T[B] = L.apply(t, f) 
    def filter(p: A => Boolean): T[A] = L.filter(t, p) 
    def flatMap[B](f: A => T[B]): T[B] = L.bind(t, f) 
    def withFilter(p: A => Boolean): T[A] = L.filter(t, p) 
 
    def |(t2: => T[A]): T[A] = L.or(t, t2) 
  } 
 
  implicit def syntax[A](t: T[A]) = Syntax(t) 
}

Here we hook into the for-comprehension notation, by wrapping values of type T[A] in an object with the methods we need (and | as an additional bit of syntactic sugar), which methods just delegate to the functions defined above. We arrange with an implicit conversion for these wrappers to spring into existence when we need them.

The bridge puzzle with the logic monad

Now we can rewrite the solution in terms of the Logic trait:

class Bridge(Logic: Logic) { 
  import Logic._

We pass an implementation of the logic monad in, then open it so the implicit conversion is available (we can also use T[A] and the Logic functions without qualification).

The Person, times, and State definitions are as before.

  private def chooseTwo(list: List[Person]): T[(Person,Person)] = 
    for { p1 <- or(list); p2 <- or(list); if p1 < p2 } 
    yield (p1, p2)

As we saw, we can write chooseTwo more straightforwardly using a for-comprehension. In the previous version we punned on list as a concrete list and as a choice among alternatives; here we convert one to the other explicitly.

  private def next(state: State): T[State] = { 
    if (state.lightOnLeft) { 
      for { 
        (p1, p2) <- chooseTwo(state.left) 
        timeRemaining = 
          state.timeRemaining - math.max(times(p1), times(p2)) 
        if timeRemaining >= 0 
      } yield { 
        val left = 
          state.left.filterNot { p => p == p1 || p == p2 } 
        State(left, false, timeRemaining) 
      } 
    } else { // ...

This is pretty much as before, except with for-comprehensions instead of foldLeft and explicit consing. (You can easily figure out the branch for the flashlight on the right.)

  private def tree(path: List[State]): T[List[State]] = 
    unit(path) | 
      (for { 
         state <- next(path.head) 
         path <- tree(state :: path) 
       } yield path) 
 
  def search(n: Int): List[List[State]] = { 
    val start = List(State(Person.all, true, 60)) 
    val t = 
      for { path <- tree(start); if path.head.left == Nil } 
      yield path 
    run(t, n) 
  } 
}

In tree we use | to adjoin the input path (previously we gave it in the initial value of foldLeft). In search we need to actually run the Logic.T[A] value rather than returning it, because it’s an abstract type and can’t escape the module (see the Postscript for an alternative); this is why the other methods must be private.

Implementing the logic monad with lists

We can recover the original solution by implementing Logic with lists:

object LogicList extends Logic { 
  type T[A] = List[A] 
 
  def fail[A] = Nil 
  def unit[A](a: A) = a :: Nil 
  def or[A](t1: List[A], t2: => List[A]) = t1 ::: t2 
  def apply[A,B](t: List[A], f: A => B) = t.map(f) 
  def bind[A,B](t: List[A], f: A => List[B]) = t.flatMap(f) 
  def filter[A](t: List[A], p: A => Boolean) = t.filter(p) 
  def split[A](t: List[A]) = 
    t match { 
      case Nil => None 
      case h :: t => Some((h, t)) 
    } 
}

A choice among alternatives is just a List of the alternatives, so the semantics we sketched above are realized in a very direct way.
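
To make that concrete, here is a sketch that replays the bind example from the description of the Logic trait using plain lists. It does not depend on the Logic trait; the object name ListChoice and the value names are just for illustration:

```scala
object ListChoice {
  // A choice among 1, 3, and 5.
  val t: List[Int] = List(1, 3, 5)

  // Each alternative x yields a choice between x and x + 1,
  // i.e. or(unit(x), unit(x + 1)) in the list representation.
  val f: Int => List[Int] = x => List(x) ::: List(x + 1)

  // bind is flatMap: combine the alternatives from every result.
  val bound: List[Int] = t.flatMap(f)

  // filter keeps only the alternatives which pass the predicate.
  val evens: List[Int] = bound.filter(_ % 2 == 0)

  // split returns the first alternative and the remaining ones.
  def split[A](t: List[A]): Option[(A, List[A])] =
    t match {
      case Nil => None
      case h :: rest => Some((h, rest))
    }
}
```

Here bound is a choice among 1, 2, 3, 4, 5, and 6, as claimed, and splitting it yields 1 along with the choice among the rest.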

The downside to the List implementation is that we compute all the alternatives, even if we only care about one of them. (In the bridge problem any path to the final state is a satisfactory answer, but our program computes all such paths, even if we pass an argument to search requesting only one answer.) We might even want to solve problems with an infinite number of solutions.

Next time we’ll repair this downside by implementing the backtracking monad from the paper by Kiselyov et al.

See the complete code here.

Postscript: modules in Scala

I got the idea of implementing the for-comprehension methods as an implicit wrapper from Edward Kmett’s functorial library. It’s nice that T[A] remains completely abstract, and the for-comprehension notation is just sugar. I also tried an implementation where T[A] is bounded by a trait containing the methods:

trait Monadic[T[_], A] { 
  def map[B](f: A => B): T[B] 
  def filter(p: A => Boolean): T[A] 
  def flatMap[B](f: A => T[B]): T[B] 
  def withFilter(p: A => Boolean): T[A] 
 
  def |(t: => T[A]): T[A] 
  def split: Option[(A,T[A])] 
} 
 
trait Logic { 
  type T[A] <: Monadic[T, A] 
  // no Syntax class needed

This works too but the type system hackery is a bit ugly, and it constrains implementations of Logic more than is necessary.

Another design choice is whether T[A] is an abstract type (as I have it) or a type parameter of Logic:

trait Logic[T[_]] { L => 
  // no abstract type T[A] but otherwise as before 
}

Neither alternative provides the expressivity of OCaml modules (but see addendum below): with abstract types, consumers of Logic cannot return values of T[A] (as we saw above); with a type parameter, they can, but the type is no longer abstract.

In OCaml we would write

module type Logic = 
sig 
  type 'a t 
 
  val unit : 'a -> 'a t 
  (* and so on *) 
end 
 
module Bridge(L : Logic) = 
struct 
  type state = ... 
  let search : state list L.t = ... 
end

and get both the abstract type and the ability to return values of the type.

Addendum

Jorge Ortiz points out in the comments that it is possible to keep T[A] abstract and also return its values from Bridge, by making the Logic argument a (public) val. We can then remove the privates, and write search as just:

  def search: T[List[State]] = { 
    val start = List(State(Person.all, true, 60)) 
    for { path <- tree(start); if path.head.left == Nil } 
    yield path 
  }

instead of baking run into it. Now, if we write val b = new Bridge(LogicList) then b.search has type b.Logic.T[List[b.State]], and we can call b.Logic.run to evaluate it.

This is only a modest improvement; what’s still missing, compared to the OCaml version, is the fact that LogicList and b.Logic are the same module. So we can’t call LogicList.run(b.search) directly. Worse, we can’t compose modules which use the same Logic implementation, because they each have their own incompatibly-typed Logic member.

I thought there might be a way out of this using singleton types—the idea is that a match of a value v against a typed pattern where the type is w.type succeeds when v eq w (section 8.2 in the reference manual). So we can define

def run[A]( 
  Logic: Logic, 
  b: Bridge, 
  t: b.Logic.T[A], 
  n: Int): List[A] = 
{ 
  Logic match { 
    case l : b.Logic.type => l.run(t, n) 
  } 
}

which is accepted, but sadly

scala> run[List[b.State]](LogicList, b, b.search, 2) 
<console>:8: error: type mismatch; 
 found   : b.Logic.T[List[b.State]] 
 required: b.Logic.T[List[b.State]] 
       run[List[b.State]](LogicList, b, b.search, 2) 
                                          ^

Addendum addendum

Some further advice from Jorge Ortiz: the specific type of Logic (not just Logic.type) can be exposed outside Bridge either through polymorphism:

class Bridge[L <: Logic](val Logic: L) { 
  ... 
} 
 
val b = new Bridge(LogicList)

or by defining an abstract value (this works the same if Bridge is a trait):

abstract class Bridge { 
  val Logic: Logic 
  ... 
}

So we can compose uses of T but it remains abstract.
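
Here is a cut-down sketch of the polymorphic version, as I understand the suggestion. The Logic trait is reduced to two operations, run takes no count, and Client stands in for Bridge; all of these simplifications and names are mine, for illustration only:

```scala
object PathDependent {
  // A stripped-down Logic with an abstract type constructor.
  trait Logic {
    type T[A]
    def unit[A](a: A): T[A]
    def run[A](t: T[A]): List[A]
  }

  object LogicList extends Logic {
    type T[A] = List[A]
    def unit[A](a: A): List[A] = List(a)
    def run[A](t: List[A]): List[A] = t
  }

  // Stand-in for Bridge: the specific type L of the module is exposed.
  class Client[L <: Logic](val Logic: L) {
    def answer: Logic.T[Int] = Logic.unit(42)
  }

  val c = new Client(LogicList)

  // L is inferred as LogicList.type, so c.Logic.T[Int] and
  // LogicList.T[Int] are the same type and this call is accepted.
  val result: List[Int] = LogicList.run(c.answer)
}
```

Inside Client the type Logic.T[Int] is still abstract, since all we know about L there is that it is some Logic.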

Three uses for a binary heap

2010-11-24T20:58:00.000-08:00

Lately I have been interviewing for jobs, so doing a lot of whiteboard programming, and binary heaps keep arising in the solutions to these interview problems. There is nothing new or remarkable about these applications (binary heaps and their uses are covered in any undergraduate algorithms class), but I thought I would write them down because they are cute, and in the hope that they might be useful to someone else who (like me) gets by most days as a working programmer with no algorithm fancier than quicksort or binary search.

Binary heaps

Here’s a signature for a binary heap module Heap:

module type OrderedType = 
sig 
  type t 
  val compare : t -> t -> int 
end 
 
module type S = sig 
  type elt 
  type t 
  val make : unit -> t 
  val add : t -> elt -> unit 
  val peek_min : t -> elt option 
  val take_min : t -> elt 
  val size : t -> int 
end 
 
module Make (O : OrderedType) : S with type elt = O.t

We start with a signature for ordered types (following the Set and Map modules in the standard library), so we can provide a type-specific comparison function.

From an ordered type we can make a heap which supports adding elements, peeking the smallest element (None if there are no elements) without removing it, removing and returning the smallest element (raising Not_found if the heap is empty), and returning the number of elements.

We’ll work out the asymptotic running times of the algorithms below, so it will be useful to know that the worst-case running time of the add and take_min functions is O(log n) where n is the number of elements in the heap.

Finding the k smallest elements in a list

Here’s a simple one. To find the smallest element in a list, we could sort the list then take the first element in the sorted list, at a cost of O(n log n). Or we could just take a pass over the list keeping a running minimum, at a cost of O(n).

What if we want the k smallest elements? Again, we could sort the list, but if k < n we can do better by generalizing the single-pass solution. The idea is to keep the k smallest elements we’ve seen so far in a binary heap. For each element in the list we add it to the heap, then (if there were already k elements in the heap) remove the largest element in the heap, leaving the k smallest.

The running time is O(n log k) since we do an add and a take_min in a heap of size k for each of n elements in the list. Here’s the code:

let kmin (type s) k l = 
  let module OT = struct 
    type t = s 
    let compare e1 e2 = compare e2 e1 
  end in 
  let module H = Heap.Make(OT) in 
 
  let h = H.make () in 
  List.iter 
    (fun e -> 
       H.add h e; 
       if H.size h > k 
       then ignore (H.take_min h)) 
    l; 
  let rec loop mins = 
    match H.peek_min h with 
      | None -> mins 
      | _ -> loop (H.take_min h :: mins) in 
  loop []

Here we make good use of OCaml 3.12’s new feature for explicitly naming type variables in a polymorphic function to make a structure matching OrderedType. The heap has the same element type as the list, but we reverse the comparison since we want to remove the largest rather than smallest element from the heap in the loop. At the end of kmin we drain the heap to build a list of the k smallest elements.
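
For comparison with the Scala posts, here is a sketch of the same algorithm using the standard library’s scala.collection.mutable.PriorityQueue in place of the Heap module. That queue dequeues the largest element under the given Ordering, so no reversed comparison is needed; the object name KMin is made up for illustration:

```scala
import scala.collection.mutable

object KMin {
  def kmin[A](k: Int, l: List[A])(implicit ord: Ordering[A]): List[A] = {
    // PriorityQueue is a max-heap: dequeue removes the largest element.
    val h = mutable.PriorityQueue.empty[A](ord)
    for (e <- l) {
      h.enqueue(e)
      // Keep at most k elements by discarding the largest.
      if (h.size > k) h.dequeue()
    }
    // Drain largest-first, consing, so the result is in ascending order.
    var mins = List.empty[A]
    while (h.nonEmpty) mins = h.dequeue() :: mins
    mins
  }
}
```

For example, KMin.kmin(2, List(5, 1, 4, 2, 3)) returns List(1, 2).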

Merging k lists

Suppose we want to merge k lists. We could merge them pairwise until there is only one list, but that would take k - 1 passes, for a worst-case running time of O(n * (k - 1)). Instead we can merge them all in one pass, using a binary heap so we can find the next smallest element of k lists in O(log k) time, for a running time of O(n log k). Here’s the code:

let merge (type s) ls = 
  let module OT = struct 
    type t = s list 
    let compare e1 e2 = 
      compare (List.hd e1) (List.hd e2) 
  end in 
  let module H = Heap.Make(OT) in 
 
  let h = H.make () in 
  let add = function 
    | [] -> () 
    | l -> H.add h l in 
  List.iter add ls; 
  let rec loop () = 
    match H.peek_min h with 
      | None -> [] 
      | _ -> 
          match H.take_min h with 
            | [] -> assert false 
            | m :: t -> 
                add t; 
                m :: loop () in 
  loop ()

We store the lists in the heap, and compare them by comparing their head element (we’re careful not to put an empty list in the heap). When we take the smallest list from the heap, its head becomes the next element in the output list, and we return its tail (if it is not empty) to the heap.

Computing a skyline

The next problem was told to me in terms of computing the skyline of a set of buildings. A building has a height and a starting and ending x-coordinate; buildings may overlap. The skyline of a set of buildings is a list of (x, y) pairs (in ascending x order), describing a sequence of horizontal line segments (each starting at (x, y) and ending at the subsequent x), such that at any x there is no space between the line segment and the tallest building. (Here’s another description with diagrams.)

I googled a bit to see what this is useful for, and didn’t find much. One application is to extract a monophonic line from polyphonic music, where x is time and height is some metric on notes, like pitch or volume. It might be useful for searching data which is only intermittently applicable—say, to compute a schedule over time of the nearest open restaurant.

The algorithm scans the building start and end points in ascending x order, keeping the “active” buildings (those which overlap the current x) in a binary heap. The height of the skyline can only change at a building start or end point. We can determine the tallest building at a point by calling peek_min on the heap.

When we hit a start point we add the building to the heap; for an end point we do nothing (the heap has no operation to remove an element). So we may have inactive buildings in the heap. We remove them lazily—before checking the height of the highest building, we call take_min to remove any higher inactive buildings.

The worst-case running time is O(n log n), since we do some heap operations for each building, and we might end up with all the buildings in the heap.

Here’s the code:

type building = int * int * int (* x0, x1, h *) 
 
let skyline bs = 
  let module OT = struct 
    type t = int * building 
    let compare (x1, _) (x2, _) = compare x1 x2 
  end in 
  let module Events = Heap.Make(OT) in 
  let events = Events.make () in 
  List.iter 
    (fun ((x0,x1,_) as b) -> 
       Events.add events (x0, b); 
       Events.add events (x1, b)) 
    bs; 
 
  let module OT = struct 
    type t = building 
    let compare (_,_,h1) (_,_,h2) = compare h2 h1 
  end in 
  let module Heights = Heap.Make(OT) in 
  let heights = Heights.make () in 
 
  let rec loop last = 
    match Events.peek_min events with 
      | None -> [] 
      | _ -> 
          let (x, (x0,_,h as b)) = Events.take_min events in 
          if x = x0 then Heights.add heights b; 
          while (match Heights.peek_min heights with 
                   | Some (_,x1,_) -> x1 <= x 
                   | _ -> false) do 
            ignore (Heights.take_min heights) 
          done; 
          let h = 
            match Heights.peek_min heights with 
              | Some (_,_,h) -> h 
              | None -> 0 in 
          match last with 
            | Some h' when h = h' -> loop last 
            | _ -> (x, h) :: loop (Some h) in 
  loop None

We use a second heap events to store the “events” (the start and end points of all the buildings), in order to process them in ascending x order. (This use is not dynamic—we do not add new elements to the heap while processing them—so we could just as well use another means of sorting the points.) In this heap we store the x coordinate and the building (we can tell whether we have a start or end point by comparing the x coordinate to the building’s start point), and compare elements by comparing just the x coordinates.

The main heap heights stores buildings, and we compare them by comparing heights (reversed, so peek_min peeks the tallest building). While there are still events, we add the building to heights if the event is a start point, clear out inactive buildings, then return the pair (x, y) where x is the point we’re processing and y is the height of the tallest active building. Additionally we filter out adjacent pairs with the same height; these can arise when a shorter building starts or ends while a taller building is active.

Implementing binary heaps

The following implementation is derived from the one in Daniel Bünzli’s React library (edited a little bit for readability). The Wikipedia article on binary heaps explains the standard technique well, so I won’t repeat it.

The only piece of trickiness is the use of Obj.magic 0 for unused elements of the array, so we can grow it by doubling the size rather than adding a single element each time, and thereby amortize the cost of blitting the old array.

module Make (O : OrderedType) : S with type elt = O.t = 
struct 
  type elt = O.t 
  type t = { mutable arr : elt array; mutable len : int } 
 
  let make () = { arr = [||]; len = 0; } 
 
  let compare h i1 i2 = O.compare h.arr.(i1) h.arr.(i2) 
 
  let swap h i1 i2 = 
    let t = h.arr.(i1) in 
    h.arr.(i1) <- h.arr.(i2); 
    h.arr.(i2) <- t 
 
  let rec up h i = 
    if i = 0 then () 
    else 
      let p = (i - 1) / 2 in 
      if compare h i p < 0 then begin 
        swap h i p; 
        up h p 
      end 
 
  let rec down h i = 
    let l = 2 * i + 1 in 
    let r = 2 * i + 2 in 
    if l >= h.len then () 
    else 
      let child = 
        if r >= h.len then l 
        else if compare h l r < 0 then l else r in 
      if compare h i child > 0 then begin 
        swap h i child; 
        down h child 
      end 
 
  let add h e = 
    if h.len = Array.length h.arr 
    then begin 
      let len = 2 * h.len + 1 in 
      let arr' = Array.make len (Obj.magic 0) in 
      Array.blit h.arr 0 arr' 0 h.len; 
      h.arr <- arr' 
    end; 
    h.arr.(h.len) <- e; 
    up h h.len; 
    h.len <- h.len + 1 
 
  let peek_min h = 
    match h.len with 
      | 0 -> None 
      | _ -> Some h.arr.(0) 
 
  let take_min h = 
    match h.len with 
      | 0 -> raise Not_found 
      | 1 -> 
          let m = h.arr.(0) in 
          h.arr.(0) <- (Obj.magic 0); 
          h.len <- 0; 
          m 
      | k -> 
          let m = h.arr.(0) in 
          let k = k - 1 in 
          h.arr.(0) <- h.arr.(k); 
          h.arr.(k) <- (Obj.magic 0); 
          h.len <- k; 
          down h 0; 
          m 
 
  let size h = h.len 
end
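
If the Obj.magic trick feels too risky, the same doubling strategy can be sketched with a caller-supplied placeholder element instead. This is not the React implementation: the record type, the dummy field, and the sort helper are assumptions made for illustration.

```ocaml
(* Sketch only: a growable binary min-heap that fills unused slots with a
   caller-supplied [dummy] value instead of Obj.magic 0. *)
type 'a heap = {
  cmp : 'a -> 'a -> int;
  dummy : 'a;
  mutable arr : 'a array;
  mutable len : int;
}

let make cmp dummy = { cmp; dummy; arr = [||]; len = 0 }

let swap h i j =
  let t = h.arr.(i) in
  h.arr.(i) <- h.arr.(j);
  h.arr.(j) <- t

(* Sift the element at index i up toward the root. *)
let rec up h i =
  if i > 0 then begin
    let p = (i - 1) / 2 in
    if h.cmp h.arr.(i) h.arr.(p) < 0 then begin
      swap h i p;
      up h p
    end
  end

(* Sift the element at index i down toward the leaves. *)
let rec down h i =
  let l = 2 * i + 1 and r = 2 * i + 2 in
  if l < h.len then begin
    let child =
      if r < h.len && h.cmp h.arr.(r) h.arr.(l) < 0 then r else l in
    if h.cmp h.arr.(i) h.arr.(child) > 0 then begin
      swap h i child;
      down h child
    end
  end

let add h e =
  if h.len = Array.length h.arr then begin
    (* Double the capacity so the cost of the blit is amortized. *)
    let arr' = Array.make (2 * h.len + 1) h.dummy in
    Array.blit h.arr 0 arr' 0 h.len;
    h.arr <- arr'
  end;
  h.arr.(h.len) <- e;
  up h h.len;
  h.len <- h.len + 1

let take_min h =
  if h.len = 0 then raise Not_found;
  let m = h.arr.(0) in
  let k = h.len - 1 in
  h.arr.(0) <- h.arr.(k);
  h.arr.(k) <- h.dummy;   (* drop the reference held by the vacated slot *)
  h.len <- k;
  down h 0;
  m

(* Usage: heap sort by adding everything and then draining the heap. *)
let sort cmp dummy xs =
  let h = make cmp dummy in
  List.iter (add h) xs;
  let rec drain acc =
    if h.len = 0 then List.rev acc else drain (take_min h :: acc) in
  drain []
```

The price is that every caller must be able to produce a value of the element type up front, which is presumably why the functor version above avoids it.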

(Complete code is here.)

Reading Camlp4, part 11: syntax extensions

2010-09-10T17:16:00.000-07:00

In this final (?) post in my series on Camlp4, I want at last to cover syntax extensions. A nontrivial syntax extension involves almost all the topics we have previously covered, so it seems fitting that we treat them last.

Extending grammars

In the post on parsing we covered Camlp4 grammars but stopped short of explaining how to extend them. Well, this is not completely true: we used the EXTEND form to extend an empty grammar, and we can also use it to extend non-empty grammars. We saw a small example of this when implementing quotations, where we extended the JSON grammar with a new json_eoi entry (which referred to an entry in the original grammar). Rules and levels may also be added to existing entries, and rules may be deleted.

Let’s look at a complete syntax extension, which demonstrates modifying Camlp4’s OCaml grammar. The purpose of the extension is to change the precedence of the method call operator # to make “method chaining” read better. For example, if the foo method returns an object, you can write

  obj#foo "bar" #baz

to call the baz method, rather than needing

  (obj#foo "bar")#baz

(I originally wrote this for use with the jQuery binding for ocamljs; method chaining is common with jQuery.)
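
To make the motivation concrete, here is a small stock-OCaml sketch of chaining with the parentheses that the extension is meant to remove. The counter class is hypothetical and has nothing to do with the jQuery binding.

```ocaml
(* Hypothetical class whose methods return fresh objects, so calls chain. *)
class counter (n : int) = object
  method add k = new counter (n + k)
  method double = new counter (2 * n)
  method value = n
end

(* Without the extension, the call with an argument must be parenthesized
   before the next method can be applied to its result. *)
let result = ((new counter 1)#add 2)#double#value
```

In stock OCaml, writing the chain without the inner parentheses would attach #double to the argument 2 rather than to the result of the call, because # binds more tightly than application.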

Here is the extension:

  open Camlp4 
  
  module Id : Sig.Id = 
  struct 
    let name = "pa_jquery" 
    let version = "0.1" 
  end 
  
  module Make (Syntax : Sig.Camlp4Syntax) = 
  struct 
    open Sig 
    include Syntax 
  
    DELETE_RULE Gram expr: SELF; "#"; label END; 
  
    EXTEND Gram 
      expr: BEFORE "apply" 
        [ "#" LEFTA 
          [ e = SELF; "#"; lab = label -> 
              <:expr< $e$ # $lab$ >> ] 
        ]; 
    END 
  end 
  
  module M = Register.OCamlSyntaxExtension(Id)(Make)

To make sense of a syntax extension it’s helpful to refer to Camlp4OCamlRevisedParser.ml (which defines the revised syntax grammar) and Camlp4OCamlParser.ml (which defines the original syntax as an extension of the revised syntax). There we see that the # operator is parsed in the expr entry, in a level called “.” (which includes other dereferencing operators), and that this level appears below the apply level, which parses function application. Recall from the parsing post that operators in lower levels bind more tightly. So to get the effect we want, we need to move the # rule above the apply level in the grammar.

First we delete the rule from its original location: DELETE_RULE takes the grammar, the entry, and the symbols on the left-hand side of the rule, followed by END; you don’t have to say in what level it appears. Then we add the rule at a new location: we create a new level # containing the rule from the original grammar, and add it before the level named apply.

There are several ways to specify where a level is inserted: BEFORE level and AFTER level put it before or after some other level; LEVEL level adds rules to an existing level (you will be warned but not stopped from changing the label or associativity of the level); FIRST and LAST put the level before or after all other levels. If you don’t specify, rules are added to the topmost level in the entry. The resulting grammar works just as if you had given it all at once, making the insertions in the specified places. (However, it is not very clear from the code how ordering works when inserting rules into an existing level; it is perhaps best not to rely on the order of rules in a level anyway.)

Finally we register the extension. The Make argument to OCamlSyntaxExtension returns a Sig.Camlp4Syntax for some reason (in Register.ml it is just ignored) so we include Syntax to provide it.

(The complete code for this example is here.)

Transforming the AST

Let’s do a slightly more complicated example involving some transformation of the parsed AST. It often comes up that we want to let-bind the value of an expression to a name, trapping exceptions, then evaluate the body of the let outside the scope of the exception handler. This is a bit painful to write in stock OCaml; we can only straightforwardly express trapping exceptions in the whole let expression:

  try let x = e1 in e2 
  with e -> h

A nice alternative is to use thunks to delay the evaluation of the body, doing it outside the scope of the try/with:

  (try let x = e1 in fun () -> e2 
   with e -> fun () -> h)()

(We must thunkify the exception handler to make the types work out.) This is simple enough to do by hand, but let’s give it some syntactic sugar:

  let try x = e1 in e2 
  with e -> h

which should expand to the thunkified version above. (The idea and syntax are taken from Martin Jambon’s micmatch extension.)
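
The difference in handler scope is easy to observe in stock OCaml. In this sketch (all names are hypothetical) the body itself raises Not_found: the whole-let version swallows that exception, while the thunkified version lets it escape.

```ocaml
(* Whole-let version: an exception raised by the body [use x] is trapped too. *)
let lookup_all tbl key use =
  try let x = Hashtbl.find tbl key in use x
  with Not_found -> "missing"

(* Thunkified version: only the lookup runs inside the handler; the body
   runs when the returned thunk is applied, outside the try. *)
let lookup_only tbl key use =
  (try let x = Hashtbl.find tbl key in fun () -> use x
   with Not_found -> fun () -> "missing") ()

let tbl : (string, int) Hashtbl.t = Hashtbl.create 8
let () = Hashtbl.add tbl "a" 1

(* A body that happens to raise the same exception as the lookup. *)
let boom (_ : int) : string = raise Not_found

(* The body's exception is mistaken for a failed lookup. *)
let trapped = lookup_all tbl "a" boom

(* The body's exception propagates to the caller. *)
let escaped =
  try lookup_only tbl "a" boom with Not_found -> "escaped"
```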

Let’s look at the existing rules in Camlp4OCamlRevisedParser.ml for let and try to get an idea of how to parse the let/try form:

  [ "let"; r = opt_rec; bi = binding; "in"; x = SELF -> 
      <:expr< let $rec:r$ $bi$ in $x$ >> 
  ... 
  | "try"; e = sequence; "with"; a = match_case -> 
      <:expr< try $mksequence' _loc e$ with [ $a$ ] >>

For let, the opt_rec entry parses an optional rec keyword (we see there is a special antiquotation for interpolating rec). Binding parses a group of bindings separated by and. SELF is just expr. For try, sequence is a sequence of expressions separated by ;, and match_case is a group of match cases separated by |. (These entries are both a little different in the original syntax, to account for the different semicolon rules and the [] delimiters around the match cases.) Recall that Camlp4OCamlRevisedParser.ml uses the revised syntax quotations, so we have [] around the match cases. The call to mksequence' just wraps a do {} around a sequence if necessary; more on this below.

The parsing rule we want is a combination of these. Here is the extension:

  EXTEND Gram 
    expr: LEVEL "top" [ 
      [ "let"; "try"; r = opt_rec; bi = binding; "in"; 
        e = sequence; "with"; a = match_case -> 
          let a = 
            List.map 
              (function 
                 | <:match_case< $p$ when $w$ -> $e$ >> -> 
                     <:match_case< 
                       $p$ when $w$ -> fun () -> $e$ 
                     >> 
                 | mc -> mc) 
              (Ast.list_of_match_case a []) in 
          <:expr< 
            (try let $rec:r$ $bi$ in fun () -> do { $e$ } 
             with [ $list:a$ ])() 
          >> 
      ] 
    ]; 
  END

We put rec after try (following micmatch), which is a little weird; instead we could start the rule "let"; r = opt_rec; "try", which has no ambiguity with the ordinary let rule because the "let"; opt_rec prefix is factored out; the parser doesn’t choose between the rules until it tries to parse try. After in we parse sequence rather than SELF; this seems like a good choice because there is a with to end the sequence.

Now, to transform the AST, we map over the match cases. The match_case entry returns a list of cases separated by Ast.McOr; we call list_of_match_case to get an ordinary list. For each case, we match the pattern, when clause, and expression on the right-hand side (these are packaged in an Ast.McArr, where the when clause field is Ast.ExNil if there is no when clause), and return it with the expression thunkified. Then we return the whole let inside try, with the body sequence thunkified.

We have to add a do {} around the body, creating an Ast.ExSeq node, because that’s what is expected by Camlp4Ast2OCamlAst.ml; recall from the filters post that the Camlp4 AST is translated to an OCaml AST and marshalled to the compiler. If we forget this (and “we” often forget these idiosyncrasies) then we get the error “expr; expr: not allowed here, use do {...} or [|...|] to surround them”, which is pretty helpful as these errors go.

(The complete code for this example is here.)

Extending pattern matching

As a final example, let’s extend OCaml’s pattern syntax. In the quotations post we noted that JSON quotations in a pattern are not very useful, because we would usually like a pattern to match even if the fields of an object come in a different order or there are extra fields. To keep the code short let’s abstract the problem a little and consider matching association lists: if we write a match case

  | alist [ "foo", x; "bar", y ] -> e

we would like it to match association lists with "foo" and "bar" keys, in any order, with any extra pairs in the list. Our translation looks like this:

  | __pa_alist_patt_1 when 
      (match ((try Some (List.assoc "foo" __pa_alist_patt_1) 
               with | Not_found -> None), 
              (try Some (List.assoc "bar" __pa_alist_patt_1) 
               with | Not_found -> None)) 
       with 
       | (Some x, Some y) -> true 
       | _ -> false) 
      -> 
      (match ((try Some (List.assoc "foo" __pa_alist_patt_1) 
               with | Not_found -> None), 
              (try Some (List.assoc "bar" __pa_alist_patt_1) 
               with | Not_found -> None)) 
       with 
       | (Some x, Some y) -> e 
       | _ -> assert false)

This might seem overcomplicated, and it is true that we could simplify it for this case. But the built-in pattern syntax is complicated, and it is tricky handling all the cases to make things work smoothly; the strategy that produces the code above will handle some (but not all) of the complications. (We’ll consider some improvements below.)

The basic idea is that when we come to an alist we replace it with a new fresh name, then do further matching in a when clause, so if it fails we can continue to the next case by returning false. In the when clause we look up the keys, putting them in options, then match on the options; we handle nested patterns (to the right of a key) by embedding them in the when clause match. The when clause match also binds variables appearing in the original pattern, so they are available to the when clause of the original case (if there is one). Finally, we do the whole thing over again in the match case body to provide bindings to the original body.
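
As a sanity check on the idea, here is a hand-written expansion of a single alist case for a (string * int) list, as a sketch. It uses List.assoc_opt (available since OCaml 4.05, so later than this post) in place of the try/with lookups; the function name and the fallback case are assumptions.

```ocaml
(* Hand expansion of
     | alist [ "foo", x; "bar", y ] -> x + y
     | _ -> 0
   The lookups are done once in the when clause and again in the body. *)
let sum_foo_bar (l : (string * int) list) =
  let lookup l = (List.assoc_opt "foo" l, List.assoc_opt "bar" l) in
  match l with
  | l when (match lookup l with
            | (Some _, Some _) -> true
            | _ -> false) ->
      (match lookup l with
       | (Some x, Some y) -> x + y
       | _ -> assert false)   (* unreachable: the when clause succeeded *)
  | _ -> 0
```

The keys may come in any order and extra pairs are ignored; a list missing either key falls through to the next case.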

In order to implement this we’ll use both a syntax extension and a filter. The reason is that we’d like to extend the patt entry, but to do the AST transformation we sketched above we need to transform match_cases. We could replace the match_case part of the parser as well but that seems needlessly hairy, and generally when writing a syntax extension we’d like to touch as little of the parser as possible so it interoperates well with other extensions.

First, here is the syntax extension:

  EXTEND Gram 
    patt: LEVEL "simple" 
    [[ 
       "alist"; "["; 
         l = 
           LIST0 
             [ e = expr LEVEL "simple"; ","; 
               p = patt LEVEL "simple" -> 
                 Ast.PaOlbi (_loc, "", p, e) ] 
             SEP ";"; 
       "]" -> 
         <:patt< $uid:"alist"$ $Ast.paSem_of_list l$ >> 
    ]]; 
  END

We extend the simple level of the patt entry, which parses primitive patterns. Inside alist [] we parse a list of expr / patt pairs; we parse expr at the simple level or else it would parse the whole pair as an expr, and the same for patt just in case. Then we return the pair of expression and pattern in an Ast.PaOlbi (ordinarily used for optional argument defaults in function definitions). Why? Well, we need to return something of type patt, but we need somehow to get the expr to our filter, and this is the only patt constructor that holds an expr. (As an alternative we could parse a patt instead of an expr, but then we’d need to translate it to an expr at the point we use it.) Finally we return the list wrapped in a data constructor so we can recognize it easily in the filter; because it is lower-case, we can be sure that “alist” is not the identifier of a real data constructor.

Now let’s look at the filter. First, some helper functions:

  let fresh = 
    let id = ref 0 in 
    fun () -> 
      incr id; 
      "__pa_alist_patt_"  ^ string_of_int !id 
 
  let expr_tup_of_list _loc = function 
    | [] -> <:expr< () >> 
    | [ v ] -> v 
    | vs -> <:expr< $tup:Ast.exCom_of_list vs$ >> 
 
  let patt_tup_of_list _loc = function 
    | [] -> <:patt< () >> 
    | [ p ] -> p 
    | ps -> <:patt< $tup:Ast.paCom_of_list ps$ >>

We have a function to generate fresh names, a function to turn a list of expressions into a tuple, and a similar function for patterns. The reason we need these latter two is that a tuple with 0 or 1 elements is not allowed by Camlp4Ast2OCamlAst.ml (the empty “tuple” is actually a special identifier in the Camlp4 AST). Next, the main rewrite function:

  let rewrite _loc p w e = 
    let k = ref (fun s f -> s) in

The function takes the parts of an Ast.McArr (that is, a match case). We’re going to map over the pattern p, building up a function k as we encounter nested alist forms. We want to generate the same matching code in the when clause and the body, so k is parameterized with an expression in case of success (the original when clause or the body) and in case of failure (false or assert false). We will build k from the inside out, starting with a function that just returns the success expression.

    let map = 
      object 
        inherit Ast.map as super 
 
        method patt p = 
          match super#patt p with 
            | <:patt< $uid:"alist"$ $l$ >> -> 
                let id = fresh () in 
                let l = 
                  List.map 
                    (function 
                       | Ast.PaOlbi (_, _, p, e) -> p, e 
                       | _ -> assert false) 
                    (Ast.list_of_patt l []) in 
                let vs = 
                  List.map 
                    (fun (_, e) -> 
                       <:expr< 
                         try Some (List.assoc $e$ $lid:id$) 
                         with Not_found -> None 
                       >>) 
                    l in 
                let ps = 
                  List.map 
                    (fun (p, _) -> <:patt< Some $p$ >>) 
                    l in 
                let k' = !k in 
                k := 
                  (fun s f -> 
                     <:expr< 
                       match $expr_tup_of_list _loc vs$ with 
                         | $patt_tup_of_list _loc ps$ -> $k' s f$ 
                         | _ -> $f$ 
                     >>); 
                <:patt< $lid:id$ >> 
            | p -> p 
      end in

The Ast.map object provides methods to map each syntactic class of the AST, along with default implementations which return the node unchanged. We extend it to walk over the pattern, leaving it unchanged except when we come to our special alist constructor. In that case we generate a fresh name, which we return as the value of the function. Then we extract the expr / patt pairs and map them to try Some (List.assoc ... expressions and Some patterns. Finally we extend k by matching all the expressions against all the patterns; if the match succeeds we call the previous k, otherwise the failure expression. Since we build k from the inside out, we transform subpatterns first (by matching over super#patt p).

    let p = map#patt p in 
    let w = match w with <:expr< >> -> <:expr< true >> | _ -> w in 
    let w = !k w <:expr< false >> in 
    let e = !k e <:expr< assert false >> in 
    <:match_case< $p$ when $w$ -> $e$ >>

We call map#patt on p to replace special alist constructor nodes with fresh names and build up k, then call the resulting k on the when clause (if there is no when clause we replace it with true) and body, and finally return the result as a match_case, completing the rewrite function.
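
The inside-out construction of k can be hard to picture, so here is a toy model that builds strings instead of Camlp4 AST nodes. The build function and its (scrutinee, pattern) pairs are hypothetical; only the way each step wraps the previous k mirrors the code above.

```ocaml
(* Toy model of k: each step wraps the previous continuation in an outer
   match, threading the success text s and the failure text f through. *)
let build steps =
  let k = ref (fun s _f -> s) in   (* innermost: just the success text *)
  List.iter
    (fun (scrutinee, pattern) ->
       let k' = !k in
       k :=
         (fun s f ->
            Printf.sprintf "(match %s with %s -> %s | _ -> %s)"
              scrutinee pattern (k' s f) f))
    steps;
  !k

(* The same k is instantiated twice, as in rewrite. *)
let k = build [ ("v1", "Some x"); ("v2", "Some y") ]
let when_clause = k "true" "false"
let body = k "e" "assert false"
```

Note that the step added last ends up outermost, so when_clause is a match on v2 wrapped around a match on v1.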

  let filter = 
    let map = 
      object 
        inherit Ast.map as super 
 
        method match_case mc = 
          match super#match_case mc with 
            | <:match_case@_loc< $p$ when $w$ -> $e$ >> -> 
                rewrite _loc p w e 
            | e -> e 
      end in 
    map#str_item 
 
  let _ = AstFilters.register_str_item_filter filter

We extend Ast.map again to call the rewrite function on each match_case, then register the resulting filter.

The code above handles when clauses and nested alist patterns, and interacts properly with ordinary OCaml patterns. However, it completely falls down on nested pattern alternatives. If we write

  match x with 
    | alist [ "foo", f ] 
    | alist [ "fooz", f ] -> e

we get this mess:

  | __pa_alist_patt_1 | __pa_alist_patt_2 when 
      (match try Some (List.assoc "fooz" __pa_alist_patt_2) 
             with | Not_found -> None 
       with 
       | Some f -> 
           (match try Some (List.assoc "foo" __pa_alist_patt_1) 
                  with | Not_found -> None 
            with 
            | Some f -> true 
            | _ -> false) 
       | _ -> false) 
      -> 
      ... (* the same mess for the body *)

The first problem is that both arms of an alternative must bind the same names, but we have replaced them with two different fresh names. The second problem is that we have blindly treated alternative alist patterns as being nested one inside the other. A solution to both these problems is to split nested alternatives into separate cases, at the cost of duplicating the when clause and body in each.
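
Splitting nested alternatives can be sketched on a toy pattern type. This is neither Camlp4's Ast.patt nor the algorithm of any particular library; the patt type and split function are assumptions, meant only to show that each alternative-free combination becomes its own case.

```ocaml
(* Toy pattern type, hypothetical and much smaller than Camlp4's Ast.patt. *)
type patt =
  | Var of string
  | Con of string * patt list   (* constructor applied to subpatterns *)
  | Alt of patt * patt          (* p1 | p2 *)

(* Return the list of alternative-free patterns that together cover p. *)
let rec split = function
  | Var _ as p -> [ p ]
  | Alt (p1, p2) -> split p1 @ split p2
  | Con (c, ps) ->
      (* Cartesian product of the splits of each argument. *)
      let rec args = function
        | [] -> [ [] ]
        | p :: rest ->
            let tails = args rest in
            List.concat
              (List.map
                 (fun p' -> List.map (fun t -> p' :: t) tails)
                 (split p))
      in
      List.map (fun ps' -> Con (c, ps')) (args ps)
```

A rewriter would then emit one match case per element of the result, each with a copy of the original when clause and body.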

Jeremy Yallop’s patterns framework (see here for an update that works with OCaml 3.12.0) allows multiple pattern extensions to coexist, and provides some common facilities to make them easier to write. In particular it splits nested alternatives into separate cases. Another deficiency in the code above is that it duplicates the match expression (built in k) in the when clause and body. This can be avoided by computing the body within the when clause, setting a reference, and dereferencing it in the body. However, the reference must be bound outside the match_case to be visible both in the when clause and the body, so this approach must transform each AST node that contains match_cases in order to bind the refs in the right place. The patterns framework handles this as well.

(The complete code for this example is here. A version using the patterns framework is here.)

ocamljs 0.3

2010-08-26T14:45:00.001-07:00

I am happy to announce version 0.3 of ocamljs. Ocamljs is a system for compiling OCaml to Javascript. It includes a Javascript back-end for the OCaml compiler, as well as several support libraries, such as bindings to the browser DOM. Ocamljs also works with orpc for RPC over HTTP, and froc for functional reactive browser programming.

Changes since version 0.2 include:

support for OCaml 3.11.x and 3.12.0
jQuery binding (contributed by Dave Benjamin)
full support for OCaml objects (interoperable with Javascript objects)
Lwt 2.x support
ocamllex and ocamlyacc support
better interoperability with Javascript
many small fixes and improvements

Development of ocamljs has moved from Google Code to Github; see

project page: http://github.com/jaked/ocamljs
documentation: http://jaked.github.com/ocamljs
downloads: http://github.com/jaked/ocamljs/downloads

Comparison to js_of_ocaml

Since I last did an ocamljs release, a new OCaml-to-Javascript system has arrived, js_of_ocaml. I want to say a little about how the two systems compare:

Ocamljs is a back-end to the existing OCaml compiler; it translates the “lambda” intermediate language to Javascript. (This is also where the bytecode and native code back-ends connect to the common front-end.) Js_of_ocaml post-processes ordinary OCaml bytecode (compiled and linked with the ordinary OCaml bytecode compiler) into Javascript. With ocamljs you need a special installation of the compiler (and special support for ocamlbuild and ocamlfind), you need to recompile libraries, and you need the OCaml source to build it. With js_of_ocaml you don’t need any of this.

Since ocamljs recompiles libraries, it’s possible to special-case code for the Javascript build to take advantage of Javascript facilities. For example, ocamljs implements the Buffer module on top of Javascript arrays instead of strings, for better performance. Similarly, it implements CamlinternalOO to use Javascript method dispatch directly instead of layering OCaml method dispatch on top. Js_of_ocaml can’t do this (or at least it would be necessary to recognize the compiled bytecode and replace it with the special case).

Because js_of_ocaml works from bytecode, it can’t always know the type of values (at the bytecode level, ints, bools, and chars all have the same representation, for example). This makes interoperating with native Javascript more difficult: you usually need conversion functions between the OCaml and Javascript representation of values when you call a Javascript function from OCaml. Ocamljs has more information to work with, and can represent OCaml bools as Javascript bools, for example, so you can usually call a Javascript function from OCaml without conversions.

Ocamljs has a mixed representation of strings: literal strings and the result of ^, Buffer.contents, and Printf.sprintf are all immutable Javascript strings; strings created with String.create are mutable strings implemented by Javascript arrays (with a toString method which returns the represented string). This is good for interoperability—you can usually pass a string directly to Javascript—but it doesn’t match regular OCaml’s semantics, and it can cause runtime failures (e.g. if you try to mutate an immutable string). Js_of_ocaml implements only mutable strings, so you need conversions when calling Javascript, but the semantics match regular OCaml.

With ocamljs, Javascript objects can be called from OCaml using the ordinary OCaml method-call syntax, and objects written in OCaml can be called using the ordinary Javascript syntax. With js_of_ocaml, a special syntax is needed to call Javascript objects, and OCaml objects can’t easily be called from Javascript. However, there is an advantage to having a special call syntax: with ocamljs it is not possible to partially apply calls to native Javascript methods, but this is not caught by the compiler, so there can be a runtime failure.

Ocamljs supports inline Javascript, while js_of_ocaml does not. I think it might be possible for js_of_ocaml to do so using the same approach that ocamljs takes: use Camlp4 quotations to embed a syntax tree, then convert the syntax tree from its OCaml representation (as lambda code or bytecode) into Javascript. However, you would still need conversion functions between OCaml and Javascript values.

I haven’t compared the performance of the two systems. It seems like there must be a speed penalty to translating from bytecode compared to translating from lambda code. On the other hand, while ocamljs is very naive in its translation, js_of_ocaml makes several optimization passes. With many programs it doesn’t matter, since most of the time is spent in browser code. (For example, the planet example seems to run at the same speed in ocamljs and js_of_ocaml.) It would be interesting to compare them on something computationally intensive like Andrej Bauer’s random-art.org.

Js_of_ocaml is more complete and careful in its implementation of OCaml (e.g. it supports int64s), and it generates much more compact code than ocamljs. I hope to close the gap in these areas, possibly by borrowing some code and good ideas from js_of_ocaml.

Mixing monadic and direct-style code with delimited continuations

2010-08-20T17:50:00.000-07:00

The Lwt library is a really nice way to write concurrent programs. A big downside, however, is that you can’t use direct-style libraries with it. Suppose we’re writing an XMPP server, and we want to parse XML as it arrives over a network connection, using Daniel Bünzli’s nice xmlm library. Xmlm can read from a string, or from a Pervasives.in_channel, or you can give it a function of type (unit -> int) to return the next character of input. But there is no way to have it read from an Lwt thread; that is, we can’t give it a function of type (unit -> int Lwt.t), since it doesn’t know what to do with an Lwt.t. To keep track of the parser state at the point the input blocks, the whole library would need to be rewritten in Lwt style (i.e. monadic style).

Now, Lwt does provide the Lwt_preemptive module, which gives you a way to spin off a preemptive thread (implemented as an ordinary OCaml thread) and wait for its result in the usual Lwt way with bind. This is useful, but has two drawbacks: preemptive threads are preemptive, so you’re back to traditional locking if you want to operate on shared data; and preemptive threads are threads, so they are much heavier than Lwt threads, and (continuing the XMPP hypothetical) it may not be feasible to use one per open connection.

Fibers

What we would really like is to be able to spin off a cooperative, direct-style thread. The thread needs a way to block on Lwt threads, but when it blocks we need to be able to schedule another Lwt thread. As a cooperative thread it of course has exclusive access to the process state while it is running. A cooperative, direct-style thread is sometimes called a coroutine (although to me that word connotes a particular style of inter-thread communication as well, where values are yielded between coroutines), or a fiber.

Here’s an API for mixing Lwt threads with fibers:

  val start : (unit -> 'a) -> 'a Lwt.t 
  val await : 'a Lwt.t -> 'a

The start function spins off a fiber, returning an Lwt thread which is woken with the result of the fiber once it completes. The await function (which may be called only from within a fiber) blocks on the result of an Lwt thread, allowing another Lwt thread to be scheduled while it is waiting.

With this API we could implement our XMPP server by calling xmlm from within a fiber, and passing it a function that awaits the next character available on the network connection. But how do we implement it?

Delimited continuations

Oleg Kiselyov’s recent announcement of a native-code version of his Delimcc library for delimited continuations in OCaml reminded me of two things:

I should find out what delimited continuations are.
They sound useful for implementing fibers.

The paper describing the library, Delimited Control in OCaml, Abstractly and Concretely, has a pretty good overview of delimited continuations, and section 2 of A Monadic Framework for Delimited Continuations is helpful too.

The core API is small:

  type 'a prompt 
  type ('a,'b) subcont 
  
  val new_prompt   : unit -> 'a prompt 
  
  val push_prompt  : 'a prompt -> (unit -> 'a) -> 'a 
  val take_subcont : 
    'b prompt -> (('a,'b) subcont -> unit -> 'b) -> 'a 
  val push_subcont : ('a,'b) subcont -> (unit -> 'a) -> 'b

I find it easiest to think about these functions as operations on the stack. A prompt is an identifier used to mark a point on the stack (the stack can be marked more than once with the same prompt). The function new_prompt makes a new prompt which is not equal to any other prompt.

The call push_prompt p f marks the stack with p then runs f, so the stack, growing to the right, looks like

 
  ABCDpEFGH

where ABCD are stack frames in the continuation of the call to push_prompt, and EFGH are frames created while running f. If f returns normally (that is, without calling take_subcont) then its return value is returned by push_prompt, and we are back to the original stack ABCD.

If take_subcont p g is called while running f, the stack fragment EFGH is packaged up as an ('a,'b) subcont and passed to g. You can think of an ('a,'b) subcont as a function of type 'a -> 'b, where 'a is the return type of the call to take_subcont and 'b is the return type of the call to push_prompt. Take_subcont removes the fragment pEFGH from the stack, and there are some new frames IJKL from running g, so we have

 
  ABCDIJKL

Now g can make use of the passed-in subcont using push_subcont. (Thinking of a subcont as a function, push_subcont is just a weird function application operator, which takes the argument as a thunk). Then the stack becomes

 
  ABCDIJKLEFGH

Of course g can call the subcont as many times as it likes.

A common pattern is to re-mark the stack with push_prompt before calling push_subcont (so take_subcont may be called again). There is an optimized version of this combination called push_delim_subcont, which produces the stack

 
  ABCDIJKLpEFGH

The idea that a subcont is a kind of function is realized by shift0, which is like take_subcont except that instead of passing a subcont to g it passes an ordinary function. The passed function just wraps a call to push_delim_subcont. (It is push_delim_subcont rather than push_subcont for historical reasons I think—see the Monadic Framework paper for a comparison of various delimited continuation primitives.)

Implementing fibers

To implement fibers, we want start f to mark the stack, then run f; and await t to unwind the stack back to the mark, wait for t to complete, then restore the stack. Here is start:

  let active_prompt = ref None 
  
  let start f = 
    let t, u = Lwt.wait () in 
    let p = Delimcc.new_prompt () in 
    active_prompt := Some p; 
  
    Delimcc.push_prompt p begin fun () -> 
      let r = 
        try Lwt.Return (f ()) 
        with e -> Lwt.Fail e in 
      active_prompt := None; 
      match r with 
        | Lwt.Return v -> Lwt.wakeup u v 
        | Lwt.Fail e -> Lwt.wakeup_exn u e 
        | Lwt.Sleep -> assert false 
    end; 
    t

We make a sleeping Lwt thread, and store a new prompt in a global (this is OK because we won’t yield control to another Lwt thread before using it; of course this is not safe with OCaml threads). Then we mark the stack with push_prompt and run the fiber. (The let r = ... match r with ... is to avoid calling Lwt.wakeup{,_exn} in the scope of the try; we use Lwt.state as a handy type to store either a result or an exception.) If the fiber completes without calling await then all we do is wake up the Lwt thread with the returned value or exception.

Here is await:

  let await t = 
    let p = 
      match !active_prompt with 
        | None -> failwith "await called outside start" 
        | Some p -> p in 
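 
    match Lwt.poll t with 
      | Some v -> v 
      | None -> 
          (* clear the prompt only when we actually block, so that a later 
             await in the same fiber still finds it *) 
          active_prompt := None; 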
          Delimcc.shift0 p begin fun k -> 
            let ready _ = 
              active_prompt := Some p; 
              k (); 
              Lwt.return () in 
            ignore (Lwt.try_bind (fun () -> t) ready ready) 
          end; 
          match Lwt.poll t with 
            | Some v -> v 
            | None -> assert false

We first check to be sure that we are in the scope of start, and that t isn’t already completed (in which case we just return its result). If we actually need to wait for t, we call shift0, which captures the stack fragment back to the push_prompt call in start (this continuation includes the subsequent match Lwt.poll t and everything after the call to await), then try_bind so we can restore the stack fragment when t completes (whether by success or failure). When t completes, the ready function restores the global active_prompt, in case the fiber calls await again, then restores the stack by calling k (recall that this also re-marks the stack with p, which is needed if the fiber calls await again).

It’s pretty difficult to follow what’s going on here, so let’s try it with stacks. After calling start we have

 
  ABCDpEFGH

where ABCD is the continuation of push_prompt in start (just the return of t) and EFGH are frames created by the thunk passed to start. Now, a call to await (on an uncompleted thread) calls shift0, which packs up EFGH as k and unwinds the stack to p. The function passed to shift0 stores k in ready but doesn’t call it, and control returns to start (since the stack has been unwound).

The program continues normally until t completes. Now control is in Lwt.run_waiters running threads that were waiting on t; one of them is our ready function. When it is called, the stack is re-marked and EFGH is restored, so we have

 
  QRSTpEFGH

where QRST is wherever we happen to be in the main program, ending in Lwt.run_waiters. Now, EFGH ends with the second call to match Lwt.poll in await, which returns the value of t and continues the thunk passed to start. The stack is now marked with p inside Lwt.run_waiters, so when await is called again control returns there.

Events vs. threads

We have seen that we can use fibers to write Lwt threads in direct style. Should we abandon Lwt’s monadic style entirely, and use Lwt only for its event handling?

First, how does each style perform? Every time a fiber blocks and resumes, we have to copy, unwind, and restore its entire stack. With Lwt threads, the “stack” is a bunch of linked closures in the heap, so we don’t need to do anything to block or resume. On the other hand, building and garbage-collecting the closures is more expensive than pushing and popping the stack. We can imagine that which style performs better depends on the thread: if it blocks infrequently enough, the amortized cost of copying and restoring the stack might be lower than the cost of building and garbage-collecting the closures. (We can also imagine that a different implementation of delimited continuations might change this tradeoff.)

Second, how does the code look? The paper Cooperative Task Management without Manual Stack Management considers this question in the context of the “events vs. threads” debate. Many of its points lose their force when translated to OCaml and Lwt—closures, the >>= operator, and Lwt’s syntax extension go a long way toward making Lwt code look like direct style—but some are still germane. In favor of fibers is that existing direct-style code need not be rewritten to work with Lwt (what motivated us in the first place). In favor of monadic style is that the type of a function reflects the possibility that it might block, yield control to another thread, and disturb state invariants.

Direct-style FRP

We could apply this idea, of replacing monadic style with direct style using delimited continuations, to other monads—in particular to the froc library for functional reactive programming. (The Scala.React FRP library also uses delimited continuations to implement direct style; see Deprecating the Observer Pattern for details.)

Here’s the API:

  val direct : (unit -> 'a) -> 'a Froc.behavior 
  val read : 'a Froc.behavior -> 'a

Not surprisingly, it’s just the same as for Lwt, but with a different monad and different names (I don’t know if direct is quite right but it is better than start). There is already a function Froc.sample with the same type as read, but it has a different meaning: sample takes a snapshot of a behavior but creates no dependency on it.

The implementation is very similar as well:

  let active_prompt = ref None 
  
  let direct f = 
    let t, u = Froc_ddg.make_changeable () in 
    let p = Delimcc.new_prompt () in 
    active_prompt := Some p; 
  
    Delimcc.push_prompt p begin fun () -> 
      let r = 
        try Froc_ddg.Value (f ()) 
        with e -> Froc_ddg.Fail e in 
      active_prompt := None; 
      Froc_ddg.write_result u r 
    end; 
    (Obj.magic t : _ Froc.behavior)

This is essentially the same code as start, modulo the change of monad. However, some of the functions we need aren’t exported from Froc, so we need to use the underlying Froc_ddg module and magic the result at the end. Froc_ddg.make_changeable is the equivalent of Lwt.wait: it returns an “uninitialized” monadic value along with a writer for that value. We use Froc_ddg.result instead of Lwt.state to store a value or exception, and Froc_ddg.write_result instead of the pattern match and Lwt.wakeup{,_exn}.

  
  let read t = 
    let p = 
      match !active_prompt with 
        | None -> failwith "read called outside direct" 
        | Some p -> p in 
    active_prompt := None; 
  
    Delimcc.shift0 p begin fun k -> 
      Froc.notify_result_b t begin fun _ -> 
        active_prompt := Some p; 
        k () 
      end 
    end; 
    Froc.sample t

And this is essentially the same code as await. A Froc.behavior always has a value, so we don’t poll it as we did with Lwt.t, but go straight to shift0. We have Froc.try_bind but it’s a little more compact to use notify_result_b, which passes a result.

Monadic reflection

The similarity between these implementations suggests that we could use the same code to get a direct style version of any monad; we only need a way to create an uninitialized monadic value, then set it. The call to Lwt.poll in await is an optimization which we would have to forgo. (In both these examples we have a monad with failure, and try_bind, but we could do without it.)

A little googling turns up Andrzej Filinski’s paper Representing Monads, which reaches the same conclusion, with a lot more rigor. In that work start/direct are called reify, and await/read are called reflect. Reflect is close to the implementations above, but in reify the paper marks the stack inside a function passed to bind rather than creating an uninitialized monadic value and later setting it.

This makes sense—inside bind an uninitialized monadic value is created, then set from the result of the function passed to bind. So we are partially duplicating bind in the code above. If we mark the stack in the right place we should be able to use bind directly. It is hard to see how to make the details work out, however, since Lwt.bind and Froc.bind each have some cases where uninitialized values are not created.
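
To make that last point concrete, here is a toy monad (neither Lwt nor froc) in which bind visibly creates an uninitialized value and later sets it. All the names (ivar, fill, make, on_fill, peek) are invented for this sketch; make plays the role that Lwt.wait and Froc_ddg.make_changeable play above.

```ocaml
(* A toy write-once cell; every name here is hypothetical. *)
type 'a state =
  | Empty of ('a -> unit) list   (* callbacks waiting for a value *)
  | Full of 'a

type 'a ivar = 'a state ref

(* Set the cell and run the waiting callbacks in registration order. *)
let fill (iv : 'a ivar) (v : 'a) : unit =
  match !iv with
    | Full _ -> invalid_arg "ivar already filled"
    | Empty waiters ->
        iv := Full v;
        List.iter (fun w -> w v) (List.rev waiters)

(* An uninitialized value together with its writer. *)
let make () : 'a ivar * ('a -> unit) =
  let iv = ref (Empty []) in
  (iv, fill iv)

(* Run f once the cell has a value (immediately if it already does). *)
let on_fill (iv : 'a ivar) (f : 'a -> unit) : unit =
  match !iv with
    | Full v -> f v
    | Empty waiters -> iv := Empty (f :: waiters)

(* bind creates an uninitialized result, then sets it from the value
   produced by the function passed to bind. *)
let bind (iv : 'a ivar) (f : 'a -> 'b ivar) : 'b ivar =
  let res, set = make () in
  on_fill iv (fun v -> on_fill (f v) set);
  res

let return v = let iv, set = make () in set v; iv

let peek (iv : 'a ivar) : 'a option =
  match !iv with Full v -> Some v | Empty _ -> None
```

The make/set pair inside bind is the step that start and direct reproduce by hand.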

(You can find the complete code for Lwt fibers here and direct-style froc here.)

(revised 10/22)

Reading Camlp4, part 10: custom lexers

2010-08-13T12:16:00.000-07:00

As a final modification to our running JSON quotation example, I want to repair a problem noted in the first post—that the default lexer does not match the JSON spec—and in doing so demonstrate the use of custom lexers with Camlp4 grammars. We’ll parse UTF8-encoded Javascript using the ulex library.

To use a custom lexer, we need to pass a module matching the Lexer signature (in camlp4/Camlp4/Sig.ml) to Camlp4.PreCast.MakeGram. (Recall that we get back an empty grammar which we then extend with parser entries.) Let’s look at the signature and its subsignatures, and our implementation of each:

Error

  module type Error = sig 
    type t 
    exception E of t 
    val to_string : t -> string 
    val print : Format.formatter -> t -> unit 
  end

First we have a module for packaging up an exception so it can be handled generically (in particular it may be registered with Camlp4.ErrorHandler for common printing and handling). We have simple exception needs so we give a simple implementation:

  module Error = 
  struct 
    type t = string 
    exception E of string 
    let print = Format.pp_print_string 
    let to_string x = x 
  end 
  let _ = let module M = Camlp4.ErrorHandler.Register(Error) in ()

Token

Next we have a module defining the tokens our lexer supports:

  module type Token = sig 
    module Loc : Loc 
  
    type t 
  
    val to_string : t -> string 
    val print : Format.formatter -> t -> unit 
    val match_keyword : string -> t -> bool 
    val extract_string : t -> string 
  
    module Filter : ... (* see below *) 
    module Error : Error 
  end

The type t represents a token. This can be anything we like (in particular it does not need to be a variant with arms KEYWORD, EOI, etc. although that is the conventional representation), so long as we provide the specified functions to convert it to a string, print it to a formatter, determine if it matches a string keyword (recall that we can use literal strings in grammars; this function is called to see if the next token matches a literal string), and extract a string representation of it (called when you bind a variable to a token in a grammar—e.g. n = NUMBER). Here’s our implementation:

  type token = 
    | KEYWORD  of string 
    | NUMBER   of string 
    | STRING   of string 
    | ANTIQUOT of string * string 
    | EOI 
 
  module Token = 
  struct 
    type t = token 
  
    let to_string t = 
      let sf = Printf.sprintf in 
      match t with 
        | KEYWORD s       -> sf "KEYWORD %S" s 
        | NUMBER s        -> sf "NUMBER %s" s 
        | STRING s        -> sf "STRING \"%s\"" s 
        | ANTIQUOT (n, s) -> sf "ANTIQUOT %s: %S" n s 
        | EOI             -> sf "EOI" 
  
    let print ppf x = Format.pp_print_string ppf (to_string x) 
  
    let match_keyword kwd = 
      function 
        | KEYWORD kwd' when kwd = kwd' -> true 
        | _ -> false 
  
    let extract_string = 
      function 
        | KEYWORD s | NUMBER s | STRING s -> s 
        | tok -> 
            invalid_arg 
              ("Cannot extract a string from this token: " ^ 
                 to_string tok) 
 
    module Loc = Camlp4.PreCast.Loc 
    module Error = Error 
    module Filter = ... (* see below *) 
  end

Not much to it. KEYWORD covers true, false, null, and punctuation; NUMBER and STRING are JSON numbers and strings; as we saw last time antiquotations are returned in ANTIQUOT; finally we signal the end of the input with EOI.
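
As a small check on how the grammar uses these functions, here is a standalone sketch with the token type restated and the Loc, Error, and Filter parts left out. The three bindings at the end are illustrative names only.

```ocaml
(* The token type from above, restated so the sketch stands alone. *)
type token =
  | KEYWORD  of string
  | NUMBER   of string
  | STRING   of string
  | ANTIQUOT of string * string
  | EOI

let match_keyword kwd = function
  | KEYWORD kwd' when kwd = kwd' -> true
  | _ -> false

let extract_string = function
  | KEYWORD s | NUMBER s | STRING s -> s
  | ANTIQUOT _ | EOI ->
      invalid_arg "Cannot extract a string from this token"

(* A literal "[" in a grammar rule matches only the KEYWORD token,
   not a STRING that happens to contain the same characters. *)
let bracket_is_keyword    = match_keyword "[" (KEYWORD "[")
let string_is_not_keyword = match_keyword "[" (STRING "[")

(* n = NUMBER in a rule binds n to the token's string payload. *)
let n = extract_string (NUMBER "42")
```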

Filter

  module Filter : sig 
    type token_filter = 
      (t * Loc.t) Stream.t -> (t * Loc.t) Stream.t 
 
    type t 
 
    val mk : (string -> bool) -> t 
    val define_filter : t -> (token_filter -> token_filter) -> unit 
    val filter : t -> token_filter 
    val keyword_added : t -> string -> bool -> unit 
    val keyword_removed : t -> string -> unit 
  end

The Filter module provides filters over token streams. We don’t have a need for it in the JSON example, but it’s interesting to see how it is implemented in the default lexer and used in the OCaml parser. The argument to mk is a function indicating whether a string should be treated as a keyword (i.e. the literal string is used in the grammar), and the default lexer uses it to filter the token stream to convert identifiers into keywords. If we wanted the JSON parser to be extensible, we would need to take this into account; instead we’ll just stub out the functions:

  module Filter = 
  struct 
    type token_filter = 
      (t * Loc.t) Stream.t -> (t * Loc.t) Stream.t 
 
    type t = unit 
 
    let mk _ = () 
    let filter _ strm = strm 
    let define_filter _ _ = () 
    let keyword_added _ _ _ = () 
    let keyword_removed _ _ = () 
  end
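
For comparison, here is a rough sketch of what a non-trivial filter might do with the predicate passed to mk. It is not the default lexer's code: the tok type is invented, Seq stands in for the token stream, and locations are omitted.

```ocaml
(* Hypothetical token type; the default lexer's real tokens differ. *)
type tok =
  | IDENT of string
  | KEYWORD of string
  | INT of string

(* Rewrite identifiers that the grammar uses as literal strings into
   keywords; every other token passes through unchanged. *)
let keyword_filter (is_kwd : string -> bool) (toks : tok Seq.t) : tok Seq.t =
  Seq.map
    (function
      | IDENT s when is_kwd s -> KEYWORD s
      | t -> t)
    toks

let example =
  [ IDENT "let"; IDENT "x"; INT "1" ]
  |> List.to_seq
  |> keyword_filter (fun s -> List.mem s [ "let"; "in" ])
  |> List.of_seq
(* example = [KEYWORD "let"; IDENT "x"; INT "1"] *)
```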

Lexer

Finally we have Lexer, which packages up the other modules and provides the actual lexing function. The lexing function takes an initial location and a character stream, and returns a stream of token and location pairs:

  module type Lexer = sig
    module Loc : Loc
    module Token : Token with module Loc = Loc
    module Error : Error

    val mk :
      unit ->
      (Loc.t -> char Stream.t -> (Token.t * Loc.t) Stream.t)
  end

I don’t want to go through the whole lexing function; it is not very interesting. But here is the main loop:

  let rec token c = lexer
    | eof -> EOI

    | newline -> next_line c; token c c.lexbuf
    | blank+ -> token c c.lexbuf

    | '-'? ['0'-'9']+ ('.' ['0'-'9']* )?
        (('e'|'E')('+'|'-')?(['0'-'9']+))? ->
          NUMBER (L.utf8_lexeme c.lexbuf)

    | [ "{}[]:," ] | "null" | "true" | "false" ->
        KEYWORD (L.utf8_lexeme c.lexbuf)

    | '"' ->
        set_start_loc c;
        string c c.lexbuf;
        STRING (get_stored_string c)

    | "$" ->
        set_start_loc c;
        c.enc := Ulexing.Latin1;
        let aq = antiquot c lexbuf in
        c.enc := Ulexing.Utf8;
        aq

    | _ -> illegal c

The lexer syntax is an extension provided by ulex; the effect is similar to ocamllex. The lexer needs to keep track of the current location and return it along with the token (next_line advances the current location; set_start_loc is for when a token spans multiple ulex lexemes). The lexer also needs to parse antiquotations, taking into account nested quotations within them.
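
To see exactly what the NUMBER pattern accepts, here is a hand-written scanner for the same regular expression in plain OCaml. It is only a sketch: scan_number and its helpers are invented names, and ulex compiles the pattern to an automaton rather than anything like this. Note that the pattern is a little looser than the JSON grammar, which requires at least one digit after the decimal point and forbids leading zeros.

```ocaml
let is_digit c = c >= '0' && c <= '9'

(* Index of the first non-digit at or after i. *)
let rec skip_digits s i =
  if i < String.length s && is_digit s.[i] then skip_digits s (i + 1) else i

(* Longest prefix of s (from start) matching
   '-'? digit+ ('.' digit* )? (('e'|'E') ('+'|'-')? digit+)? *)
let scan_number (s : string) (start : int) : string option =
  let n = String.length s in
  let i = if start < n && s.[start] = '-' then start + 1 else start in
  let j = skip_digits s i in
  if j = i then None                      (* at least one digit needed *)
  else begin
    (* optional fraction: a dot followed by zero or more digits *)
    let j = if j < n && s.[j] = '.' then skip_digits s (j + 1) else j in
    (* optional exponent: only counts if at least one digit follows *)
    let j =
      if j < n && (s.[j] = 'e' || s.[j] = 'E') then begin
        let k = j + 1 in
        let k = if k < n && (s.[k] = '+' || s.[k] = '-') then k + 1 else k in
        let k' = skip_digits s k in
        if k' = k then j else k'
      end else j
    in
    Some (String.sub s start (j - start))
  end
```

For instance, scan_number accepts both "1." and "007", neither of which is a JSON number.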

(I think it is not actually necessary to lex JSON as UTF8. The only place that non-ASCII characters can appear is in a string. To lex a string we just accumulate characters until we see a double-quote, which cannot appear as part of a multibyte character. So it would work just as well to accumulate bytes. I am no Unicode expert though. This example was extracted from the Javascript parser in jslib, where I think UTF8 must be taken into account.)
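
Here is a minimal sketch of the byte-accumulating approach, assuming the input is valid UTF-8. In UTF-8 every byte of a multibyte sequence has its high bit set, so a byte equal to the ASCII double-quote can only be an actual double-quote. The name lex_string_bytes is invented, and escape sequences are kept verbatim rather than decoded.

```ocaml
(* Read the bytes of a string literal starting just after the opening
   double-quote at index start. Returns the raw contents and the index
   just past the closing quote, or None if the literal is unterminated. *)
let lex_string_bytes (s : string) (start : int) : (string * int) option =
  let n = String.length s in
  let buf = Buffer.create 16 in
  let rec loop i =
    if i >= n then None
    else
      match s.[i] with
        | '"' -> Some (Buffer.contents buf, i + 1)
        | '\\' when i + 1 < n ->
            (* keep the backslash and the escaped byte as written, so
               an escaped quote does not end the literal *)
            Buffer.add_char buf s.[i];
            Buffer.add_char buf s.[i + 1];
            loop (i + 2)
        | c ->
            Buffer.add_char buf c;
            loop (i + 1)
  in
  loop start

(* "h\xc3\xa9llo" is "héllo" encoded as UTF-8 bytes. *)
let example = lex_string_bytes "h\xc3\xa9llo\" rest" 0
```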

Hooking up the lexer

There are a handful of changes we need to make to call the custom lexer:

In Jq_parser we make the grammar with the custom lexer module, and open it so the token constructors are available; we also replace the INT and FLOAT cases with just NUMBER; for the other cases we used the same token constructor names as the default lexer so we don’t need to change anything.

  open Jq_lexer 
 
  module Gram = Camlp4.PreCast.MakeGram(Jq_lexer) 
 
  ... 
      | n = NUMBER -> Jq_number (float_of_string n)

In Jq_quotations we have Camlp4.PreCast open (so references to Ast in the <:expr< >> quotations resolve), so EOI is Camlp4.PreCast.EOI; we want Jq_lexer.EOI, so we need to write it explicitly:

  json_eoi: [[ x = Jq_parser.json; `Jq_lexer.EOI -> x ]];

(Recall that the backtick lets us match a constructor directly; for some reason we can’t module-qualify EOI without it.)

That’s it.

I want to finish off this series next time by covering grammar extension, with an example OCaml syntax extension.

(You can find the complete code for this example here.)

Reading Camlp4, part 9: implementing antiquotations

2010-08-05T17:55:00.000-07:00

In this post I want to complicate the JSON quotation library from the previous post by adding antiquotations.

AST with antiquotations

In order to support antiquotations we will need to make some changes to the AST. Here is the new AST type:

  type t = 
      ... (* base types same as before *) 
    | Jq_array  of t 
    | Jq_object of t 
  
    | Jq_colon  of t * t 
    | Jq_comma  of t * t 
    | Jq_nil 
  
    | Jq_Ant    of Loc.t * string

Let’s first consider Jq_Ant. Antiquotations $tag:body$ are returned from the lexer as an ANTIQUOT token containing the (possibly empty) tag and the entire body (including nested quotations/antiquotations) as a string. In the parser, we deal only with the JSON AST, so we can’t really do anything with an antiquotation but return it to the caller (wrapped in a Jq_Ant).

The lifting functions generated by Camlp4MetaGenerator treat Jq_Ant (and any other constructor ending in Ant) specially: instead of

  | Jq_Ant (loc, s) -> 
      <:expr< Jq_Ant ($meta_loc loc$, $meta_string s$) >>

they have

  | Jq_Ant (loc, s) -> ExAnt (loc, s)

Instead of lifting the constructor, they translate it directly to ExAnt (or PaAnt, depending on the context). We don’t otherwise have locations in our AST, but Jq_Ant must take a Loc.t argument because ExAnt does. Later, when we walk the OCaml AST expanding antiquotations, it will be convenient to have them as ExAnt nodes rather than lifted Jq_Ant nodes.

In addition to Jq_Ant, we have new Jq_nil, Jq_comma, and Jq_colon constructors, and we have replaced the lists in Jq_array and Jq_object with just t. The idea here is that in an antiquotation in an array, e.g.

  <:json< [ 1, true, $x$, "foo" ] >>

we would like to be able to substitute any number of elements (including zero) into the array in place of x. If Jq_array took a list, we could substitute exactly one element only. So instead we build a tree out of Jq_comma and Jq_nil constructors; at any point in the tree we can substitute zero (Jq_nil), one (any other t constructor), or more than one (a Jq_comma subtree) elements. We recover a list by taking the fringe of the final tree. (In the Jq_ast module there are functions t_of_list and list_of_t which convert between these representations.) For objects, we use Jq_colon to associate a name with a value, then build a tree of name/value pairs the same way.
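
The post does not show t_of_list and list_of_t, so the following is only a guess at how they might be written, on a cut-down AST with most constructors (including Jq_Ant) left out. array_with and elements are invented helpers for the example.

```ocaml
(* Cut-down AST: just enough constructors for the example. *)
type t =
  | Jq_number of float
  | Jq_bool   of bool
  | Jq_array  of t
  | Jq_comma  of t * t
  | Jq_nil

(* Build a right-nested Jq_comma tree terminated by Jq_nil. *)
let t_of_list (l : t list) : t =
  List.fold_right (fun e acc -> Jq_comma (e, acc)) l Jq_nil

(* Take the fringe of a tree: Jq_nil contributes nothing, Jq_comma
   contributes the fringes of both sides, anything else is one element. *)
let list_of_t (x : t) : t list =
  let rec go x acc =
    match x with
      | Jq_nil -> acc
      | Jq_comma (a, b) -> go a (go b acc)
      | e -> e :: acc
  in
  go x []

(* The tree for [ 1, true, $x$ ] with some subtree substituted for x. *)
let array_with x =
  Jq_array (Jq_comma (Jq_number 1., Jq_comma (Jq_bool true, x)))

let elements = function
  | Jq_array es -> list_of_t es
  | _ -> invalid_arg "elements: not an array"
```

Substituting Jq_nil, a single value, or t_of_list of two values gives arrays of two, three, and four elements respectively.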

While this AST meets the need, it is now possible to have ill-formed ASTs, e.g. a bare Jq_nil, or a Jq_object where the elements are not Jq_colon pairs, or where the first argument of Jq_colon is not a Jq_string. This is annoying, but it is hard to see how to avoid it without complicating the AST and making it more difficult to use antiquotations.
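
One way to live with this is to check a tree before using it. The following well_formed function is not part of json_quot; it is a sketch of such a check on a trimmed version of the AST (no booleans, no Jq_Ant), with the fringe helper restated so the block stands alone.

```ocaml
type t =
  | Jq_null
  | Jq_string of string
  | Jq_number of float
  | Jq_array  of t
  | Jq_object of t
  | Jq_colon  of t * t
  | Jq_comma  of t * t
  | Jq_nil

(* Flatten a Jq_comma / Jq_nil tree into a list of elements. *)
let rec fringe x acc =
  match x with
    | Jq_nil -> acc
    | Jq_comma (a, b) -> fringe a (fringe b acc)
    | e -> e :: acc

(* true when the tree corresponds to a single well-formed JSON value *)
let rec well_formed = function
  | Jq_null | Jq_string _ | Jq_number _ -> true
  | Jq_array es -> List.for_all well_formed (fringe es [])
  | Jq_object kvs ->
      List.for_all
        (function
          | Jq_colon (Jq_string _, v) -> well_formed v
          | _ -> false)
        (fringe kvs [])
  (* list fragments are only meaningful inside an array or object *)
  | Jq_colon _ | Jq_comma _ | Jq_nil -> false
```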

Parsing antiquotations

Here is the updated parser:

  EXTEND Gram 
    json: [[ 
        ... (* base types same as before *) 
  
      | `ANTIQUOT 
          (""|"bool"|"int"|"flo"|"str"|"list"|"alist" as n, s) -> 
            Jq_Ant (_loc, n ^ ":" ^ s) 
  
      | "["; es = SELF; "]" -> Jq_array es 
      | "{"; kvs = SELF; "}" -> Jq_object kvs 
  
      | e1 = SELF; ","; e2 = SELF -> Jq_comma (e1, e2) 
      | -> Jq_nil 
 
      | e1 = SELF; ":"; e2 = SELF -> Jq_colon (e1, e2)  
    ]]; 
  END

We want to support several kinds of antiquotations. For individual elements, $x$ (where x is a t), or $bool:x$, $int:x$, $flo:x$, or $str:x$ (where x is an OCaml bool, int, float, or string); for these latter cases we need to wrap x in the appropriate t constructor. For lists of elements, $list:x$ where x is a t list, and $alist:x$ where x is a (string * t) list; for these we need to convert x to the Jq_comma / Jq_nil representation above. But in the parser all we do is return a Jq_Ant containing the tag and body of the ANTIQUOT token. (We return it in a single string separated by : because only one string argument is provided in ExAnt.)

It is the parser which controls where antiquotations are allowed, by providing a case for ANTIQUOT in a particular entry, and which tags are allowed in an entry. In this example we have only one entry, so we allow any supported antiquotation anywhere a JSON expression is allowed, but you can see in the OCaml parsers that the acceptable antiquotations can be context-sensitive, and the interpretation of the same antiquotation can vary according to the context (e.g. different conversions may be needed).

For arrays and objects, we parse SELF in place of the list. The cases for Jq_comma and Jq_nil produce the tree representation, and the case for Jq_colon allows name/value pairs. Recall that a token or keyword is preferred over the empty string, so the Jq_nil case matches only when none of the others do. In particular, the quotation <:json< >> parses to Jq_nil.

We can see that not only is the AST rather free, but so is the parser: it will parse strings which are not well-formed JSON, like <:json< 1, 2 >> or <:json< "foo" : true >>. We lose safety, since a mistake may produce an ill-formed AST, but gain convenience, since we may want to substitute these fragments in antiquotations. As an alternative, we could have a more restrictive parser (e.g. no commas allowed at the json entry), and provide different quotations for different contexts (e.g. <:json_list< >>, allowing commas) for use with antiquotations. For this small language I think it is not worth it.

Expanding antiquotations

To expand antiquotations, we take a pass over the OCaml AST we got from lifting the JSON AST; look for ExAnt nodes; parse them as OCaml; then apply the appropriate conversion according to the antiquotation tag. To walk the AST we extend the Ast.map object (generated with the Camlp4FoldGenerator filter) so we don’t need a bunch of boilerplate cases which return the AST unchanged. Here’s the code:

  module AQ = Syntax.AntiquotSyntax 
  
  let destruct_aq s = 
    let pos = String.index s ':' in 
    let len = String.length s in 
    let name = String.sub s 0 pos 
    and code = String.sub s (pos + 1) (len - pos - 1) in 
    name, code 
  
  let aq_expander = 
  object 
    inherit Ast.map as super 
    method expr = 
      function 
        | Ast.ExAnt (_loc, s) -> 
            let n, c = destruct_aq s in 
            let e = AQ.parse_expr _loc c in 
            begin match n with 
              | "bool" -> <:expr< Jq_ast.Jq_bool $e$ >> 
              | "int" -> 
                  <:expr< Jq_ast.Jq_number (float_of_int $e$) >> 
              | "flo" -> <:expr< Jq_ast.Jq_number $e$ >> 
              | "str" -> <:expr< Jq_ast.Jq_string $e$ >> 
              | "list" -> <:expr< Jq_ast.t_of_list $e$ >> 
              | "alist" -> 
                  <:expr< 
                    Jq_ast.t_of_list 
                      (List.map 
                        (fun (k, v) -> 
                          Jq_ast.Jq_colon (Jq_ast.Jq_string k, v)) 
                        $e$) 
                  >> 
              | _ -> e 
            end 
        | e -> super#expr e 
    method patt = 
      function 
        | Ast.PaAnt (_loc, s) -> 
            let _, c = destruct_aq s in 
            AQ.parse_patt _loc c 
        | p -> super#patt p 
  end

When we find an antiquotation, we unpack the tag and contents (with destruct_aq), parse it using the host syntax (given by Syntax.AntiquotSyntax from Camlp4.PreCast, which might be either the original or revised syntax depending on which modules are loaded), then insert conversions depending on the tag. Conversions don’t make sense in a pattern context, so for patterns we just return the parsed antiquotation.
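
Since destruct_aq uses only the standard library, we can try it on its own. The inputs below are the strings the parser would build with n ^ ":" ^ s; the binding names are just for illustration. The split is at the first colon, so colons inside the body (as in a nested quotation) are left alone.

```ocaml
(* destruct_aq as above: split at the first ':' *)
let destruct_aq s =
  let pos = String.index s ':' in
  let len = String.length s in
  let name = String.sub s 0 pos
  and code = String.sub s (pos + 1) (len - pos - 1) in
  name, code

let tagged   = destruct_aq "int:x"           (* ("int", "x") *)
let untagged = destruct_aq ":x"              (* ("", "x") *)
let nested   = destruct_aq ":<:json< 1 >>"   (* ("", "<:json< 1 >>") *)
```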

Finally we hook into the quotation machinery, mostly as before:

  let parse_quot_string loc s =
    let q = !Camlp4_config.antiquotations in
    Camlp4_config.antiquotations := true;
    let res = Jq_parser.Gram.parse_string json_eoi loc s in
    Camlp4_config.antiquotations := q;
    res

  let expand_expr loc _ s =
    let ast = parse_quot_string loc s in
    let meta_ast = Jq_ast.MetaExpr.meta_t loc ast in
    aq_expander#expr meta_ast

  ;;

  Q.add "json" Q.DynAst.expr_tag expand_expr;

Before parsing a quotation we set a flag, which is checked by the lexer, to allow antiquotations; the flag is initially false, so antiquotations appearing outside a quotation won’t be parsed. After lifting the JSON AST to an OCaml AST, we run the result through the antiquotation expander.
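
One detail of parse_quot_string as written is that the flag stays set if the parser raises. A possible variation, which is not what the code above does, restores it with Fun.protect (available from OCaml 4.08). The with_flag helper is invented, and a local ref stands in for Camlp4_config.antiquotations so the sketch runs without Camlp4.

```ocaml
(* Hypothetical helper: set a flag for the duration of f, restoring
   the previous value even if f raises. *)
let with_flag (flag : bool ref) (value : bool) (f : unit -> 'a) : 'a =
  let saved = !flag in
  flag := value;
  Fun.protect ~finally:(fun () -> flag := saved) f

(* Stand-in for Camlp4_config.antiquotations in this sketch. *)
let antiquotations = ref false

(* parse is any string parser; it sees the flag set to true. *)
let parse_with_antiquotations parse s =
  with_flag antiquotations true (fun () -> parse s)
```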

For concreteness, let’s follow the life of a quotation as it is parsed and expanded. Say we begin with

  <:json< [ 1, $int:x$ ] >>

After parsing:

  Jq_array (Jq_comma (Jq_number 1., Jq_Ant (_loc, "int:x")))

After lifting:

  <:expr< 
    Jq_array (Jq_comma (Jq_number 1., $ExAnt (_loc, "int:x")$)) 
  >>

After expanding:

  <:expr< 
    Jq_array (Jq_comma (Jq_number 1., Jq_number (float_of_int x))) 
  >>

Nested quotations

Let’s see that again with a nested quotation:

  <:json< $<:json< 1 >>$ >>

After parsing:

  Jq_Ant (_loc, ":<:json< 1 >>")

After lifting:

  ExAnt (_loc, ":<:json< 1 >>")

After expanding (during which we parse and expand "<:json< 1 >>" to <:expr< Jq_number 1. >>):

  <:expr< Jq_number 1. >>

A wise man once said “The string is a stark data structure and everywhere it is passed there is much duplication of process.” So it is with Camlp4 quotations: each nested quotation is re-parsed; each quotation implementation must deal with parsing host-language antiquotation strings; and the lexer for each implementation must lex antiquotations and nested quotations. (Since we used the default lexer we didn’t have to worry about this, but see the next post.) It would be nice to have more support from Camlp4. On the other hand, while what happens at runtime seems baroque, the code above is relatively straightforward, and since we work with strings we can use any parser technology we like.

It has not been much (marginal) trouble to handle quotations in pattern contexts, but they are not tremendously useful. The problem is that we normally don’t care about the order of the fields in a JSON object, or if there are extra fields; we would like to write

  match x with 
    | <:json< { 
        "foo" : $foo$ 
      } >> -> ... (* do something with foo *)

and have it work wherever the foo field is in the object. This is a more complicated job than just lifting the JSON AST. For an alternative approach to processing JSON using a list-comprehension syntax, see json_compr, an example I wrote for the upcoming metaprogramming tutorial at CUFP. For a fancier JSON DSL (including the ability to infer a type description from a bunch of examples!), see Julien Verlaguet’s jsonpat. And for a framework to extend OCaml’s pattern-matching syntax, see Jeremy Yallop’s ocaml-patterns.
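
Without any syntax support, the order-insensitive match can be approximated with an ordinary lookup function. This sketch uses the list-based AST from the previous post for brevity; field and describe are invented names, and none of this is how json_compr or jsonpat work.

```ocaml
type t =
  | Jq_null
  | Jq_bool   of bool
  | Jq_number of float
  | Jq_string of string
  | Jq_array  of t list
  | Jq_object of (string * t) list

(* Look up a field by name, wherever it sits in the object. *)
let field (name : string) (j : t) : t option =
  match j with
    | Jq_object kvs -> List.assoc_opt name kvs
    | _ -> None

(* Extra fields and field order do not matter. *)
let describe x =
  match field "foo" x with
    | Some (Jq_string s) -> "foo is the string " ^ s
    | Some _ -> "foo has another type"
    | None -> "no foo"
```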

Next time we will see how to use a custom lexer with a Camlp4 grammar.

(You can find the complete code for this example here.)

Reading Camlp4, part 8: implementing quotations

2010-08-03T16:47:00.000-07:00

The Camlp4 system of quotations and antiquotations is an awesome tool for producing and consuming OCaml ASTs. In this post (and the following one) we will see how to provide this facility for other syntaxes and ASTs. Here we consider just quotations; we’ll add antiquotations in the following post.

An AST for JSON

Our running example will be a quotation expander for JSON. Let’s begin with the JSON AST, in a module Jq_ast:

  type t = 
    | Jq_null 
    | Jq_bool   of bool 
    | Jq_number of float 
    | Jq_string of string 
    | Jq_array  of t list 
    | Jq_object of (string * t) list

This is the same (modulo order and names) as json_type from the json-wheel library, but for various reasons we will not be able to use json_type. The Jq_ prefix is for json_quot, the name of this little library.

Parsing JSON

We’ll use a Camlp4 grammar to parse JSON trees. It is not necessary to use Camlp4’s parsing facilities in order to implement quotations—ultimately we will need to provide just a function from strings to ASTs, so we could use ocamlyacc or what-have-you instead—but it is convenient. Here is the parser:

  open Camlp4.PreCast 
  open Jq_ast 
  
  module Gram = MakeGram(Lexer) 
  let json = Gram.Entry.mk "json" 
  
  ;; 
  
  EXTEND Gram 
    json: [[ 
        "null" -> Jq_null 
      | "true" -> Jq_bool true 
      | "false" -> Jq_bool false 
      | i = INT -> Jq_number (float_of_string i) 
      | f = FLOAT -> Jq_number (float_of_string f) 
      | s = STRING -> Jq_string s 
      | "["; es = LIST0 json SEP ","; "]" -> Jq_array es 
      | "{"; 
          kvs = 
            LIST0 
              [ s = STRING; ":"; j = json -> (s, j) ] 
              SEP ","; 
        "}" -> Jq_object kvs 
    ]]; 
  END

We use the default Camlp4 lexer (with MakeGram(Lexer)); as we have seen, keywords mentioned in a Camlp4 grammar are added to the lexer, so we don’t need to do anything special to lex null etc. However, while JSON/Javascript has a single number type, the default lexer returns different tokens for INT and FLOAT numbers, so we convert each to Jq_number. In fact, these tokens (along with STRING) represent OCaml integer, float and string literals, which do not exactly match the corresponding JSON ones, but they are fairly close so let’s not worry about it for now; we’ll revisit the lexer in a later post.
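
A quick illustration of both points, using only the standard library (the binding names are arbitrary): float_of_string handles the payload of either token, and it also accepts OCaml-only spellings such as underscores, which is one way the OCaml literals differ from JSON numbers.

```ocaml
(* The string carried by an INT token converts as well as a FLOAT's. *)
let from_int_token   = float_of_string "1"
let from_float_token = float_of_string "2.5e3"

(* Valid as an OCaml literal, though it is not a JSON number. *)
let with_underscore  = float_of_string "1_000"
```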

The parser itself is pleasingly compact; we can make good use of the LIST0 special symbol and an anonymous entry for parsing objects. Unfortunately things will get a little more complicated when we come to antiquotations.

Lifting the AST

Next we need to “lift” values of the JSON AST to values of the OCaml AST. What does “lift” mean, and why do we need to do it? The goal is to convert quotations in OCaml code, such as

  let x = <:json< [ 1, "foo", true ] >>

into the equivalent

  let x = 
    Jq_ast.Jq_array [ 
      Jq_ast.Jq_number 1.; 
      Jq_ast.Jq_string "foo"; 
      Jq_ast.Jq_bool true 
    ]

This is to happen as part of Camlp4 preprocessing, which produces an OCaml AST, so what we produce in place of the <:json< ... >> expression must be a fragment of OCaml AST. We have a parser which takes a valid JSON string to the JSON AST; what remains is to take a JSON AST value to the corresponding OCaml AST. So we need a function with cases something like:

  | Jq_null -> <:expr< Jq_null >> 
  | Jq_number n -> <:expr< Jq_number $`flo:n$ >> 
  | ...
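
To get a feel for what lifting does without bringing in Camlp4, here is a sketch that lifts a JSON AST value to OCaml source text instead of to an OCaml AST. This is only an analogy: the real lifting functions return Ast.expr values, and lift is an invented name.

```ocaml
type t =
  | Jq_null
  | Jq_bool   of bool
  | Jq_number of float
  | Jq_string of string
  | Jq_array  of t list
  | Jq_object of (string * t) list

(* Produce OCaml source text that rebuilds the given value. *)
let rec lift (j : t) : string =
  match j with
    | Jq_null -> "Jq_ast.Jq_null"
    | Jq_bool b -> Printf.sprintf "Jq_ast.Jq_bool %B" b
    (* parenthesized so that negative numbers stay a single argument *)
    | Jq_number n -> Printf.sprintf "Jq_ast.Jq_number (%F)" n
    | Jq_string s -> Printf.sprintf "Jq_ast.Jq_string %S" s
    | Jq_array es ->
        Printf.sprintf "Jq_ast.Jq_array [%s]"
          (String.concat "; " (List.map lift es))
    | Jq_object kvs ->
        let pair (k, v) = Printf.sprintf "(%S, %s)" k (lift v) in
        Printf.sprintf "Jq_ast.Jq_object [%s]"
          (String.concat "; " (List.map pair kvs))
```

Each case mentions the constructor twice, once as a pattern and once inside the output, which is the same shape as the <:expr< >> cases above.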

It is not such a big deal to hand-write this lifting function for a small AST like JSON, but it is arduous and error-prone for full-size ASTs. Fortunately Camlp4 has a filter which does it for us. Let’s first look at the signature of the Jq_ast module:

  open Camlp4.PreCast 
  
  type t = ... (* as above *) 
  
  module MetaExpr : 
  sig 
    val meta_t : Ast.loc -> t -> Ast.expr 
  end 
  
  module MetaPatt : 
  sig 
    val meta_t : Ast.loc -> t -> Ast.patt 
  end

The generated modules MetaExpr and MetaPatt provide functions to lift a JSON AST to either an OCaml expr (when the quotation appears as an expression) or patt (when it appears as a pattern). The loc arguments are inserted into the resulting OCaml AST so that compile errors have correct locations.

Now the implementation of Jq_ast:

  module Jq_ast = 
  struct 
    type float' = float 
  
    type t = (* almost as above *) 
        ... 
      | Jq_number of float' 
        ... 
  end 
  
  include Jq_ast 
  
  open Camlp4.PreCast (* for Ast refs in generated code *) 
  
  module MetaExpr = 
  struct 
    let meta_float' _loc f = <:expr< $`flo:f$ >> 
    include Camlp4Filters.MetaGeneratorExpr(Jq_ast) 
  end 
  
  module MetaPatt = 
  struct 
    let meta_float' _loc f = <:patt< $`flo:f$ >> 
    include Camlp4Filters.MetaGeneratorPatt(Jq_ast) 
  end

The file needs the Camlp4MetaGenerator filter (the camlp4.metagenerator package with findlib). The main idea is that the calls to Camlp4Filters.MetaGenerator{Expr,Patt} are expanded into the lifting functions. But there are a couple of fussy details:

First: The argument module Jq_ast which we pass to the generators is used both on the left and right of the generated function; if you look at the generated code there are cases like:

  | Jq_ast.Jq_null -> <:expr< Jq_ast.Jq_null >>

(The <:expr< .. >> is already expanded in the actual generated code.) We need the AST to be available qualified by the module Jq_ast both in the current file and also in code that uses the quotation. So we have a nested Jq_ast module (for local uses, on the left-hand side) which we include (for external uses, on the right-hand side).

Second: The generators scan all the types defined in the current module, then generate code from the last-appearing recursive bundle. (In this case the recursive bundle contains just t, but in general there can be more than one; mutually recursive lifting functions are generated.) There are some special cases for predefined types, and in particular for float; however, it seems to be wrong:

  let meta_float _loc s = Ast.ExFlo (_loc, s)

The ExFlo constructor takes a string representing the float, but calls to this function are generated when you use float in your type. To work around this, we define the type float' (on its own rather than as part of the last-appearing recursive bundle, or else Camlp4 would generate a meta_float' that calls meta_float), and provide correct meta_float' functions. There is a similar bug with meta_int, but meta_bool is correct, so our Jq_bool case does not need fixing.

(It is interesting to contrast this approach of lifting the AST with how it is handled in Template Haskell using the “scrap your boilerplate” pattern; see Geoffrey Mainland’s paper Why It’s Nice to be Quoted.)

Quotations

Finally we can hook the parser and AST lifter into Camlp4’s quotation machinery, in the Jq_quotations module:

  open Camlp4.PreCast 
  
  module Q = Syntax.Quotation 
  
  let json_eoi = Jq_parser.Gram.Entry.mk "json_eoi" 
  
  EXTEND Jq_parser.Gram 
    json_eoi: [[ x = Jq_parser.json; EOI -> x ]]; 
  END;; 
  
  let parse_quot_string loc s = 
    Jq_parser.Gram.parse_string json_eoi loc s 
  
  let expand_expr loc _ s = 
    Jq_ast.MetaExpr.meta_t loc (parse_quot_string loc s) 
  
  let expand_str_item loc _ s = 
    let exp_ast = expand_expr loc None s in 
    <:str_item@loc< $exp:exp_ast$ >> 
  
  let expand_patt loc _ s = 
    Jq_ast.MetaPatt.meta_t loc (parse_quot_string loc s) 
  
  ;; 
  
  Q.add "json" Q.DynAst.expr_tag expand_expr; 
  Q.add "json" Q.DynAst.patt_tag expand_patt; 
  Q.add "json" Q.DynAst.str_item_tag expand_str_item; 
  Q.default := "json"

First, we make a new grammar entry json_eoi which parses a json expression followed by the end-of-input token EOI. Grammar entries ordinarily ignore the rest of the input after a successful parse. If we were to use the json entry directly, we would silently accept quotations with trailing garbage, and in particular incorrect quotations that happen to have a correct prefix, rather than alerting the user.

Then we register quotation expanders for the <:json< >> quotation in the expr, patt, and str_item contexts (str_item is useful because that is the context at the top level prompt), using Syntax.Quotation.add. All the expanders do is call the parser, then run the result through the appropriate lifting function.

Finally we set json as the default quotation, so we can just say << >> for JSON quotations. This is perhaps a bit cheeky, since the user may want something else as the default quotation; whichever module is loaded last wins.
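The effect of the EOI entry can be seen in miniature without Camlp4. This is only a sketch in plain OCaml: parse_int is a hypothetical stand-in for a grammar entry (it has nothing to do with the real JSON parser), and parse_prefix and parse_all are made-up names for the two behaviors.

```ocaml
(* Hypothetical stand-in for a grammar entry: parse a run of digits at the
   start of the input, returning the number and the unconsumed rest. *)
let parse_int (s : string) : (int * string) option =
  let n = String.length s in
  let rec digits i =
    if i < n && s.[i] >= '0' && s.[i] <= '9' then digits (i + 1) else i
  in
  let stop = digits 0 in
  if stop = 0 then None
  else Some (int_of_string (String.sub s 0 stop), String.sub s stop (n - stop))

(* Like using the entry directly: whatever follows the parse is ignored. *)
let parse_prefix s = Option.map fst (parse_int s)

(* Like json_eoi: succeed only if the whole input was consumed. *)
let parse_all s =
  match parse_int s with
  | Some (v, "") -> Some v
  | _ -> None
```

With these definitions, parse_prefix "12 garbage" quietly returns Some 12, while parse_all rejects the same input.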

It is worth reflecting on how the quotation mechanism works in the OCaml parser: There is a lexer token for quotations, but no node in the OCaml AST, so everything must happen in the parser. When a quotation is lexed, its entire contents are returned as a string. (Nested quotations are matched in the lexer without considering the embedded syntax; see quotation and antiquot in camlp4/Camlp4/Struct/Lexer.mll. This makes the << and >> tokens unusable in the embedded syntax.) The string is then expanded according to the table of registered expanders; expanders return a fragment of OCaml AST which is inserted into the parse tree.

You might have thought (as I did) that something fancy happens with quotations, e.g. Camlp4 switches to a different parser on the fly, then back to the original parser for antiquotations. But it is much simpler than that. At the same time, it is much more complicated than that, as we will see next time when we cover antiquotations (and in particular how nested antiquotations/quotations are handled).

(You can find the complete code here, including a pretty-printer and integration with the top level; after building and installing you can say e.g.

  # << [ 1, "foo", true ] >>;; 
  - : Jq_ast.t = [ 1, "foo", true ]

although without antiquotations it is not very useful.)

Reading Camlp4, part 7: revised syntax

2010-07-27T11:01:00.000-07:00

As we have seen, Camlp4 contains an alternative syntax for OCaml, the “revised” syntax, which attempts to correct some infelicities of the original syntax, and to make it easier to parse and pretty-print. Most (all?) of Camlp4 itself is written in this syntax.

While OCaml quotations may be written in either original or revised syntax, original syntax quotations are not as well-supported; there are AST constructions which are difficult or impossible to generate from original syntax quotations. (As I understand it, part of the motivation for the revised syntax was to provide more context, in the form of extra brackets etc., so that antiquotations work more smoothly.)

I have always felt that the revised syntax is a pointless idiosyncrasy, and that whatever value it might bring is offset by the mental clutter of working with two syntaxes (since most code is still written in the original syntax). So I have stuck with original syntax quotations in this series, and recommended that you fall back to AST constructors when quotations don’t work out. However, the situation with original syntax quotations seems to have gotten worse in the upcoming OCaml 3.12.0 release (see bugs 5080 and 5104).

These bugs affected my orpc and ocamljs projects, and I decided to use revised syntax quotations rather than uglying up the code with AST constructors. This turned out to be not so bad, requiring only a few changes. Fortunately, you can choose for each source file which kind to use (in ocamlbuild you can give the pkg_camlp4.quotations.o or pkg_camlp4.quotations.r tags per file), so I left quotations in files that were unaffected or only lightly affected in the original syntax.

I don’t have anything new to say about the revised syntax, but I want to point out the following resources:

The final word on the revised syntax is of course the parser itself, found in Camlp4OCamlRevisedParser.ml; you may find these earlier posts useful in making sense of it.

Reading Camlp4, part 6: parsing

2010-05-19T21:22:00.000-07:00

In this post I want to discuss Camlp4’s stream parsers and grammars. Since the OCaml parsers in Camlp4 (which we touched on previously) use them, it’s necessary to understand them in order to write syntax extensions; independently, they are a nice alternative to ocamlyacc and other parser generators. Stream parsers and grammars are outlined for the old Camlp4 in the tutorial and manual, but some of the details have changed, and there are many aspects of grammars which are given only a glancing treatment in that material.

Streams and stream parsers

Parsers generated from Camlp4 grammars are built on stream parsers, so let’s start there. It will be easier to explain grammars with this background in hand, and we will see that it is sometimes useful to drop down to stream parsers when writing grammars.

A stream of type 'a Stream.t is a sequence of elements of type 'a. Elements of a stream are accessed sequentially; reading the first element of a stream has the side effect of advancing the stream to the next element. You can also peek ahead into a stream without advancing it. Camlp4 provides a syntax extension for working with streams, which expands to operations on the Stream module of the standard library.

There are various ways to make a stream but we’ll focus on consuming them; for testing you can make a literal stream with the syntax [< '"foo"; '"bar"; '"baz" >]—note the extra single-quotes. With the parser keyword we can write a function to consume a stream by pattern-matching over prefixes of the stream:

let rec p = parser 
  | [< '"foo"; 'x; '"bar" >] -> "foo-bar+" ^ x 
  | [< '"baz"; y = p >] -> "baz+" ^ y

The syntax '"foo" means match a value "foo"; 'x means match any value, binding it to x, which can be used on the right-hand side of the match as usual; and y = p means call the parser p on the rest of the stream, binding the result to y. You probably get the rough idea, but let’s run it through Camlp4 to see exactly what’s happening:

let rec p (__strm : _ Stream.t) = 
  match Stream.peek __strm with 
  | Some "foo" -> 
      (Stream.junk __strm; 
       (match Stream.peek __strm with 
        | Some x -> 
            (Stream.junk __strm; 
             (match Stream.peek __strm with 
              | Some "bar" -> (Stream.junk __strm; "foo-bar+" ^ x) 
              | _ -> raise (Stream.Error ""))) 
        | _ -> raise (Stream.Error ""))) 
  | Some "baz" -> 
      (Stream.junk __strm; 
       let y = 
         (try p __strm 
          with | Stream.Failure -> raise (Stream.Error "")) 
       in "baz+" ^ y) 
  | _ -> raise Stream.Failure

We can see that “parser” is perhaps a strong word for this construct; it’s really just a nested pattern match. The generated function peeks the next element in the stream, then junks it once it finds a match (advancing the stream to the next element). If there’s no match on the first token, that’s a Stream.Failure (the stream is not advanced, giving us the opportunity to try another parser); but once we have matched the first token, a subsequent match failure is a Stream.Error (we have committed to a branch, and advanced the stream; if the parse fails now we can’t try another parser).

A call to another parser as the first element of the pattern is treated specially: for this input

let rec p = parser 
  | [< x = q >] -> x 
  | [< '"bar" >] -> "bar"

we get

let rec p (__strm : _ Stream.t) = 
  try q __strm 
  with 
  | Stream.Failure -> 
      (match Stream.peek __strm with 
       | Some "bar" -> (Stream.junk __strm; "bar") 
       | _ -> raise Stream.Failure)

So there is a limited means of backtracking: if q fails with Stream.Failure (meaning that the stream has not been advanced) we try the next arm of the parser.

It’s easy to see what would happen if we were to use the same literal as the first element of more than one arm: the first one gets the match. Same if we were to make a recursive call (to the same parser) as the first element: we’d get an infinite loop, since it’s just a function call. So we can’t give arbitrary BNF-like grammars to parser. We could use it as a convenient way to hand-write a recursive-descent parser, but we won’t pursue that idea here. Instead, let’s turn to Camlp4’s grammars, which specify a recursive-descent parser using a BNF-like syntax.

Grammars

Here is a complete example of a grammar:

open Camlp4.PreCast 
module Gram = MakeGram(Lexer) 
let expr = Gram.Entry.mk "expr" 
EXTEND Gram 
  expr: 
    [[ 
       "foo"; x = LIDENT; "bar" -> "foo-bar+" ^ x 
     | "baz"; y = expr -> "baz+" ^ y 
     ]]; 
END 
;; 
try 
  print_endline 
    (Gram.parse_string expr Loc.ghost Sys.argv.(1)) 
with Loc.Exc_located (_, x) -> raise x

You can build it with the following command:

ocamlfind ocamlc \ 
   -linkpkg -syntax camlp4o \ 
  -package camlp4.extend -package camlp4.lib \ 
  grammar1.ml -o grammar1

Let’s cover the infrastructure before investigating EXTEND. We have a grammar module Gram which we got from Camlp4.PreCast; this is an empty grammar using a default lexer. We have an entry (a grammar nonterminal) expr, which is an OCaml value. We can parse a string starting at an entry using Gram.parse_string (we have to pass it an initial location). We trap Loc.Exc_located (which attaches a location to exceptions raised in parsing) and re-raise the underlying exception so it gets printed. (In subsequent examples I will give just the EXTEND block.)

One way to approach EXTEND is to run the file through Camlp4 (camlp4of has the required syntax extension) to see what we get. This is fun, but the result does not yield much insight; it’s just a simple transformation of the input, passed to Gram.extend. This is the entry point to a pretty hairy bunch of code that generates a recursive descent parser from the value representing the grammar. Let’s take a different tack: RTFM, then run some experiments to shine light in places where the fine manual is a bit dim.

First, what language is parsed by the grammar above? It looks pretty similar to the stream parser example. But what is LIDENT? The stream parser example works with a stream of strings. Here we are working with a stream of tokens, produced by the Lexer module; there is a variant defining the token types in PreCast.mli. The default lexer is OCaml-specific (but it’s often good enough for other purposes); a LIDENT is an OCaml lowercase identifier. A literal string (like "foo") indicates a KEYWORD token; using it in a grammar registers the keyword with the lexer. So the grammar can parse strings like foo quux bar or baz foo quux bar, but not foo bar bar, since bar is a KEYWORD not a LIDENT.

Most tokens have associated strings; x = LIDENT puts the associated string in x. Keywords are given in double quotes (x = KEYWORD works, but I can’t think of a good use for it). You can also use pattern-matching syntax (e.g. `LIDENT x) to get at the actual token constructor, which may carry more than just a string.

You can try the example and see that the lexer takes care of whitespace and OCaml comments. You’ll also notice that the parser ignores extra tokens after a successful parse; to avoid it we need an EOI token to indicate the end of the input (but I haven’t bothered here).

Left-factoring

What happens if two rules start with the same token?

EXTEND Gram 
  expr: 
    [[ 
       "foo"; "bar" -> "foo+bar" 
     | "foo"; "baz" -> "foo+baz" 
     ]]; 
END

If this were a stream parser, the first arm would always match when the next token is foo; if the subsequent token is baz then the parse fails. But with a grammar, the rules (arms, for a grammar) are left-factored: when there is a common prefix of symbols (a symbol is a keyword, token, or entry—and we will see some others later) among different rules, the parser doesn’t choose which rule to use until the common prefix has been parsed. You can think of a factored grammar as a tree, where the nodes are symbols and the leaves are actions (the right-hand side of a rule is the rule’s action); when a symbol distinguishes two rules, that’s a branching point. (In fact, this is how grammars are implemented: first the corresponding tree is generated, then the parser is generated from the tree.)

What if one rule is a prefix of another?

EXTEND Gram 
  expr: 
    [[ 
       "foo"; "bar" -> "foo+bar" 
     | "foo"; "bar"; "baz" -> "foo+bar+baz" 
     ]]; 
END

In this case the parser is greedy: if the next token is baz, it uses the second rule, otherwise the first. To put it another way, a token or keyword is preferred over epsilon, the empty string (and this holds for other ways that a grammar can match epsilon—see below about special symbols).
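The tree view of factoring, and the preference for a token over epsilon, can be sketched in a few lines of plain OCaml. This is a toy model and not Camlp4's implementation: it handles only keywords, the names tree, insert, of_rules, and parse are made up, and a failed parse is reported as None.

```ocaml
(* A factored set of rules: a node may carry an action (some rule ends here)
   and has one child per keyword that can come next. *)
type tree = { action : string option; children : (string * tree) list }

let empty = { action = None; children = [] }

(* Add one rule, given as its keywords and its action. Rules with a common
   prefix share the same path from the root. *)
let rec insert (keywords, act) t =
  match keywords with
  | [] -> { t with action = Some act }
  | k :: rest ->
      let child = Option.value (List.assoc_opt k t.children) ~default:empty in
      let others = List.remove_assoc k t.children in
      { t with children = (k, insert (rest, act) child) :: others }

let of_rules rules = List.fold_left (fun t rule -> insert rule t) empty rules

(* Follow a child whenever the next token allows it; use the action at the
   current node (the epsilon case) only when no child matches. *)
let rec parse t tokens =
  match tokens with
  | tok :: rest when List.mem_assoc tok t.children ->
      parse (List.assoc tok t.children) rest
  | _ -> t.action

(* The two grammars from this section. *)
let common_prefix =
  of_rules [ (["foo"; "bar"], "foo+bar"); (["foo"; "baz"], "foo+baz") ]

let rule_is_prefix =
  of_rules [ (["foo"; "bar"], "foo+bar"); (["foo"; "bar"; "baz"], "foo+bar+baz") ]
```

Because the choice between rules is made at the branching node, parse common_prefix ["foo"; "baz"] finds the second rule, and parse rule_is_prefix picks the longer rule exactly when the next token is baz.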

What if two rules call the same entry?

EXTEND Gram 
  GLOBAL: expr; 
 
  f: [[ "quux" ]]; 
 
  expr: 
    [[ 
       "foo"; f; "bar" -> "foo+bar" 
     | "foo"; f; "baz" -> "foo+baz" 
     ]]; 
END

First, what is this GLOBAL? By default, all entries are global, meaning that they must be pre-defined with Gram.Entry.mk. The GLOBAL declaration gives a list of entries which are global, and makes the rest local, so we don’t need to pre-define them, but we can’t refer to them outside the grammar. Second, note that we can call entries without binding the result to a variable, and that rules don’t need an action—in that case they return (). You can try it and see that factoring works on entries too. Maybe this is slightly surprising, if you’re thinking about the rules as parse-time alternatives, but factoring happens when the parser is built.

What about an entry vs. a token?

EXTEND Gram 
  GLOBAL: expr; 
 
  f: [[ "baz" ]]; 
 
  expr: 
    [[ 
       "foo"; "bar"; f -> "foo+bar" 
     | "foo"; "bar"; "baz" -> "foo+bar+baz" 
     ]]; 
END

Both rules parse the same language, but an explicit token or keyword trumps an entry or other symbol, so the second rule is used. You can try it and see that the order of the rules doesn’t matter.

What about two different entries?

EXTEND Gram 
  GLOBAL: expr; 
 
  f1: [[ "quux" ]]; 
  f2: [[ "quux" ]]; 
 
  expr: 
    [[ 
       "foo"; f1; "bar" -> "foo+bar" 
     | "foo"; f2; "baz" -> "foo+baz" 
     ]]; 
END

Factoring treats f1 and f2 as different symbols; the parser doesn’t know that they parse the same language. It commits to the first rule after parsing foo; if after parsing quux it then sees baz, it doesn’t backtrack and try the second rule, so the parse fails. If you switch the order of the rules, then baz succeeds but bar fails.

Local backtracking

Why have two identical entries in the previous example? If we make them different, something a little surprising happens:

EXTEND Gram 
  GLOBAL: expr; 
 
  f1: [[ "quux" ]]; 
  f2: [[ "xyzzy" ]]; 
 
  expr: 
    [[ 
       "foo"; f1; "bar" -> "foo+bar" 
     | "foo"; f2; "baz" -> "foo+baz" 
     ]]; 
END

Now we can parse both foo quux bar and foo xyzzy baz. How does this work? It takes a little digging into the implementation (which I will spare you) to see what’s happening: the "foo" keyword is factored into a common prefix, then we have a choice between f1 and f2. A choice between entries generates a stream parser, with an arm for each entry which calls the entry’s parser. As we saw in the stream parsers section, calling another parser in the first position of a match compiles to a limited form of backtracking. So in the example, if f1 fails with Stream.Failure (which it does when the next token is not quux) then the parser tries to parse f2 instead.

Local backtracking works only when the parser is at a branch point (e.g. a choice between two entries), and when the called entry does not itself commit and advance the stream (in which case Stream.Error is raised on a parse error instead of Stream.Failure). Here is an example that fails the first criterion:

EXTEND Gram 
  GLOBAL: expr; 
 
  f1: [[ "quux" ]]; 
  f2: [[ "xyzzy" ]]; 
  g1: [[ "plugh" ]]; 
  g2: [[ "plugh" ]]; 
 
  expr: 
    [[ 
       g1; f1 -> "f1" 
     | g2; f2 -> "f2" 
     ]]; 
END

After parsing g1, the parser has committed to the first rule, so it’s not possible to backtrack and try the second if f1 fails.

Here’s an example that fails the second criterion:

EXTEND Gram 
  GLOBAL: expr; 
 
  g: [[ "plugh" ]]; 
  f1: [[ g; "quux" ]]; 
  f2: [[ g; "xyzzy" ]]; 
 
  expr: 
    [[ f1 -> "f1" | f2 -> "f2" ]]; 
END

When f1 is called, after parsing g the parser is committed to f1, so if the next token is not quux the parse fails rather than backtracking.

Local backtracking can be used to control parsing with explicit lookahead. We could repair the previous example as follows:

let test = 
  Gram.Entry.of_parser "test" 
    (fun strm -> 
       match Stream.npeek 2 strm with 
         | [ _; KEYWORD "xyzzy", _ ] -> raise Stream.Failure 
         | _ -> ()) 
EXTEND Gram 
  GLOBAL: expr; 
 
  g: [[ "plugh" ]]; 
  f1: [[ g; "quux" ]]; 
  f2: [[ g; "xyzzy" ]]; 
 
  expr: 
    [[ test; f1 -> "f1" | f2 -> "f2" ]]; 
END

We create an entry from a stream parser with Gram.Entry.of_parser. This could do some useful parsing and return a value just like any other entry, but here we just want to cause a backtrack (by raising Stream.Failure) if the token after the next one is xyzzy. We can see it with Stream.npeek 2, which returns the next two tokens, but does not advance the stream. (The stream parser syntax is not useful here since it advances the stream on a match.) You can see several examples of this technique in Camlp4OCamlParser.ml.

We have seen that for stream parsers, a match of a sequence of literals compiles to a nested pattern match; as soon as the first literal matches, we’re committed to that arm. With grammars, however, a sequence of tokens (or keywords) is matched all at once: enough tokens are peeked; if all match then the stream is advanced past all of them; if any fail to match, Stream.Failure is raised. So in the first example of this section, f1 could be any sequence of tokens, and local backtracking would still work. Or it could be a sequence of tokens followed by some non-tokens; as long as the failure happens in the sequence of tokens, local backtracking would still work.
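To make the difference concrete, here is a small model in plain OCaml. It is a sketch under assumptions, not Camlp4 code: the stream is a mutable list of strings, Parse_failure and Parse_error are stand-ins for Stream.Failure and Stream.Error, and all the function names are made up.

```ocaml
exception Parse_failure (* stand-in for Stream.Failure: nothing was consumed *)
exception Parse_error   (* stand-in for Stream.Error: committed, then failed *)

type stream = { mutable toks : string list }

(* Stream-parser style: consume each literal as soon as it matches, so a
   mismatch after the first literal is an error. *)
let match_one_by_one lits s =
  List.iteri
    (fun i lit ->
      match s.toks with
      | t :: rest when t = lit -> s.toks <- rest
      | _ -> raise (if i = 0 then Parse_failure else Parse_error))
    lits

(* Grammar style: look at enough tokens, and advance only if all match. *)
let match_all_at_once lits s =
  let rec go lits toks =
    match lits, toks with
    | [], rest -> Some rest
    | l :: ls, t :: ts when l = t -> go ls ts
    | _ -> None
  in
  match go lits s.toks with
  | Some rest -> s.toks <- rest
  | None -> raise Parse_failure

(* Try the alternatives in order; only Parse_failure lets us move on. *)
let rec choice alts s =
  match alts with
  | [] -> raise Parse_failure
  | (p, result) :: more ->
      (try
         let () = p s in
         result
       with Parse_failure -> choice more s)

(* f1 is plugh quux and f2 is plugh xyzzy, both written as token sequences. *)
let parse_with matcher input =
  let s = { toks = input } in
  try
    Some (choice [ (matcher [ "plugh"; "quux" ], "f1");
                   (matcher [ "plugh"; "xyzzy" ], "f2") ] s)
  with Parse_failure | Parse_error -> None
```

On the input plugh xyzzy, parse_with match_one_by_one gives None (the first alternative has already consumed plugh when it fails), while parse_with match_all_at_once backtracks to the second alternative and gives Some "f2".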

Self-calls

Consider the following grammar:

EXTEND Gram 
  GLOBAL: expr; 
 
  b: [[ "b" ]]; 
 
  expr: 
    [[ expr; "a" -> "a" | b -> "b" ]]; 
END

We’ve seen that a choice of entries generates a stream parser with an arm for each entry, and also that a call to another parser in a stream parser match is just a function call. So it seems like the parser should go into a loop before parsing anything.

However, Camlp4 gives calls to the entry being defined (“self-calls”) special treatment. The rules of an entry actually generate two parsers, the “start” and “continue” parsers (these names are taken from the code). When a self-call appears as the first symbol of a rule, the rest of the rule goes into the continue parser; otherwise the whole rule goes into the start parser. An entry is parsed starting with the start parser; a successful parse is followed by the continue parser. So in the example, we first parse using just the second rule, to get things off the ground, then parse using just the first rule. If there are no start rules (that is, all rules begin with self-calls) the parser doesn’t loop, but it fails without parsing anything.
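Written by hand for the grammar above, the split might look roughly like the following. This is a sketch in plain OCaml over a list of strings, not what Camlp4 generates; the actions are changed (an assumption for illustration) so that the result shows how the parse nests, and Parse_failure stands in for Stream.Failure.

```ocaml
exception Parse_failure (* stand-in for Stream.Failure *)

(* Start parser: the rules that do not begin with a self-call (here just b). *)
let start tokens =
  match tokens with
  | "b" :: rest -> ("b", rest)
  | _ -> raise Parse_failure

(* Continue parser: the remainders of the rules that begin with a self-call.
   It receives the value parsed so far and keeps extending it. *)
let rec continue acc tokens =
  match tokens with
  | "a" :: rest -> continue ("(" ^ acc ^ " a)") rest
  | _ -> (acc, tokens)

(* The entry: the start parser, followed by the continue parser. *)
let expr tokens =
  let (v, rest) = start tokens in
  continue v rest
```

Parsing b a a this way yields "((b a) a)": the b rule gets things off the ground, then the self-call rule is applied once per a. With no start rule there would be nothing for start to match, and the entry would fail without consuming anything.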

Levels and precedence

I am sorry to say that I have not been completely honest with you. I have made it seem like entries consist of a list of rules in double square brackets. In fact, entries are lists of levels, in single square brackets, and each level consists of a list of rules, also in single square brackets. So each of the examples so far has contained only a single level. Here is an example with multiple levels:

EXTEND Gram 
  expr: 
    [ [ x = expr; "+"; y = expr -> x + y 
      | x = expr; "-"; y = expr -> x - y ] 
    | [ x = expr; "*"; y = expr -> x * y 
      | x = expr; "/"; y = expr -> x / y ] 
    | [ x = INT -> int_of_string x 
      | "("; e = expr; ")" -> e ] ]; 
END

(You’ll need a string_of_int to use this grammar with the earlier framework.) The idea with levels is that parsing begins at the topmost level; if no rule applies in the current level, then the next level down is tried. Furthermore, when making a self-call, call at the current level (or the following level; see below) rather than at the top. This gives a way to implement operator precedence: order the operators top to bottom from loosest- to tightest-binding.

Why does this work? The multi-level grammar is just a “stratified” grammar, with a little extra support from Camlp4; we could write it manually like this:

EXTEND Gram 
  GLOBAL: expr; 
 
  add_expr: 
    [[ 
       x = add_expr; "+"; y = mul_expr -> x + y 
     | x = add_expr; "-"; y = mul_expr -> x - y 
     | x = mul_expr -> x 
     ]]; 
 
  mul_expr: 
    [[ 
       x = mul_expr; "*"; y = base_expr -> x * y 
     | x = mul_expr; "/"; y = base_expr -> x / y 
     | x = base_expr -> x 
     ]]; 
 
  base_expr: 
    [[ 
       x = INT -> int_of_string x 
     | "("; e = add_expr; ")" -> e 
     ]]; 
 
  expr: [[ add_expr ]]; 
END

When parsing a mul_expr, for instance, we don’t want to parse an add_expr as a subexpression; 1 * 2 + 3 should not parse as 1 * (2 + 3). A stratified grammar just leaves out the rules for lower-precedence operators at each level. Why do we call add_expr on the left side of + but mul_expr on the right? This makes + left-associative; we parse 1 + 2 + 3 as (1 + 2) + 3 since add_expr is a possibility only on the left. (For an ordinary recursive-descent parser we’d want right-associativity to prevent looping, although the special treatment of self-calls makes the left-associative version work here.)
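For comparison, here is the same stratification as a hand-written recursive-descent evaluator in plain OCaml. It is a sketch under assumptions: the token type is made up, Parse_failure stands in for Stream.Failure, and left-associativity comes from an accumulating loop (add_rest, mul_rest) rather than from Camlp4's treatment of self-calls.

```ocaml
type token = Int of int | Plus | Minus | Star | Slash | Lparen | Rparen

exception Parse_failure

(* Each function returns the value parsed and the remaining tokens. *)
let rec add_expr toks =
  let (x, rest) = mul_expr toks in
  add_rest x rest

(* Operands of + and - are mul_exprs, so * and / bind tighter. *)
and add_rest x toks =
  match toks with
  | Plus :: rest -> let (y, rest') = mul_expr rest in add_rest (x + y) rest'
  | Minus :: rest -> let (y, rest') = mul_expr rest in add_rest (x - y) rest'
  | _ -> (x, toks)

and mul_expr toks =
  let (x, rest) = base_expr toks in
  mul_rest x rest

and mul_rest x toks =
  match toks with
  | Star :: rest -> let (y, rest') = base_expr rest in mul_rest (x * y) rest'
  | Slash :: rest -> let (y, rest') = base_expr rest in mul_rest (x / y) rest'
  | _ -> (x, toks)

(* Parentheses restart at the loosest level. *)
and base_expr toks =
  match toks with
  | Int n :: rest -> (n, rest)
  | Lparen :: rest ->
      (match add_expr rest with
       | (e, Rparen :: rest') -> (e, rest')
       | _ -> raise Parse_failure)
  | _ -> raise Parse_failure
```

With this, 1 * 2 + 3 evaluates to 5 and 1 - 2 - 3 evaluates to -4, as expected for the usual precedence and left-associativity.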

Associativity works just the same with the multi-level grammar. By default, levels are left-associative: in the start parser (for a self-call as the first symbol of the rule), the self-call is made at the same level; in the continue parser, self-calls are made at the following level. For right-associativity it’s the reverse, and for non-associativity both start and continue parsers call the following level. The associativity of a level can be specified by prefixing it with the keywords NONA, LEFTA, or RIGHTA. (Either I don’t understand what non-associativity means, or NONA is broken; it seems to be the same as LEFTA.)

Levels may be labelled, and the level to call may be given explicitly. So another way to write the same grammar is:

EXTEND Gram 
  expr: 
    [ "add" 
      [ x = expr LEVEL "mul"; "+"; y = expr LEVEL "add" -> x + y 
      | x = expr LEVEL "mul"; "-"; y = expr LEVEL "add" -> x - y 
      | x = expr LEVEL "mul" -> x ] 
    | "mul" 
      [ x = expr LEVEL "base"; "*"; y = expr LEVEL "mul" -> x * y 
      | x = expr LEVEL "base"; "/"; y = expr LEVEL "mul" -> x / y 
      | x = expr LEVEL "base" -> x ] 
    | "base" 
      [ x = INT -> int_of_string x 
      | "("; e = expr; ")" -> e ] ]; 
END

(Unfortunately, the left-associative version of this loops; explicitly specifying a level when calling an entry defeats the start / continue mechanism, since the call is not recognized as a self-call.) Calls to explicit levels can be used when calling other entries, too, not just for self calls. Level names are also useful for extending grammars, although we won’t cover that here.

Special symbols

There are several special symbols: SELF refers to the entry being defined (at the current or following level depending on the associativity and the position of the symbol in the rule, as above); NEXT refers to the entry being defined, at the following level regardless of associativity or position.

A list of zero or more items can be parsed with the syntax LIST0 elem, where elem can be any other symbol. The return value has type 'a list when elem has type 'a. To parse separators between the elements use LIST0 elem SEP sep; again sep can be any other symbol. LIST1 means parse one or more items. An optional item can be parsed with OPT elem; the return value has type 'a option. (Both LIST0 and OPT can match the empty string; see the note above about the treatment of epsilon.)

Finally, a nested set of rules may appear in a rule, and acts like an anonymous entry (but can have only one level). For example, the rule

  x = expr; ["+" | "plus"]; y = expr -> x + y

parses both 1 + 2 and 1 plus 2.

Addendum: A new special symbol appeared in the 3.12.0 release, TRY elem, which provides non-local backtracking: a Stream.Error occurring in elem is converted to a Stream.Failure. (It works by running elem on an on-demand copy of the token stream; tokens are not consumed from the real token stream until elem succeeds.) TRY replaces most (all?) cases where you’d need to drop down to a stream parser for lookahead. So another way to fix the local backtracking example above is:

EXTEND Gram 
  GLOBAL: expr; 
 
  g: [[ "plugh" ]]; 
  f1: [[ g; "quux" ]]; 
  f2: [[ g; "xyzzy" ]]; 
 
  expr: 
    [[ TRY f1 -> "f1" | f2 -> "f2" ]]; 
END
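The idea behind TRY can be modelled in plain OCaml. This is a sketch of the technique as described above (run on a copy, commit on success), not Camlp4's implementation: the stream is a mutable list of strings, Parse_failure and Parse_error stand in for Stream.Failure and Stream.Error, and every name here is made up.

```ocaml
exception Parse_failure (* stand-in for Stream.Failure *)
exception Parse_error   (* stand-in for Stream.Error *)

type stream = { mutable toks : string list }

let keyword k s =
  match s.toks with
  | t :: rest when t = k -> s.toks <- rest
  | _ -> raise Parse_failure

(* p then q: once p has succeeded we are committed, so a failure in q
   becomes an error. *)
let seq p q s =
  let () = p s in
  try q s with Parse_failure -> raise Parse_error

(* Choice with local backtracking only: errors are not caught. *)
let alt p q s = try p s with Parse_failure -> q s

(* Run p, then return v. *)
let returning v p s =
  let () = p s in
  v

(* TRY: run p on a copy of the stream; commit the copy's position on
   success, and report any failure as Parse_failure. *)
let try_ p s =
  let copy = { toks = s.toks } in
  match p copy with
  | v -> s.toks <- copy.toks; v
  | exception (Parse_failure | Parse_error) -> raise Parse_failure

let g = keyword "plugh"
let f1 = seq g (keyword "quux")
let f2 = seq g (keyword "xyzzy")

let expr_plain = alt (returning "f1" f1) (returning "f2" f2)
let expr_try = alt (try_ (returning "f1" f1)) (returning "f2" f2)

let run p input =
  try Some (p { toks = input }) with Parse_failure | Parse_error -> None
```

On plugh xyzzy, run expr_plain gives None because f1 has committed by the time it fails, while run expr_try gives Some "f2" because the failed attempt at f1 never touched the real stream.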

Almost the whole point of Camlp4 grammars is that they are extensible—you can add rules and levels to entries after the fact—so you can modify the OCaml parsers to make syntax extensions. But I am going to save that for a later post.

How froc works

2010-05-07T10:47:00.001-07:00

I am happy to announce the release of version 0.2 of the froc library for functional reactive programming in OCaml. There are a number of improvements:

better event model: there is now a notion of simultaneous events, and behaviors and events can now be freely mixed
self-adjusting computation is now supported via memo functions; needless recomputation can be avoided in some cases
faster priority queue and timeline data structures
behavior and event types split into co- and contra-variant views for subtyping
bug fixes and cleanup

Development of froc has moved from Google Code to GitHub.

Thanks to Ruy Ley-Wild for helpful discussion, and to Daniel Bünzli for helpful discussion and many good ideas in React.

I thought I would take this opportunity to explain how froc works, because it is interesting, and to help putative froc users use it effectively.

Dependency graphs

The main idea behind froc (and self-adjusting computation) is that we can think of an expression as implying a dependency graph, where each expression depends on its subexpressions, and ultimately on some input values. When the input values change, we can recompute the expression incrementally by recursively pushing changes to dependent subexpressions.

To be concrete, suppose we have this expression:

  let u = v / w + x * y + z

Here is a dependency graph relating expressions to their subexpressions:

The edges aren’t directed, because we can think of dependencies as either demand-driven (to compute A, we need B), or change-driven (when B changes, we must recompute A).

Now suppose we do an initial evaluation of the expression with v = 4, w = 2, x = 2, y = 3, and z = 1. Then we have (giving labels to unlabelled nodes, and coloring the current value of each node green):

If we set z = 2, we need only update u to 10, since no other node depends on z. If we then set v = 6, we need to update n0 to 3, n2 to 9 (since n2 depends on n0), and u to 11, but we don’t need to update n1. (This is the change-driven point of view.)

What if we set z = 2 and v = 6 simultaneously, then do the updates? We have to be careful to do them in the right order. If we updated u first (since it depends on z), we’d use a stale value for n2. We could require that we don’t update an expression until each of its dependencies has been updated (if necessary). Or we could respect the original evaluation order of the expressions, and say that we won’t update an expression until each expression that came before it has been updated.

In froc we take the second approach. Each expression is given a timestamp (not a wall-clock time, but an abstract ordered value) when it’s initially evaluated, and we re-evaluate the computation by running through a priority queue of stale expressions, ordered by timestamp. Here is the situation, with changed values in magenta, stale values in red, and timestamps in gray:

If we update the stale nodes from their dependencies in timestamp order, we get the right answer. We will see how this approach gives us a way to handle control dependencies, where A does not depend on B, but A’s execution is controlled by B.
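Here is a toy model of timestamp-ordered updating for the example expression, in plain OCaml. It is only a sketch and not how froc is implemented: the node type and the functions input, derived, and update are made up, the timestamps are assumed for illustration, and a sorted list stands in for the priority queue.

```ocaml
type node = {
  name : string;
  stamp : int;                      (* position in the initial evaluation *)
  mutable value : int;
  compute : unit -> int;            (* recompute from the dependencies *)
  mutable dependents : node list;
}

let input name stamp x =
  { name; stamp; value = x; compute = (fun () -> x); dependents = [] }

let derived name stamp deps compute =
  let n = { name; stamp; value = compute (); compute; dependents = [] } in
  List.iter (fun d -> d.dependents <- n :: d.dependents) deps;
  n

(* u = v / w + x * y + z, with v = 4, w = 2, x = 2, y = 3, z = 1. *)
let v = input "v" 0 4
let w = input "w" 1 2
let x = input "x" 2 2
let y = input "y" 3 3
let z = input "z" 4 1
let n0 = derived "n0" 5 [ v; w ] (fun () -> v.value / w.value)
let n1 = derived "n1" 6 [ x; y ] (fun () -> x.value * y.value)
let n2 = derived "n2" 7 [ n0; n1 ] (fun () -> n0.value + n1.value)
let u = derived "u" 8 [ n2; z ] (fun () -> n2.value + z.value)

(* Set some inputs, then re-run stale nodes in timestamp order. Returns the
   names of the nodes that were re-run, in the order they ran. *)
let update changes =
  let queue = ref [] in
  let enqueue n =
    if not (List.memq n !queue) then
      queue := List.sort (fun a b -> compare a.stamp b.stamp) (n :: !queue)
  in
  (* Store a new value; if it differs, the dependents become stale. *)
  let set (n, x) =
    if n.value <> x then begin
      n.value <- x;
      List.iter enqueue n.dependents
    end
  in
  List.iter set changes;
  let rerun = ref [] in
  let rec loop () =
    match !queue with
    | [] -> ()
    | n :: rest ->
        queue := rest;
        rerun := n.name :: !rerun;
        set (n, n.compute ());
        loop ()
  in
  loop ();
  List.rev !rerun

(* Set z = 2 and v = 6 simultaneously. *)
let rerun_order = update [ (z, 2); (v, 6) ]
```

Although u becomes stale first (because of z), it is re-run last, after n0 and n2, so it sees the fresh value of n2 and ends up as 11; n1 is never re-run.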

Library interface

The core of froc has the following (simplified) signature:

  type 'a t 
  val return : 'a -> 'a t 
  val bind : 'a t -> ('a -> 'b t) -> 'b t

The type 'a t represents changeable values (or just changeables) of type 'a; these are the nodes of the dependency graph. Return converts a regular value to a changeable value. Bind makes a new changeable as a dependent of an existing one; the function argument is the expression that computes the value from its dependency. We have >>= as an infix synonym for bind; there are also multi-argument versions (bind2, bind3, etc.) so a value can depend on more than one other value.

We could translate the expression from the previous section as:

  let n0 = bind2 v w (fun v w -> return (v / w)) 
  let n1 = bind2 x y (fun x y -> return (x * y)) 
  let n2 = bind2 n0 n1 (fun n0 n1 -> return (n0 + n1)) 
  let u = bind2 n2 z (fun n2 z -> return (n2 + z))

There are some convenience functions in froc to make this more readable (these versions are also more efficient):

  val blift : 'a t -> ('a -> 'b) -> 'b t 
  val lift : ('a -> 'b) -> 'a t -> 'b t

Blift is like bind except that you don’t need the return at the end of the expression (below we’ll see cases where you actually need bind); lift is the same as blift but with the arguments swapped for partial application. So we could say

  let n0 = blift2 v w (fun v w -> v / w) 
  let n1 = blift2 x y (fun x y -> x * y) 
  let n2 = blift2 n0 n1 (fun n0 n1 -> n0 + n1) 
  let u = blift2 n2 z (fun n2 z -> n2 + z)

or even

  let (/) = lift2 (/) 
  let ( * ) = lift2 ( * ) 
  let (+) = lift2 (+) 
  let u = v / w + x * y + z
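The relationship between these functions can be checked against a toy model. This is a sketch and not froc's implementation: the Box type is a made-up stand-in for 'a t with no dependency tracking at all, read is a hypothetical helper for getting the value out, and the convenience functions are defined here in terms of bind and return (froc's own versions are more efficient, as noted above).

```ocaml
(* Toy stand-in for a changeable: just a box around the current value. *)
type 'a t = Box of 'a

let return x = Box x
let bind (Box x) f = f x
let bind2 a b f = bind a (fun x -> bind b (fun y -> f x y))

(* blift is bind with the return supplied; lift swaps the arguments. *)
let blift m f = bind m (fun x -> return (f x))
let lift f m = blift m f
let blift2 a b f = bind2 a b (fun x y -> return (f x y))
let lift2 f a b = blift2 a b f

let read (Box x) = x

let u =
  let v = return 4 and w = return 2 and x = return 2
  and y = return 3 and z = return 1 in
  (* Shadow the integer operators with their lifted versions. *)
  let ( / ) = lift2 ( / ) in
  let ( * ) = lift2 ( * ) in
  let ( + ) = lift2 ( + ) in
  v / w + x * y + z
```

Here read u is 9, the same as the initial evaluation in the previous section.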

Now, there is no reason to break down expressions all the way—a node can have a more complicated expression, for example:

  let n0 = blift2 v w (fun v w -> v / w) 
  let n2 = blift3 n0 x y (fun n0 x y -> n0 + x * y) 
  let u = blift2 n2 z (fun n2 z -> n2 + z)

There is time overhead in propagating dependencies, and space overhead in storing the dependency graph, so it’s useful to be able to control the granularity of recomputation by trading off computation over changeable values with computation over ordinary values.

Dynamic dependency graphs

Take this expression:

  let b = x = 0 
  let y = if b then 0 else 100 / x

Here it is in froc form:

  let b = x >>= fun x -> return (x = 0) 
  let n0 = x >>= fun x -> return (100 / x) 
  let y = b >>= fun b -> if b then return 0 else n0

and its dependency graph, with timestamps:

(We begin to see why bind is sometimes necessary instead of blift—in order to return n0 in the else branch, the function must return 'b t rather than 'b.)

Suppose we have an initial evaluation with x = 10, and we then set x = 0. If we blindly update n0, we get a Division_by_zero exception, although we get no such exception from the original code. Somehow we need to take into account the control dependency between b and 100 / x, and compute 100 / x only when b is false. This can be accomplished by putting it inside the else branch:

  let b = x >>= fun x -> return (x = 0) 
  let y = b >>= fun b -> if b then return 0 
                              else x >>= fun x -> return (100 / x)

How does this work? Froc keeps track of the start and finish timestamps when running an expression, and associates dependencies with the timestamp when they are attached. When an expression is re-run, we detach all the dependencies between the start and finish timestamps. In this case, when b changes, we detach the dependent expression that divides by 0 before trying to run it.

Let’s walk through the initial run with x = 10. Here is the graph showing the timestamp ranges and, on the dependency edges, the timestamp when each dependency was attached:

First we evaluate b (attaching it as a dependent of x at time 0) to get false. Then we evaluate y (attaching it as a dependent of b at time 3): we check b and evaluate n0 to get 10 (attaching it as a dependent of x at time 5). Notice that we have a dependency edge from y to n0. This is not a true dependency, since we don’t recompute y when n0 changes; rather the value of y is a proxy for n0, so when n0 changes we just forward the new value to y.

What happens if we set x = 20? Both b and n0 are stale since they depend on x. We re-run expressions in order of their start timestamp, so we run b and get false. Since the value of b has not changed, y is not stale. Then we re-run n0, so its value (and the value of y by proxy) becomes 5.

What happens if we set x = 0? We run b and get true. Now y is also stale, and it is next in timestamp order. We first detach all the dependencies in the timestamp range 4-9 from the previous run of y: the dependency of n0 on x and the proxy dependency of y on n0. This time we take the then branch, so we get 0 without attaching any new dependencies. We are done; no Division_by_zero exception.

Now we can see why it’s important to handle updates in timestamp order: the value which decides a control flow point (e.g. the test of an if) is always evaluated before the control branches (the then and else branches), so we have the chance to fix up the dependency graph before the branches are updated.
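The control dependency in this section can be seen in plain OCaml, without froc. In the sketch below, the function names eager, guarded and safe are made up for illustration: eager computes the division before looking at the test (like blindly updating n0), while guarded computes it only inside the else branch.

```ocaml
(* Computes 100 / x unconditionally, then chooses; like updating n0 blindly. *)
let eager x =
  let b = (x = 0) in
  let n0 = 100 / x in
  if b then 0 else n0

(* Computes 100 / x only when the test says it is safe to do so. *)
let guarded x =
  let b = (x = 0) in
  if b then 0 else 100 / x

(* Helper: turn a Division_by_zero exception into None. *)
let safe f x = try Some (f x) with Division_by_zero -> None

let eager_at_10 = safe eager 10
let eager_at_0 = safe eager 0
let guarded_at_0 = safe guarded 0
```

Handling updates in timestamp order is what lets froc behave like guarded rather than eager when x changes.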

Garbage collection and cleanup functions

A node points to its dependencies (so it can read their values when computing its value), and its dependencies point back to the node (so they can mark it stale when they change). This creates a problem for garbage collection: a node which becomes garbage (from the point of view of the library user) is still attached to its dependencies, taking up memory, and causing extra recomputation.

The implementation of dynamic dependency graphs helps with this problem: as we have seen, when an expression is re-run, the dependencies attached in the course of the previous run are detached, including any dependencies for nodes which have become garbage. Still, until the expression that created them is re-run, garbage nodes remain attached.

Some other FRP implementations use weak pointers to store a node’s dependents, to avoid hanging on to garbage nodes. Since froc is designed to work in browsers (using ocamljs), weak pointers aren’t an option because they aren’t supported in Javascript. But even in regular OCaml, there are reasons to eschew the use of weak pointers:

First, it’s useful to be able to set up changeable expressions which are used for their effect (say, updating the GUI) rather than their value; to do this with a system using weak pointers, you have to stash the expression somewhere so it won’t be GC’d. This is similar to the problem of GCing threads; it doesn’t make sense if the threads can have an effect.

Second, there are other resources which may need to be cleaned up in reaction to changes (say, GUI event handler registrations); weak pointers are no help here. Froc gives you a way to set cleanup functions during a computation, which are run when the computation is re-run, so you can clean up other resources.

With froc there are two options to be sure you don’t leak memory. You can call init to clean up the entire system, or you can use bind to control the lifetime of changeables. For instance, you could have a changeable c representing a counter, do a computation in the scope of a bind of c (you can just ignore the value), then increment the counter to clear out the previous computation.

In fact, there are situations where froc cleans up too quickly—when you want to hang on to a changeable after the expression that attached it is re-run. We’ll see shortly how to avoid this.

Memoizing the previous run

Here is the List.map function, translated to work over lists where the tail is changeable.

  type 'a lst = Nil | Cons of 'a * 'a lst t 
 
  let rec map f lst = 
    lst >>= function 
      | Nil -> return Nil 
      | Cons (h, t) -> 
          let t = map f t in 
          return (Cons (f h, t))

What happens if we run

  map (fun x -> x + 1) [ 1; 2; 3 ]

? (I’m abusing the list syntax here to mean a changeable list with these elements.) Let’s see if we can fit the dependency graph on the page (abbreviating Cons and Nil and writing just f for the function expression):

(The dependency edges on the right-hand side don’t mean that e.g. f0 depends directly on f1, but rather that the value returned by f0—Cons(2,f1)—depends on f1. We don’t re-run f0 when f1 changes, or even update its value by proxy as we did in the previous section. But if f1 is stale it must be updated before we can consider f0 up-to-date.)

Notice how the timestamp ranges for the function expressions are nested each within the previous one. There is a control dependency at each recursive call: whether we make a deeper call depends on whether the argument list is Nil.

So if we change t3, just f3 is stale. But if we change t0, we must re-run f0, f1, f2, and f3—that is, the whole computation—detaching all the dependencies, then reattaching them. This is kind of annoying; we do a lot of pointless work since nothing after the first element has changed.

If only some prefix of the list has changed, we’d like to be able to reuse the work we did in the previous run for the unchanged suffix. Froc addresses this need with memo functions. In a way similar to ordinary memoization, a memo function records a table of arguments and values when you call it. But in froc we only reuse values from the previous run, and only those from the timestamp range we’re re-running. We can define map as a memo function:

  let map f lst = 
    let memo = memo () in 
    let rec map lst = 
      lst >>= function 
        | Nil -> return Nil 
        | Cons (h, t) -> 
            let t = memo map t in 
            return (Cons (f h, t)) in 
    memo map lst

Here the memo call makes a new memo table. In the initial run we add a memo entry associating each list node (t0, t1, …) with its map (f0, f1, …). Now, suppose we change t0: f0 is stale, so we update it. When we go to compute map f t1 we get a memo hit returning f1 (the computation of f1 is contained in the timestamp range of f0, so it is a candidate for memo matching). Since f1 is up-to-date, we return it as the value of map f t1.
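The reuse described above can be imitated with a toy model. This is a sketch under strong assumptions: cells are plain mutable records rather than changeables, the memo table is an association list keyed on physical identity, there are no timestamps, and "re-running a stale node" is an explicit call. The names make_map, rerun and calls are invented for the illustration.

```ocaml
(* A list whose tails live in mutable cells (toy stand-in for changeables). *)
type 'a cell = { mutable v : 'a lst }
and 'a lst = Nil | Cons of 'a * 'a cell

(* Counts how many times the mapped function is applied. *)
let calls = ref 0

(* Returns (memo_map, rerun): memo_map consults the table first,
   rerun recomputes one node but still uses the table for its tail. *)
let make_map f =
  let table = ref [] in
  let rec compute c =
    let out =
      match c.v with
      | Nil -> { v = Nil }
      | Cons (h, t) ->
          incr calls;
          let h' = f h in
          let t' = memo_map t in
          { v = Cons (h', t') }
    in
    table := (c, out) :: List.remove_assq c !table;
    out
  and memo_map c =
    match List.assq_opt c !table with
    | Some out -> out
    | None -> compute c
  in
  (memo_map, compute)

let rec to_list c =
  match c.v with
  | Nil -> []
  | Cons (h, t) -> h :: to_list t

let t2 = { v = Cons (3, { v = Nil }) }
let t1 = { v = Cons (2, t2) }
let t0 = { v = Cons (1, t1) }

let memo_map, rerun = make_map (fun x -> x + 1)

(* Initial run: the function is applied to all three elements. *)
let f0 = memo_map t0
let first_run_calls = !calls

(* Change only the head; the tail t1 is untouched. *)
let () = t0.v <- Cons (10, t1)

(* Re-run the stale node: the lookup on t1 hits, so only one call is made. *)
let f0' = rerun t0
let second_run_calls = !calls - first_run_calls
```

Unlike froc, this toy table never forgets entries and never checks whether a reused result is itself stale; those are exactly the jobs the timestamp ranges and the update queue do in the real library.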

There is a further wrinkle: suppose we change both t0 and t2, leaving t1 unchanged. As before, we get a memo hit on t1 returning f1, but since f2 is stale, so is f1. We must run the update queue until f1 is up-to-date before we return it as the value of map f t1. Recall that we detach the dependencies of the computation we’re re-running; in order to update f1 we just leave it attached to its dependencies and run the queue until the end of its timestamp range.

In general, there can be a complicated pattern of changed and unchanged data—we could change every other element in the list, for instance—so memoization and the update loop call one another recursively. From the timestamp point of view, however, we can think of it as a linear scan through time, alternating between updating stale computations and reusing ones which have not changed.

The memo function mechanism provides a way to keep changeables attached even after the expression that attached them is re-run. We just need to attach them from within a memo function, then look them up again on the next run, so they’re left attached to their dependencies. The quickhull example (source) demonstrates how this works.

Functional reactive programming and the event queue

Functional reactive programming works with two related types: behaviors are values that can change over time, but are defined at all times; events are defined only at particular instants in time, possibly (but not necessarily) with a different value at each instant. (Signals are events or behaviors when we don’t care which one.)

Events can be used to represent external events entering the system (like GUI clicks or keystrokes), and can also represent occurrences within the system, such as a collision between two moving objects. It is natural for events to be defined in terms of behaviors and vice versa. (In fact they can be directly interdefined with the hold and changes functions.)

In froc, behaviors are just another name for changeables. Events are implemented on top of changeables: they are just changeables with transient values. An incoming event sets the value of its underlying changeable; after changes have propagated through the dependency graph, the values of all the changeables which underlie events are removed (so they can be garbage collected).

Signals may be defined (mutually) recursively. For example, in the bounce example (source), the position of the ball is a behavior defined in terms of its velocity, which is a behavior defined in terms of events indicating collisions with the walls and paddle, which are defined in terms of the ball’s position.

Froc provides the fix_b and fix_e functions to define signals recursively. The definition of a signal can’t refer directly to its own current value, since it hasn’t been determined yet; instead it sees its value from the previous update cycle. When a recursively-defined signal produces a value, an event is queued to be processed in the next update cycle, so the signal can be updated based on its new current value. (If the signal doesn’t converge somehow this process loops.)

Related systems

Froc is closely related to a few other FRP systems which are change-driven and written in an imperative, call-by-value language:

FrTime is an FRP system for PLT Scheme. FrTime has a dependency graph and update queue mechanism similar to froc, but sorts stale nodes in dependency (“topological”) rather than timestamp order. There is a separate mechanism for handling control dependencies, using a dynamic scoping feature specific to PLT Scheme (“parameters”) to track dependencies attached in the course of evaluating an expression; in addition FrTime uses weak pointers to collect garbage nodes. There is no equivalent of froc’s memo functions. Reactivity in FrTime is implicit: you give an ordinary Scheme program, and the compiler turns each subexpression into a changeable value. There is no programmer control over the granularity of recomputation, but there is a compiler optimization (“lowering”) which recovers some performance by coalescing changeables.

Flapjax is a descendant of FrTime for Javascript. It implements the same dependency-ordered queue as FrTime, but there is no mechanism for control dependencies, and there are no weak pointers (since there are none in Javascript), so it is fairly easy to create memory leaks (although there is a special reference-counting mechanism in certain cases). Flapjax can be used as a library; it also has a compiler similar to FrTime’s, but since it doesn’t handle control dependencies, the semantics of compiled programs are not preserved (e.g. you can observe exceptions that don’t occur in the original program).

React is a library for OCaml, also based on a dependency-ordered queue, using weak pointers, without a mechanism for control dependencies.

Colophon

I used Mlpost to generate the dependency graph diagrams. It is very nice!

orpc 0.3

2010-04-02T19:18:00.000-07:00

I am happy to announce version 0.3 of orpc, a tool for generating RPC bindings from OCaml signatures. Orpc can generate ONC RPC stubs for use with Ocamlnet (in place of ocamlrpcgen), and it can also generate RPC over HTTP stubs for use with ocamljs. You can use most OCaml types in interfaces, as well as labelled and optional arguments.

Changes since version 0.2 include

a way to use types defined outside the interface file, so you can use a type in more than one interface
support for polymorphic variants
a way to specify “abstract” interfaces that can be instantiated for synchronous, asynchronous, and Lwt clients and servers
bug fixes

Development of orpc has moved from Google Code to Github; see

Let me know what you think.

Updated backtrace patch

2010-03-27T18:19:00.001-07:00

I’ve updated my backtrace patch to work with OCaml 3.11.x as well as 3.10.x. The patch provides

access to backtraces from within a program (this is already provided in stock 3.11.x)
backtraces for dynamically-loaded bytecode
backtraces in the (bytecode) toplevel

In addition there are a few improvements since the last version:

debugging events are allocated outside the heap, so memory use should be better with forking (on Linux at least, the data is shared on copy-on-write pages but the first GC causes the pages to be copied)
fixed a bug that could cause spurious “unknown location” lines in the backtrace
a script to apply the patch (instead of the previous multi-step manual process)

See ocaml-backtrace-patch on Github or download the tarball.

Inside OCaml objects

2010-03-23T15:32:00.000-07:00

In the ocamljs project I wanted to implement the OCaml object system in a way that is interoperable with Javascript objects. Mainly I wanted to be able to call Javascript methods with the OCaml method call syntax, but it is also useful to write objects in OCaml which are callable in the usual way from Javascript.

I spent some time a few months ago figuring out how OCaml objects are put together in order to implement this (it is in the unreleased ocamljs trunk—new release coming soon I hope). I got a bug report against it the other day, and it turns out I don’t remember much of what I figured out. So I am going to figure it out again, and write it down, here in this very blog post!

Objects are implemented mostly in the CamlinternalOO library module, with a few compiler primitives for method invocation. The compiler generates CamlinternalOO calls to construct classes and objects. Our main tool for figuring out what is going on is to write a test program, dump out its lambda code with -dlambda, and read the CamlinternalOO source to see what it means. I will explain functions from camlinternalOO.ml but not embed them in the post, so you may want it available for reference.

I have hand-translated (apologies for any errors) the lambda code back to pseudo-OCaml to make it more readable. The compiler-generated code works directly with the OCaml heap representation, and generally doesn’t fit into the OCaml type system. Where the heap representation can be translated back to an OCaml value I do that; otherwise I write blocks with array notation, and atoms with integers. Finally I have used OO as an abbreviation for CamlinternalOO.

Immediate objects

Here is a first test program, defining an immediate object:

let p = 
object 
  val mutable x = 0 
  method get_x = x 
  method move d = x <- x + d 
end

And this is what it compiles to:

let shared = [|"move";"get_x"|] 
let p = 
  let clas = OO.create_table shared in 
  let obj_init = 
    let ids = OO.new_methods_variables clas shared [|"x"|] in 
    let move = ids.(0) in 
    let get_x = ids.(1) in 
    let x = ids.(2) in 
    OO.set_methods clas [| 
      get_x; OO.GetVar; x; 
      move; (fun self d -> self.(x) <- self.(x) + d); 
    |]; 
    (fun env -> 
       let self = OO.create_object_opt 0 clas in 
       self.(x) <- 0; 
       self) in 
  OO.init_class clas; 
  obj_init 0

An object has a class, created with create_table and filled in with new_methods_variables, set_methods, and init_class; the object itself is created by calling create_object_opt with the class as argument, then initializing the instance variable.

A table (the value representing a class) has the following fields (and some others we won’t cover):

type table = { 
  mutable size: int; 
  mutable methods: closure array; 
  mutable methods_by_name: meths; 
  mutable methods_by_label: labs; 
  mutable vars: vars; 
}

Each instance variable has a slot (its index in the block which represents the object); vars maps variable names to slots. The size field records the total number of slots (including internal slots, see below).

Each public method has a label, computed by hashing the method name. The methods field (used for method dispatch) holds each method of the class, with the label of the method at the following index (the type is misleading). Each method then has a slot (the index in methods of the method function); methods_by_name maps method names to slots, and the confusingly-named methods_by_label maps each slot to whether it is occupied by a public method.

The create_table call assigns slots to methods, fills in the method labels in methods, and sets up methods_by_name and methods_by_label. The new_methods_variables call returns the slot of each public method and each instance variable in a block (which is unpacked into local variables).

The set_methods call sets up the method functions in methods. Its argument is a block containing alternating method slots and method descriptions (the description can take more than one item in the block). For some methods (e.g. move) the description is just an OCaml function (here you can see that self is passed as the first argument). For some the description is given by a value of the variant OO.impl along with some other arguments. For get_x it is GetVar followed by the slot for x. The actual function that gets the instance variable is generated by set_methods. As far as I understand it, the point of this is to reduce object code size by factoring out the common code from frequently occurring methods.

Finally create_object_opt allocates a block of clas.size, then fills in the first slot with the methods array of the class and the second with the object’s unique ID. (We will see below what the _opt part is about.)
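The block layout can be poked at from ordinary OCaml with the Obj module. This is a sketch that assumes the layout described above (methods in the first field, ID in the second, instance variables after that) also holds on the compiler you are running; Obj is unsafe and none of this is a supported interface.

```ocaml
let p =
  object
    val mutable x = 0
    method get_x = x
    method move d = x <- x + d
  end

(* View the object as a raw heap block. *)
let block = Obj.repr p

(* Objects are allocated with a dedicated tag. *)
let is_object = Obj.tag block = Obj.object_tag

(* Assumed layout: field 1 holds the object's unique ID. *)
let id_in_slot_1 : int = Obj.obj (Obj.field block 1)

let () = p#move 5

(* Assumed layout: the first instance variable lives in field 2. *)
let x_in_slot_2 : int = Obj.obj (Obj.field block 2)
```

If the assumption holds, id_in_slot_1 agrees with Oo.id p and x_in_slot_2 tracks the value returned by p#get_x.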

Method calls

A public method call:

p#get_x

compiles to:

send p 291546447

where send is a built-in lambda term. The number is the method label. To understand how the method is applied we have to go a little deeper. In bytegen.ml there is a case for Lsend which generates the Kgetpubmet bytecode instruction to find the method function; the function is then applied like any other function. Next we look to the GETPUBMET case in interp.c to see how public methods are looked up in the methods block (stored in the first field of the object).
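The label can be reproduced by hand. The function below is a sketch of the name hash as I understand it (multiply by 223, add the character code, reduce to 31 bits, then sign-adjust); the name method_label is made up, and the constants should be treated as an assumption to check against your compiler's source. It also assumes a 64-bit OCaml, so the intermediate sums do not wrap.

```ocaml
(* Hash a method name to its label (sketch; assumes 63-bit native ints). *)
let method_label (name : string) : int =
  let accu = ref 0 in
  String.iter (fun c -> accu := 223 * !accu + Char.code c) name;
  (* Reduce to 31 bits. *)
  accu := !accu land ((1 lsl 31) - 1);
  (* Make the result signed, so it matches what a 32-bit system would give. *)
  if !accu > 0x3FFFFFFF then !accu - (1 lsl 31) else !accu

let get_x_label = method_label "get_x"   (* 291546447, the number in the send above *)
let move_label = method_label "move"
```

With this hash, the label of move comes out smaller than the label of get_x, which is consistent with move appearing before get_x in the shared array.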

A couple details about methods we didn’t cover before: The first field contains the number of public methods. The second contains a bitmask used for method caching—briefly, it is enough bits to store offsets into methods. The rest of the block is method functions and labels as above, padded out so that the range of an offset masked by the bitmask does not overflow the block.

Returning to GETPUBMET, we first check to see if the method cache for this call site is valid. The method cache is an extra word at each call site which stores an offset into methods (but may be garbage; masking it takes care of this). If the method label at this offset matches the label we’re looking for, the associated method function is returned. Otherwise, we binary search methods to find the method label (methods are sorted in label order in translclass.ml), then store the offset in the cache and return the associated method function.
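Here is a toy model of that lookup. It is a sketch and not the interpreter's code: labels and functions are kept in two parallel arrays instead of one interleaved block, strings stand in for method functions, a bounds check replaces the bitmask and padding, and the names methods, lookup and cache_hits are invented.

```ocaml
type methods = {
  labels : int array;    (* method labels, sorted ascending *)
  funcs : string array;  (* funcs.(i) stands in for the method with labels.(i) *)
}

(* Counts how often the per-call-site cache saved us a search. *)
let cache_hits = ref 0

let lookup meths (cache : int ref) label =
  let n = Array.length meths.labels in
  let i = !cache in
  if i >= 0 && i < n && meths.labels.(i) = label then begin
    (* Cache hit: the cached offset points at the label we want. *)
    incr cache_hits;
    meths.funcs.(i)
  end else begin
    (* Cache miss: binary search, assuming the label is present. *)
    let lo = ref 0 and hi = ref (n - 1) in
    while !lo < !hi do
      let mid = (!lo + !hi + 1) / 2 in
      if label < meths.labels.(mid) then hi := mid - 1 else lo := mid
    done;
    cache := !lo;
    meths.funcs.(!lo)
  end

let meths = { labels = [| 10; 20; 30; 40 |]; funcs = [| "a"; "b"; "c"; "d" |] }

(* The cache word starts out as garbage. *)
let cache = ref 12345

let first = lookup meths cache 30    (* miss, then the cache is filled in *)
let second = lookup meths cache 30   (* hit *)
```

Like the real lookup, the search does not verify that the label it lands on is the one requested; the type checker has already guaranteed that the method exists.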

Classes

A class definition:

class point = 
object 
  val mutable x = 0 
  method get_x = x 
  method move d = x <- x + d 
end 
let p = new point

compiles to:

let shared = [|"move";"get_x"|] 
let point = 
  let point_init clas = 
    let ids = OO.new_methods_variables clas shared [|"x"|] in 
    let move = ids.(0) in 
    let get_x = ids.(1) in 
    let x = ids.(2) in 
    OO.set_methods clas [| 
      get_x; OO.GetVar; x; 
      move; (fun self d -> self.(x) <- self.(x) + d); 
    |]; 
    (fun env self -> 
       let self = OO.create_object_opt self clas in 
       self.(x) <- 0; 
       self) in 
  OO.make_class shared point_init 
let p = (point.(0) 0)

This is similar to the immediate object code, except that the class constructor takes the class table as an argument rather than constructing it itself, and the object constructor takes self as an argument. We will see that class and object constructors are each chained up the inheritance hierarchy, and the tables / objects are passed up the chain. The make_class call calls create_table and init_class in the same way we saw in the immediate object case, and returns a tuple, of which the first component is the object constructor. So the new invocation calls the constructor.

Inheritance

A subclass definition:

class point = ... (* as before *) 
class point_sub = 
object 
  inherit point 
  val mutable y = 0 
  method get_y = y 
end

compiles to:

let point = ... (* as before *) 
let point_sub = 
  let point_sub_init clas = 
    let ids = 
      OO.new_methods_variables clas [|"get_y"|] [|"y"|] in 
    let get_y = ids.(0) in 
    let y = ids.(1) in 
    let inh = 
      OO.inherits 
        clas [|"x"|] [||] [|"get_x";"move"|] point true in 
    let obj_init = inh.(0) in 
    OO.set_methods clas [| get_y; OO.GetVar; y |]; 
    (fun env self -> 
      let self' = OO.create_object_opt self clas in 
      obj_init self'; 
      self'.(y) <- 0; 
      OO.run_initializers_opt self self' clas) in 
  OO.make_class [|"move";"get_x";"get_y"|] point_sub_init

The subclass is connected to its superclass through inherits, which calls the superclass constructor on the subclass (filling in methods with the superclass methods) and returns the superclass object constructor (and some other stuff). In the subclass object constructor, the superclass object constructor is called. (This is why the object is created optionally—the class on which new is invoked actually allocates the object; further superclass constructors just initialize instance variables.) In addition, we run any initializers, since some superclass may have them.

Self- and super-calls

A class with a self-call:

class point = 
object (s) 
  val mutable x = 0 
  method get_x = x 
  method get_x5 = s#get_x + 5 
end

becomes:

let point = 
  let point_init clas = 
    ... (* as before *) 
    OO.set_methods clas [| 
      get_x; OO.GetVar; x; 
      get_x5; (fun self -> sendself self get_x + 5); 
    |] 
    ...

Here sendself is a form of Lsend for self-calls, where we know the method slot at compile time. Instead of generating the Kgetpubmet bytecode, it generates Kgetmethod, which just does an array reference to find the method.

A class with a super-call:

class point = ... (* as before *) 
class point_sub = 
object 
  inherit point as super 
  method move1 n = super#move n 
end

becomes:

let point = ... (* as before *) 
let point_sub = 
  let point_sub_init clas = 
    ... 
    let inh = 
      OO.inherits 
        clas [|"x"|] [||] [|"get_x";"move"|] point true in 
    let move = inh.(3) in 
    ... 
    OO.set_methods clas [| 
      move1; (fun self n -> move self n) 
    |]; 
    ...

In this case, we are able to look up the actual function for the super-call in the class constructor (returned from inherits), so the invocation is just a function application rather than a slot dereference.
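The difference between the two kinds of call is observable from ordinary OCaml. In this small example (the override of get_x and the method super_get_x are additions for illustration), the self-call inside get_x5 goes through the object's method table and so sees the override, while the super-call is bound to the superclass function.

```ocaml
class point =
  object (s)
    val mutable x = 0
    method get_x = x
    method get_x5 = s#get_x + 5
    method move d = x <- x + d
  end

class point_sub =
  object
    inherit point as super
    (* Override: self-calls in the superclass will now land here. *)
    method! get_x = 100
    method move1 n = super#move n
    (* Super-call: always the superclass implementation. *)
    method super_get_x = super#get_x
  end

let p = new point_sub
let () = p#move1 3

let via_self = p#get_x5        (* uses the overriding get_x *)
let via_super = p#super_get_x  (* uses point's get_x, which reads x *)
```

Here via_self is 105 and via_super is 3.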

I don’t totally understand why we don’t know the function for self calls. I think it is because the superclass constructor runs before the subclass constructor, so the slot is assigned (this happens before the class constructors are called) but the function hasn’t been filled in yet. Still it seems like the knot could somehow be tied at class construction time to avoid a runtime slot dereference.

ocamljs implementation

The main design goal is that we be able to call methods on ordinary Javascript objects with the OCaml method call syntax, simply by declaring a class type giving the signature of the object. So if you want to work with the browser DOM you can say:

class type document = 
object 
  method getElementById : string -> #element 
  ... (* and so on *) 
end

for some appropriate element type (see src/dom/dom.mli in ocamljs for a full definition), and say:

document#getElementById "content"

to make the call.

These are always public method calls, so they use the Lsend lambda form. We don’t want to do method label dispatch, since Javascript already has dispatch by name, so first off we need to carry the name rather than the label in Lsend.

We have seen how self is passed as the first argument when methods are invoked. We can’t do that for an arbitrary Javascript function, but the function might use this, so we need to be sure that this points to the object.

There is no way to know at compile time whether a particular method invocation is on a regular Javascript object or an OCaml object. Maybe we could mark OCaml objects somehow and do a check at runtime, but I decided to stick with a single calling convention. So whatever OCaml objects compile to, they have to support the convention for regular Javascript objects—foo#bar compiles to foo.bar, with this set to foo.

As we have seen, self-calls are compiled to a slot lookup rather than a name lookup, so we also need to support indexing into methods.

So here’s the design: an OCaml object is represented by a Javascript object, with numbered slots containing the instance variables. There is a constructor for each class, with prototype set up so each method is accessible by name, and the whole methods block is accessible in a special field, so we can call by slot. (Since we don’t need method labels, methods just holds functions.)

The calling convention passes self in this, so we bind a local self variable to this on entry to each method. It doesn’t work to say this everywhere instead of self, because this in Javascript is a bit fragile. In particular, if you define and apply a local function (ocamljs does this frequently), this is null rather than the lexically-visible this.

For sendself we look up the function by slot in the special methods field. Finally, for super-calls, we know the function at class construction time. In this case the function is applied directly, but we need to take care to treat it as a method application rather than an ordinary function call, since the calling convention is different.

The bug

The OCaml compiler turns super-calls into function applications very early in compilation (during typechecking in typecore.ml). There is no difference in calling convention for regular OCaml, so it doesn’t matter that later phases don’t know that these function applications are super-calls. But in our case we have to carry this information forward to the point where we generate Javascript (in jsgen.ml). It is a little tricky without changing the “typedtree” intermediate language.

I had put in a hack to mark these applications with a special extra argument, and it worked fine for my test program, where the method had no arguments. I didn’t think through or test the case where the method has arguments though. I was able to fix it (I think!) with a different hack: super calls are compiled to self calls (that is, to Texp_send with Tmeth_val) but the identifier in Tmeth_val is marked with an unused bit to indicate that it binds a function rather than a slot, so we don’t need to dereference it.

Appendix: other features

It is interesting to see how the various features of the object system are implemented, but maybe not that interesting, so here they are as an appendix.

Constructor parameters

A class definition with a constructor parameter:

class point x_init = 
object 
  val mutable x = x_init 
  ... (* as before *) 
end

compiles to:

let point = 
  let point_init clas = 
    ... (* as before *) 
    (fun env self x_init -> 
      let self = OO.create_object_opt self clas in 
      self.(x) <- x_init; 
      self) in 
  ... (* as before *)

So the constructor parameter in the surface syntax just turns into a constructor parameter internally. (There is a slightly funny interaction between constructor parameters and let-bound expressions after class but before object: if there is no constructor parameter the let is evaluated at class construction, but if there is a parameter it is evaluated at object construction, whether or not it depends on the parameter.)
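The let-binding interaction can be observed directly by counting evaluations. This is a small sketch; the classes a and b and the two counters are made up for the purpose.

```ocaml
let evals_a = ref 0
let evals_b = ref 0

(* No constructor parameter: the let runs when the class is constructed. *)
class a =
  let () = incr evals_a in
  object end

(* With a parameter (even unit): the let runs each time an object is made. *)
class b () =
  let () = incr evals_b in
  object end

let _ = new a
let _ = new a
let _ = new b ()
let _ = new b ()

let counts = (!evals_a, !evals_b)
```

After two objects of each class, counts is (1, 2). The same effect is why a let-bound ref in a parameterless class is shared by all instances.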

Virtual methods and instance variables

A class definition with a virtual method:

class virtual abs_point = 
object 
  method virtual move : int -> unit 
end

compiles to:

let abs_point = [| 
  0; 
  (fun clas -> 
    let move = OO.get_method_label clas "move" in 
    (fun env self -> 
      OO.create_object_opt self clas)); 
  0; 0 
|]

Since a virtual class can’t be instantiated, there’s no need to create the class table with make_class; we just return the tuple that represents the class, containing the class and object constructor. (I don’t understand the call to get_method_label, since its value is unused; possibly it is called for its side effect, which is to register the method in the class table if it does not already exist.)

A subclass implementing the virtual method inherits from the virtual class in the usual way.

A class declaration with a virtual instance variable:

class virtual abs_point2 = 
object 
  val mutable virtual x : int 
  method move d = x <- x + d 
end

becomes:

let abs_point2 = [| 
  0; 
  (fun clas -> 
    let ids = 
      OO.new_methods_variables clas [|"move"|] [|"x"|] in 
    ... (* as before *)); 
  0; 0 
|]

Again, a subclass providing the instance variable inherits from the virtual class in the usual way. By the time new_methods_variables is called in the superclass, the subclass has registered a slot for the variable.

Private methods

A class definition with a private method:

class point = 
object 
  val mutable x = 0 
  method get_x = x 
  method private move d = x <- x + d 
end

compiles to:

let point = 
  let point_init clas = 
    let ids = 
      OO.new_methods_variables clas [|"move";"get_x"|] [|"x"|] in 
    ... (* as before *) 
  OO.make_class [|"get_x"|] point_init

Everything is the same except that the private method is not listed in the public methods of the class. Since a private method is callable only from code in which the class of the object is statically known, there is no need for dispatch or a method label. The private method functions are stored in methods after the public methods and method labels.

If we expose a private method in a subclass:

class point = ... (* as before *) 
class point_sub = 
object 
  inherit point 
  method virtual move : _ 
end

we get:

let point = ... (* as before *) 
let point_sub = 
  let point_sub_init clas = ... (* as before *) 
  OO.make_class [|"move";"get_x"|] point_sub_init

Putting "move" in the call to make_class registers it as a public method, so later, when set_method is called for move in the superclass constructor, it puts the method and its label in methods for dispatch.

Reading Camlp4, part 5: filters

2010-03-02T18:09:00.000-08:00

Hey, long time no see!

It is high time to get back to Camlp4, so I would like to pick up the thread by covering Camlp4 filters. We have previously considered the parsing and pretty-printing facilities of Camlp4 separately. But of course the most common way to use Camlp4 is as a front-end to ocamlc, where it processes files by parsing them into an AST and pretty-printing them back to text (well, not quite—we will see below how the AST is passed to ocamlc). In between we can insert filters to transform the AST.

A simple filter

So let’s dive into an example: a filter for type definitions that generates t_to_string and t_of_string functions for a type t, a little like Haskell’s deriving Show, Read. To keep it simple we handle only variant types, and only those where all the arms have no data. Here goes:

module Make (AstFilters : Camlp4.Sig.AstFilters) = 
struct 
  open AstFilters

In order to hook into Camlp4’s plugin mechanism we define the filter as a functor. By opening AstFilters we get an Ast module in scope. Unfortunately this is not the same Ast we got previously from Camlp4.PreCast (although it has the same signature) so all our code that uses Ast (including all OCaml syntax quotations) needs to go inside the functor body.

  let rec filter si = 
    match wrap_str_item si with 
      | <:str_item< type $lid:tid$ = $Ast.TySum (_, ors)$ >> -> 
          begin 
            try 
              let cons = 
                List.map 
                  (function 
                     | <:ctyp< $uid: c$ >> -> c 
                     | _ -> raise Exit) 
                  (Ast.list_of_ctyp ors []) in 
              to_of_string si tid cons 
            with Exit -> si 
          end 
       | _ -> si

The filter function filters Ast.str_items. (It is not actually recursive but we say let rec so we can define helper functions afterward). If a str_item has the right form we transform it by calling to_of_string, otherwise we return it unchanged. We match a sum type definition, then extract the constructor names (provided that they have no data) into a string list. (Recall that a TySum contains arms separated by TyOr; the call to list_of_ctyp converts that to a list of arms.)
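The flattening step can be sketched without Camlp4. The following is a toy model, not the Camlp4 API: the type ty and the names list_of_ty and constant_constructors are made up for illustration, and only the shape of the recursion is meant to mirror list_of_ctyp and the Exit trick above.

```ocaml
(* Toy stand-in for Camlp4's ctyp: arms of a sum type joined by Or nodes. *)
type ty =
  | Nil                       (* no arms *)
  | Or of ty * ty             (* two groups of arms *)
  | Con of string             (* a constructor with no data *)
  | ConOf of string * string  (* a constructor carrying data *)

(* Flatten nested Or nodes into a list of arms, in source order. *)
let rec list_of_ty t acc =
  match t with
  | Nil -> acc
  | Or (a, b) -> list_of_ty a (list_of_ty b acc)
  | arm -> arm :: acc

(* Constructor names, or None as soon as some arm carries data. *)
let constant_constructors t =
  try
    Some
      (List.map
         (function Con c -> c | _ -> raise Exit)
         (list_of_ty t []))
  with Exit -> None
```

Here None plays the role of returning si unchanged in the real filter.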

  and wrap_str_item si = 
    let _loc = Ast.loc_of_str_item si in 
    <:str_item< $si$ >>

For some reason, <:str_item< $si$ >> wraps an extra StSem / StNil around si, so in order to use the quotation syntax on the left-hand side of a pattern match we need to do the same wrapping.

  and to_of_string si tid cons = 
    let _loc = Ast.loc_of_str_item si in 
    <:str_item< 
      $si$;; 
      $to_string _loc tid cons$;; 
      $of_string _loc tid cons$;; 
    >>

This str_item replaces the original one in the output, so we include the original one in addition to new ones containing the t_to_string and t_of_string functions.

  and to_string _loc tid cons = 
    <:str_item< 
      let $lid: tid ^ "_to_string"$ = function 
        $list: 
          List.map 
            (fun c -> <:match_case< $uid: c$ -> $`str: c$ >>) 
            cons$ 
    >>

To convert a variant to a string, we match over its constructors and return the corresponding string.

  and of_string _loc tid cons = 
    <:str_item< 
      let $lid: tid ^ "_of_string"$ = function 
        $list: 
          List.map 
            (fun c -> <:match_case< 
       $tup: <:patt< $`str: c$ >>$ -> $uid: c$ 
     >>) 
            cons$ 
        | _ -> invalid_arg "bad string" 
    >>

To convert a string to a variant, we match over the corresponding string for each constructor and return the constructor; we also need a catchall for strings that match no constructor. (What is this tup and patt business? A contrived bug which we will fix below.)

  ;; 
  AstFilters.register_str_item_filter begin fun si -> 
    let _loc = Ast.loc_of_str_item si in 
    <:str_item< 
      $list: List.map filter (Ast.list_of_str_item si [])$ 
    >> 
  end

Now we register our filter function with Camlp4. The input str_item may contain many str_items separated by StSem, so we call list_of_str_item to get a list of individual items.

end 
module Id = 
struct 
  let name = "to_of_string" 
  let version = "0.1" 
end 
;; 
let module M = Camlp4.Register.AstFilter(Id)(Make) in ()

Finally we register the plugin with Camlp4. The functor application is just for its side effect, so the plugin is registered when its .cmo is loaded. We can compile the plugin with

ocamlfind ocamlc -package camlp4.quotations.o -syntax camlp4o \ 
  -c to_of_string.ml

and run it on a file (containing type t = Foo | Bar | Baz or something) with

camlp4o to_of_string.cmo test.ml
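For the suggested input, the output we are aiming for should be equivalent to the following hand-written code. This is a sketch of the intent, not captured camlp4o output; the pretty-printer's layout and parenthesization will differ.

```ocaml
type t = Foo | Bar | Baz

(* One match case per constructor, returning its name. *)
let t_to_string = function
  | Foo -> "Foo"
  | Bar -> "Bar"
  | Baz -> "Baz"

(* One match case per name, plus the catchall for unknown strings. *)
let t_of_string = function
  | "Foo" -> Foo
  | "Bar" -> Bar
  | "Baz" -> Baz
  | _ -> invalid_arg "bad string"
```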

Ocamlc's AST

Looks pretty good, right? But something goes wrong when we try to use our plugin as a frontend for ocamlc:

ocamlc -pp 'camlp4o ./to_of_string.cmo' test.ml

We get a preprocessor error, “singleton tuple pattern”. It turns out that Camlp4 passes the processed AST to ocamlc not by pretty-printing it to text, but by converting it to the AST type that ocamlc uses and marshalling it. This saves the time of reparsing it, and also passes along correct file locations (compare to cpp’s #line directives). However, as we have seen, the Camlp4 AST is pretty loose. When converting to an ocamlc AST, Camlp4 does some validity checks on the tree. What can be confusing is that an AST that fails these checks may look fine when pretty-printed.

Here the culprit is the line

       $tup: <:patt< $`str: c$ >>$ -> $uid: c$

which produces an invalid pattern consisting of a one-item tuple. When pretty-printed, though, the tup just turns into an extra set of parentheses, which ocamlc doesn’t mind. What we wanted was

       $`str: c$ -> $uid: c$
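The gap between "prints fine" and "converts fine" can be modeled in a few lines. This is a toy pattern type with a made-up printer and checker, not Camlp4's Ast or its conversion code; it only shows how a loose tree can pretty-print to valid syntax while failing a stricter check.

```ocaml
(* Toy pattern AST: loose enough to represent a one-element tuple. *)
type patt =
  | PStr of string
  | PTup of patt list

(* Pretty-printer: a tuple is just parenthesized, comma-separated patterns,
   so a one-element tuple looks like a harmless pair of parentheses. *)
let rec print = function
  | PStr s -> Printf.sprintf "%S" s
  | PTup ps -> "(" ^ String.concat ", " (List.map print ps) ^ ")"

(* Stricter conversion: reject one-element tuples anywhere in the tree. *)
let rec check = function
  | PStr _ -> Ok ()
  | PTup [ _ ] -> Error "singleton tuple pattern"
  | PTup ps ->
      List.fold_left
        (fun acc p -> match acc with Ok () -> check p | e -> e)
        (Ok ()) ps
```

With these definitions, PTup [PStr "Foo"] prints as ("Foo") but is rejected by check.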

This is a contrived example, but this kind of error is easy to make, and can be hard to debug, because looking at the pretty-printed output doesn’t tell you what’s wrong. One tactic is to run your code in the toplevel, which will print the constructors of the AST as usual. Another is to use a filter that comes with Camlp4 to “lift” the AST—that is, to generate the AST representing the original AST! Maybe it is easier to try it than to explain it:

camlp4o to_of_string.cmo -filter Camlp4AstLifter test.ml

Now compare the result to the tree you get back from Camlp4’s parser for the code you meant to write, and you can probably spot your mistake.

(If you tried to redirect the camlp4o command to a file or pipe it through less you got some line noise—this is the marshalled ocamlc AST. By default Camlp4 checks whether its output is a TTY; if so it calls the pretty-printer, if not the ocamlc AST marshaller. To override this use the -printer o option, or -printer r for revised syntax.)

Other builtin filters

This Camlp4AstLifter is pretty useful. What else comes with Camlp4? There are several other filters in camlp4/Camlp4Filters which you can call with -filter:

Camlp4FoldGenerator generates visitor classes from datatypes. Try putting class x = Camlp4MapGenerator.generated after a type definition. The idea is that you can override methods of the visitor so you can do some transformation on a tree without having to write the boilerplate to walk the parts you don’t care about. In fact, this filter is used as part of the Camlp4 bootstrap to generate visitors for the AST; you can see the map and fold classes in camlp4/Camlp4/Sig.ml.
Camlp4MetaGenerator generates lifting functions from a type definition—these functions are what Camlp4AstLifter uses to lift the AST, and it’s also how quotations are implemented. I’m planning to cover how to implement quotations / antiquotations (for a different language) in a future post, and Camlp4MetaGenerator will be crucial.
Camlp4LocationStripper replaces all the locations in an AST with Loc.ghost. I don’t know what this is for, but it might be useful if you wanted to compare two ASTs and be insensitive to their locations.
Camlp4Profiler inserts profiling code, in the form of function call counts. I haven’t tried it, and I’m not sure when you would want it in preference to gprof.
Camlp4TrashRemover just filters out a module called Camlp4Trash. Such a module may be found in camlp4/Camlp4/Struct/Camlp4Ast.mlast; I think the idea is that the module is there in order to generate some stuff, but the module itself is not needed.
Camlp4MapGenerator has been subsumed by Camlp4FoldGenerator.
Camlp4ExceptionTracer seems to be a special-purpose tool to help debug Camlp4.

OK, maybe not too much useful stuff here, but it is interesting to work out how Camlp4 is bootstrapped.

I think next time I will get into Camlp4’s extensible parsers, on the way toward syntax extensions.

Colophon

I wrote my previous posts in raw HTML, with highlighted code generated from a highlighted Emacs buffer by htmlize.el. Iterating on this setup was unutterably painful. This post was written using jekyll with a simple template to approximate the Blogspot formatting, mostly so I can check that lines of code aren’t too long. Jekyll is very nice: you can write text with Markdown, and highlight code with Pygments.

Lwt and Concurrent ML

2009-05-27T20:28:00.000-07:00

Programming concurrent systems with threads and locks is famously, even fabulously, error-prone. With Lwt's cooperative threads you don't have to worry so much about protecting data structures against concurrent modification, since your code runs atomically between binds. Still, the standard concurrency primitives (mutexes, condition variables) are sometimes useful; but using them with Lwt is not much less painful than with preemptive threads. In this post I want to explore the combination of Lwt with the concurrency primitives of Concurrent ML. I hope to convince you that CML's primitives are easier to use, and a good match for Lwt.

Blocking queues in Lwt

I got started with Lwt when I was writing a work queue (as an Ocamlnet RPC service using orpc). The server keeps a queue of jobs, and workers poll for a task via RPC. An RPC request turns into an Lwt thread; all these threads share the queue. If there's no job in the queue, a request blocks until one is available. So I needed a blocking queue, with the following signature:

type 'a t
val create : unit -> 'a t
val add : 'a -> 'a t -> unit
val take : 'a t -> 'a Lwt.t

The queue is unbounded, so you can add without blocking, but a take may block. (It's nice how in Lwt the possibility of blocking is revealed in the type). Here's the implementation:

type 'a t = {
  m : Lwt_mutex.t;
  c : Lwt_condition.t;
  q : 'a Queue.t;
}

let create () = {
  m = Lwt_mutex.create ();
  c = Lwt_condition.create ();
  q = Queue.create ();
}

A queue is made up of a regular OCaml queue, a condition variable (signaled when there's something in the queue), and a mutex for use with the condition variable. (The Lwt_condition module is based on the Condition module of the standard OCaml threads library.)

let add e t =
  Queue.add e t.q;
  Lwt_condition.signal t.c

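let take t =
  Lwt_mutex.lock t.m >>= fun () ->
  (* parenthesize the conditional so that both branches feed the
     same continuation; otherwise the else arm swallows the rest *)
  (if Queue.is_empty t.q
   then Lwt_condition.wait t.c t.m
   else Lwt.return ()) >>= fun () ->
  let e = Lwt.return (Queue.take t.q) in
  Lwt_mutex.unlock t.m;
  e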

Since Lwt threads are cooperative we don't need to worry about concurrent access to the underlying queue. The role of the mutex here is only to ensure that when a thread blocked on the condition gets signaled, another thread can't take the element first.

Timeouts?

What if there are no entries in the queue for a while? Within a single process, no big deal, the thread can keep waiting forever. That doesn't seem like a good idea over a network connection; we should time out at some point and return a response indicating that no task is available. Here is a first attempt at taking an element from the queue with a timeout:

Lwt.choose [
  Lwt_queue.take q;
  Lwt_unix.sleep timeout >>= fun () ->
    Lwt.fail (Failure "timeout");
]

The Lwt.choose function "behaves as the first thread [...] to terminate". However, the other threads are still running after the first one terminates. It doesn't matter if the sleep is still running after the take completes, but if the sleep finishes first, then the take thread is still waiting to take an element from the queue. When an element becomes available, this thread takes it, and drops it on the floor (since the choose has already finished). And in general this sort of thing can happen whenever a thread you choose between has some effect; the effect still happens even if the thread is not chosen.

A thread can block on only one condition at a time. In order to take an element with a timeout, we're forced to build timeouts into the queue, so we can get at the queue's condition variable. We add an optional argument to take:

val take : ?timeout:float -> 'a t -> 'a Lwt.t

and modify the implementation:

exception Timeout (* raised by take when the timeout expires *)

let take ?(timeout=(-1.)) t =
  let timed_out = ref false in
  if timeout >= 0.
  then
    Lwt.ignore_result
      (Lwt_unix.sleep timeout >>= fun () ->
        timed_out := true;
        Lwt_condition.broadcast t.c;
        Lwt.return ());
  Lwt_mutex.lock t.m >>= fun () ->
    let rec while_empty () =
      if !timed_out then Lwt.return false
      else if not (Queue.is_empty t.q) then Lwt.return true
      else Lwt_condition.wait t.c t.m >>= while_empty in
    while_empty () >>= fun not_empty ->
    let e = if not_empty then Some (Queue.take t.q) else None in
    Lwt_mutex.unlock t.m;
    Lwt_condition.signal t.c;
    match e with Some e -> Lwt.return e | _ -> Lwt.fail Timeout

In an auxiliary thread we wait for the timeout, then set a timeout flag for the main thread and broadcast the condition. It's important to use broadcast, which signals all waiting threads, instead of signal, which signals an arbitrary waiter, in order to be sure that we wake up the timed-out thread. But now it's possible for a thread to be signaled when neither the timeout has expired nor an element is available, so we must loop around waiting on the condition. And a signal from adding an element may be sent to a timed-out thread, so we need to signal another thread to avoid forgetting the added element.

This is not very nice. First, the interface isn't modular. We've hard-coded a particular pair of events to wait for; what if we wanted to wait on two queues at once, or a queue and a network socket? Second, the implementation is tricky to understand. We have to reason about how multiple threads, each potentially at a different point in the program, interact with the shared state.

Lwt_event

Concurrent ML provides a different set of primitives. It makes the notion of an event--something that may happen in the future, like a timeout or a condition becoming true--into an explicit datatype, so you can return it from a function, store it in a data structure, and so on:

type 'a event

When an event occurs, it carries a value of type 'a. The act of synchronizing on (waiting for) an event is a separate function:

val sync : 'a event -> 'a Lwt.t

Of course it returns Lwt.t since it may block; the returned value is the value of the event occurrence. You can make an event that occurs when any of several events occurs, so a thread can wait on several events at once:

val choose : 'a event list -> 'a event

When one event occurs, the thread is no longer waiting on the other events (in contrast to Lwt.choose). Since synchronizing on a choice of events is a very common pattern, there's also

val select : 'a event list -> 'a Lwt.t

which is the same as sync of choose. Its meaning is very similar to Unix.select: block until one of the events occurs.

A channel is sort of like a zero-length queue: both reader and writer must synchronize on the channel at the same time to pass a value from one to the other:

type 'a channel
val new_channel : unit -> 'a channel
val send : 'a channel -> 'a -> unit event
val receive : 'a channel -> 'a event

Both send and receive are blocking operations, so they return events. Finally, there's a way to map the value of an event when it occurs:

val wrap : 'a event -> ('a -> 'b Lwt.t) -> 'b event

The event wrap e f occurs when e occurs, with value f v (where v is the value returned by the occurrence of e). (Here's the full interface of Lwt_event. There are events for Unix file descriptor operations in Lwt_event_unix.)
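To get a feel for how these combinators compose, here is a deliberately tiny model that needs no Lwt at all. It is not Lwt_event: an event is just a polling function, poll stands in for sync (it never blocks), wrap takes a plain function rather than one returning an Lwt thread, and take and timeout are hypothetical helpers. What it does preserve is the key property that an event has its effect only if it is the one chosen.

```ocaml
(* Toy model: an event is a poll function that returns Some v, and
   commits its effect, only at the moment it is chosen. *)
type 'a event = unit -> 'a option

(* Occurs when e occurs, with its value mapped through f. *)
let wrap (e : 'a event) (f : 'a -> 'b) : 'b event =
  fun () ->
    match e () with
    | Some v -> Some (f v)
    | None -> None

(* The first event in the list that is ready; later ones are not run. *)
let choose (evs : 'a event list) : 'a event =
  fun () ->
    let rec first = function
      | [] -> None
      | e :: rest ->
          (match e () with
           | Some _ as r -> r
           | None -> first rest) in
    first evs

(* Non-blocking stand-in for sync: poll once. *)
let poll (e : 'a event) : 'a option = e ()

(* Taking from a queue removes an element only if this event is chosen. *)
let take (q : 'a Queue.t) : 'a event =
  fun () -> if Queue.is_empty q then None else Some (Queue.take q)

(* A timeout modeled as a flag that some clock would set. *)
let timeout (expired : bool ref) : unit event =
  fun () -> if !expired then Some () else None

let () =
  let q = Queue.create () in
  let expired = ref true in
  let ev =
    choose [
      wrap (take q) (fun v -> `Value v);
      wrap (timeout expired) (fun () -> `Timeout);
    ] in
  assert (poll ev = Some `Timeout);  (* queue empty, timeout ready *)
  Queue.add 1 q;
  assert (Queue.length q = 1)        (* no leftover taker grabs the element *)
```

Contrast this with the Lwt.choose attempt earlier, where the losing take thread stays alive and later consumes an element nobody is waiting for.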

Blocking queues with Lwt_event

Now I want to reimplement blocking queues using these new primitives:

type 'a t

val create : unit -> 'a t
val add : 'a -> 'a t -> unit Lwt.t
val take : 'a t -> 'a Lwt_event.event

The interface is similar. As before, take is a blocking operation, but it returns an event instead of Lwt.t so we can combine it with other events using choose. The new add returns Lwt.t, but this is an artifact: a thread calling add won't actually block (we'll see why below). For this reason, add doesn't need to return event.

type 'a t = {
  inch: 'a channel;
  ouch: 'a channel;
}
let add e t = sync (send t.inch e)
let take t = receive t.ouch

A queue consists of two channels, one for adding items into the queue and one for taking them out. The functions implementing the external interface just send and receive on these channels.

let create () =
  let q = Queue.create () in
  let inch = new_channel () in
  let ouch = new_channel () in

To create a queue, we make the channels and the underlying queue (we don't need to store it in the record; it will be hidden in a closure). We're going to have an internal thread to manage the queue; next we need some events for it to interact with the channels:

  let add =
    wrap (receive inch) (fun e ->
      Queue.add e q;
      Lwt.return ()) in

  let take () =
    wrap (send ouch (Queue.peek q)) (fun () ->
      ignore (Queue.take q);
      Lwt.return ()) in

Here add receives an element from the input channel and adds it to the underlying queue; and take sends the top element of the queue on the output channel. Keep in mind that these events don't occur (and the function passed to wrap is not executed) until there's actually a thread synchronizing on the complementary event on the channel. We call Queue.peek in take because at the point that we offer to send an element on a channel, we have to come up with the element; but we don't want to take it off the underlying queue, because there might never be a thread synchronizing on the complementary event on the channel. (Maybe there should be a version of send that takes a thunk?)

  let rec loop () =
    let evs =
      if Queue.is_empty q
      then [ add ]
      else [ add; take () ] in
    select evs >>= loop in
  ignore (loop ());

  { inch = inch; ouch = ouch }

Here's the internal thread. If the queue is empty all we can do is wait for an element to be added; if not, we wait for an element to be added or taken. Now we can see why the add function of the external queue interface can't block: we always select the add event, so as soon as another thread wants to send an element on the input channel, the internal thread is available to receive it.
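The manager's decision rule can be isolated from the concurrency machinery. The sketch below is a sequential toy model with made-up names (request, outcome, offers, step); it replaces channels and threads with a single function call per client request, and only illustrates which requests the loop is prepared to serve in each state.

```ocaml
(* Requests that client threads might make (toy model, no real threads). *)
type 'a request = Add of 'a | Take

type 'a outcome = Added | Took of 'a | Blocked

(* Which events the manager offers in a given state. *)
let offers q =
  if Queue.is_empty q then [ `Add ] else [ `Add; `Take ]

(* Serve one request if the matching event is on offer. *)
let step q = function
  | Add v ->
      (* `Add is always on offer, so an add never blocks *)
      Queue.add v q;
      Added
  | Take ->
      if List.mem `Take (offers q)
      then Took (Queue.take q)
      else Blocked
```

In the real implementation a blocked take is not an error value; the client simply stays suspended in sync until the manager next offers to send.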

Timeouts!

Now, the punchline: we didn't build timeouts into the queue; still we can select between taking an element or timing out:

select [
  Lwt_event_queue.take q;
  wrap (Lwt_event_unix.sleep timeout)
    (fun () -> Lwt.fail (Failure "timeout"));
]

Much better. Moreover, I think this queue implementation is easier to reason about (once you're comfortable with the CML primitives), even compared to our first version (without timeouts). The difference is that only the internal thread touches the state of the queue--in fact it's the only thread for which the state is even in scope! We don't need to worry about conditions and signaling; we just offer an element on the output channel when one is available. This is only an inkling of the power of CML; the book Concurrent Programming in ML contains much more, including some large examples.

Why is this style of concurrency not more common? I think there are several reasons: First, idiomatic CML programming requires very lightweight threads (you don't want a native thread, or even an OCaml bytecode thread, for every queue). Second, the wrap combinator, essential for building complex events, requires higher-order functions, so there's no similarly concise translation into, say, Java. Finally, I think it's not widely appreciated that concurrent programming is useful without parallel programming. The mutex approach works fine for parallel programming, while CML has only recently been implemented in a parallel setting. None of these reasons applies to Lwt programming; Concurrent ML is a good fit with Lwt.

In an earlier post I asserted (without much to back it up) that Ocamlnet's Equeue gives better low-level control over blocking than Lwt. The Lwt_event and Lwt_event_unix modules provide a similar degree of control, with a higher-level interface.

Sudoku in ocamljs, part 3: functional reactive programming

2009-05-11T21:51:00.000-07:00

In part 1 and part 2 of this series, we made a simple Sudoku game and connected it to a game server. In this final installment I want to revisit how we check that a board satisfies the Sudoku rules. There's a small change to the UI: instead of a "Check" button, the board is checked continuously as the player enters numbers; any conflicts are highlighted as before. Here's the final result.

Let's review how we want checking to work: a cell is colored red if any other cell in the same row, column, or square (outlined in bold) contains the same number; otherwise the cell is colored white. Now take another look at the check_board function from part 1. Is it obvious that this code meets the specification? The function is essentially stateful, clearing all the cell colors then setting them red when it discovers a conflict. In fact, I had a bug in it related to state--I was clearing the background color in the None arm of check_set, so each checked constraint would overwrite the highlighting of the previous ones where they overlapped.

It would be easier to convince ourselves that we'd gotten it right if the code looked more like the specification. What we want is a function that maps each cell and its "adjacent" cells (the ones in the same row, column, or square) to a boolean (true if the cell is highlighted). Abstracting from the DOM details, suppose a cell is an int option and we have a function adjacents i j that returns a list of cells adjacent to the cell at (i, j). Then the check function is just:

let highlighted cell i j =
  cell <> None && List.mem cell (adjacents i j)
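To try the specification on its own, here is a self-contained sketch over a plain 9x9 array of int options. Passing the board as an explicit argument is an assumption made for the example; in the real program the cells live in the UI.

```ocaml
(* All cells in the same row, column, or 3x3 square as (i, j),
   excluding (i, j) itself. *)
let adjacents board i j =
  let acc = ref [] in
  for i' = 0 to 8 do
    for j' = 0 to 8 do
      let same_cell = i' = i && j' = j in
      let related =
        i' = i || j' = j || (i' / 3 = i / 3 && j' / 3 = j / 3) in
      if related && not same_cell then acc := board.(i').(j') :: !acc
    done
  done;
  !acc

(* A cell is highlighted if it is filled and some adjacent cell
   holds the same number. *)
let highlighted board i j =
  let cell = board.(i).(j) in
  cell <> None && List.mem cell (adjacents board i j)
```

Every cell has 20 adjacent cells: 8 in its row, 8 in its column, and 4 more in its square.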

So how do we hook this function into the UI? We could just call it for every cell, every time we get a change event for some cell. That seems like a lot of needless computation, since almost all the cells haven't changed. On the other hand, if we manually keep track of which cells might be affected by a change, our code is no longer obviously correct. It would be nice to have some kind of incremental update, like a spreadsheet.

This is where functional reactive programming comes in. The main idea is to write functions over behaviors, or values that can change. If you change an input to a function, the output (another behavior) is automatically recomputed. The dependency bookkeeping is taken care of by the framework; we'll use the froc library.

It turns out to be convenient to give behaviors a monadic interface. So we have a type 'a behavior; we turn a constant into a behavior with return, and we use a behavior with bind. We saw in part 2 that the monadic interface of Lwt enables blocking: since bind takes a function to apply to the result of a thread, the framework can wait until the thread has completed before applying it. With froc, the framework applies the function passed to bind whenever the bound behavior changes. With both Lwt and froc you can think of a computation as a collection of dependencies rather than a linear sequence.

There's another important piece of functional reactive programming: events. An 'a event in froc is a channel over which values of type 'a can be passed. You can connect froc events to DOM events to interact with the stateful world of the UI. The library includes several functions for working with events (e.g. mapping a function over an event stream) and in particular for mediating between behaviors and events, such as:

val hold : 'a -> 'a event -> 'a behavior

which takes an initial value and an event channel, and returns a behavior that begins at the initial value then changes to each successive value that's sent on the channel, and

val changes : 'a behavior -> 'a event

which takes a behavior and returns an event channel that has a value sent on it whenever the behavior changes.
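A toy model may help fix the ideas before the real thing. This is not how froc is implemented: the record types and the sample function are made up for the example, there is no dependency tracking, and the choice to ignore a send that repeats the current value is an assumption of the sketch.

```ocaml
(* Toy model, not froc: an event channel is a list of callbacks. *)
type 'a event = { mutable listeners : ('a -> unit) list }

let make_event () = { listeners = [] }
let notify_e e f = e.listeners <- e.listeners @ [ f ]
let send e v = List.iter (fun f -> f v) e.listeners

(* A behavior is a current value plus a channel announcing new values. *)
type 'a behavior = { mutable value : 'a; changed : 'a event }

(* Start at init, then follow each value sent on e. *)
let hold init e =
  let b = { value = init; changed = make_event () } in
  notify_e e (fun v ->
    if v <> b.value then begin
      b.value <- v;
      send b.changed v
    end);
  b

let changes b = b.changed
let sample b = b.value

let () =
  let ev = make_event () in
  let cell = hold 0 ev in
  let log = ref [] in
  notify_e (changes cell) (fun v -> log := v :: !log);
  send ev 1;
  send ev 1;  (* same value: no change announced in this toy *)
  send ev 2;
  assert (sample cell = 2);
  assert (List.rev !log = [ 1; 2 ])
```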

This all probably seems a bit abstract, so let's dive into the example code:

module D = Dom
let d = D.document

module F = Froc
module Fd = Froc_dom
let (>>=) = F.(>>=)

We set up some constants we'll need below. The Froc module contains the core FRP implementation, not tied to a particular UI toolkit; Froc_dom contains functions that are specific to DOM programming (with the Dom module we saw before).

let make_cell v =
  let ev = F.make_event () in
  let cell = F.hold v ev in
  let set v = F.send ev v in
  (cell, set)
let notify_e e f =
  F.notify_e e (function
    | F.Fail _ -> ()
    | F.Value v -> f v)

These are a couple of functions that really should be part of froc (and will be in the next version). The first makes a cell, which is a behavior (the hold of an event channel) along with a function to set its value (which sends the value on the channel). It's like a ref cell, but we can bind it so changes are propagated. We'll have one of these for each square on the Sudoku board, but it is a generally useful construct.

The second papers over a design error in the froc API: like with Lwt threads, a froc behavior or event value can be either a normal value or an exception (together, a result). The notify_e function sets a callback that's called when an event arrives on the channel, but most of the time we just want to ignore exceptional events.

let attach_input_value i b =
  notify_e (F.changes b) (fun v -> i#_set_value v)
let attach_backgroundColor e b =
  notify_e
    (F.changes b)
    (fun v -> e#_get_style#_set_backgroundColor v)

These are functions that should be part of Froc_dom. To attach a DOM element to a behavior means to update the DOM element whenever the behavior changes. But there are lots of ways to update a DOM element, and Froc_dom doesn't include them all. (This design contrasts with that of Flapjax, where you work with behaviors whose value is an entire DOM element. It's certainly possible to do this in froc, but more tedious because of the types.)

let (check_enabled, set_check_enabled) = make_cell false

Now we're in the application code. The check_enabled cell controls whether checking is turned on--we'll see below what this is for, as you may have noticed that there is no such switch in the actual UI.

let make_board () =
  let make_input () =
    let input = (d#createElement "input" : D.input) in
    input#setAttribute "type" "text";
    input#_set_size 1;
    input#_set_maxLength 1;
    let style = input#_get_style in
    style#_set_border "none";
    style#_set_padding "0px";

    let (cell, set) = make_cell None in
    attach_input_value input
      (cell >>= function
        | None -> F.return ""
        | Some v -> F.return (string_of_int v));
    let ev =
      F.map
        (function
          | "1" | "2" | "3" | "4" | "5"
          | "6" | "7" | "8" | "9"  as v ->
            Some  (int_of_string v)
          | _ -> None)
        (Fd.input_value_e input) in
    notify_e ev set;
    (cell, set, input) in

Here we make the game board much as we did in part 1. The main difference is that instead of working directly with DOM input nodes, we connect each input to a cell of type int option. The attach_input_value call sets the value of the DOM input node whenever the cell changes, and the notify_e call sets the cell whenever the input node changes. (This doesn't loop, because Fd.input_value_e makes an event stream from the "onchange" events of the input, and "onchange" events are only sent when the user changes the input, not when it's changed from Javascript.) We take the stream of strings and map it into a stream of int options, validating the string as we go.

  let rows =
    Array.init 9 (fun i ->
      Array.init 9 (fun j ->
        make_input ())) in

  let adjacents i j =
    let adj i' j' =
      (i' <> i || j' <> j) &&
        (i' = i || j' = j ||
            (i' / 3 = i / 3 && j' / 3 = j / 3)) in
    let rec adjs i' j' l =
      match i', j' with
        | 9, _ -> l
        | _, 9 -> adjs (i'+1) 0 l
        | _, _ ->
            let l =
              if adj i' j'
              then
                let (cell,_,_) = rows.(i').(j') in
                cell::l
              else l in
            adjs i' (j'+1) l in
    adjs 0 0 [] in

We make the game board as a matrix of inputs as before, but now each element of the matrix contains a cell (an int option behavior), the function to set that cell, and the actual DOM input element. Next we set up the rule-checking. The adjacents function returns a list of cells adjacent to the cell at (i, j) (adjacent in the sense we discussed above). All my bugs when I wrote this example were in this function, but it clearly embodies the specification we're trying to meet: a cell is adjacent to the current cell if it is not the same cell and is in the same row, column, or square. (The loop would be clearer if we had Array.foldi.)

  ArrayLabels.iteri rows ~f:(fun i row ->
    ArrayLabels.iteri row ~f:(fun j (cell, _, input) ->
      let adjs = adjacents i j in
      attach_backgroundColor input
        (check_enabled >>= function
          | false -> F.return "#ffffff"
          | true ->
              F.bindN adjs (fun adjs ->
                cell >>= fun v ->
                  if v <> None && List.mem v adjs
                  then F.return "#ff0000"
                  else F.return "#ffffff"))));

This is the functional reactive core of the program. For each square on the board we compute essentially the highlighted function above, but in monadic form (the bindN function binds a list of behaviors at once), and attach the result to the background color of the input node. Because the set of adjacent cells does not depend on the value of the cells, we can hoist its computation out of the reactive part so it won't be recomputed every time a cell changes (and since dependency on a behavior is captured in the type of a function, the fact that this typechecks tells us it is safe to do!).

That's it. The rest of the program is almost the same as before. (Here's the full code.) The one important change has to do with check_enabled. In the reaction to cell changes, we consult check_enabled, returning the unhighlighted color when it's false. Since we do this before binding the cells, a change to a cell causes no recomputation when check_enabled is false. So we turn off check_enabled while loading a new game board, saving a lot of needless recomputation that otherwise makes it annoyingly slow.

It's interesting to compare functional reactive programming to the model-view-controller pattern. The point of MVC is to separate the changeable state (the model) from how it is displayed (the view). Although MVC is typically implemented with change events and state update, a view behaves as a pure function of the state (or can be made so by making the state of UI components explicit). So you could think of FRP as "automatic" MVC: you just write down dependencies (with bind) and the framework manages events and state update. For small examples this may not seem like a big win, but FRP takes care of some complexities that tend to swamp MVC apps: managing dynamic dependencies (registering and unregistering event handlers in response to events) and maintaining coherence (i.e. functional behavior) over different event orders.

I haven't yet written a serious application with froc, but so far I think it is awesome!

Sudoku in ocamljs, part 2: RPC over HTTP

2009-05-03T22:35:00.000-07:00

Last time we made a simple user interface for Sudoku with the Dom module of ocamljs. It isn't a very fun game though since there are no pre-filled numbers to constrain the board. So let's add a button to get a new game board; here's the final result.

I don't know much about generating Sudoku boards, but it seems like it might be slow to do it in the browser, so we'll do it on the server, and communicate to the server with OCaml function calls using the RPC over HTTP support in orpc.

The 5-minute monad

But first I'm going to give you a brief introduction to monads (?!). Bear with me until I can explain why we need monads for Sudoku, or skip it if this is old hat to you. We'll transform the following fragment into monadic form:

let foo () = 7 in
bar (foo ())

First put it in named form by let-binding the result of the nested function application:

let foo () = 7 in
let f = foo () in
bar f

Then introduce two new functions, return and bind:

let return x = x
let bind x f = f x

let foo () = return 7 in
bind (foo ()) (fun f ->
  bar f)

These functions are a bit mysterious (although the name "bind" is suggestive of let-binding), but we haven't changed the meaning of the fragment. Next we would like to enforce that the only way to use the result of foo () is by calling bind. We can do that with an abstract type:

type 'a t
val return : 'a -> 'a t
val bind  : 'a t -> ('a -> 'b t) -> 'b t

Taking type 'a t = 'a, the definitions of return and bind match this signature. So what have we accomplished? We've abstracted out the notion of using the result of a computation. It turns out that there are many useful structures matching this signature (and satisfying some equations), called monads. It's convenient that they all match the same signature, in part because we can mechanically convert ordinary code into monadic code, as we've done here, or even use a syntax extension to do it for us.
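To make the signature concrete, here is a minimal sketch in plain OCaml. The module names MONAD, Identity, and Maybe, and the run function, are made up for illustration; they come from neither Lwt nor any library discussed here. Identity is the trivial monad from the fragment above; Maybe is a second structure matching the same signature, to show that the converted code does not care which one it gets.

```ocaml
module type MONAD = sig
  type 'a t
  val return : 'a -> 'a t
  val bind : 'a t -> ('a -> 'b t) -> 'b t
end

(* The identity monad from the text: 'a t is just 'a, hidden behind the
   signature. [run] is a hypothetical escape hatch so we can see the result. *)
module Identity : sig
  include MONAD
  val run : 'a t -> 'a
end = struct
  type 'a t = 'a
  let return x = x
  let bind x f = f x
  let run x = x
end

(* A second instance: computations that may fail to produce a value. *)
module Maybe : MONAD with type 'a t = 'a option = struct
  type 'a t = 'a option
  let return x = Some x
  let bind x f = match x with None -> None | Some v -> f v
end

let bar x = x + 1

(* The fragment from the text, written against each monad in turn. *)
let identity_result =
  let open Identity in
  let foo () = return 7 in
  run (bind (foo ()) (fun f -> return (bar f)))

let maybe_result =
  let open Maybe in
  let foo () = return 7 in
  bind (foo ()) (fun f -> return (bar f))
```

Once the type is abstract, the function passed to bind has to produce a 'b t, so the result of bar is wrapped in return here. Only the choice of module differs between the two uses.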

Lightweight threads in Javascript

One such useful structure is the Lwt library for cooperative threads. You can write Lwt-threaded code by taking ordinary threaded code and converting it to monadic style. In Lwt, 'a t is the type of threads returning 'a. Then bind t f calls f on the value of the thread t once t has finished, and return x is an already-finished thread with value x.

Lwt threads are cooperative: they run until they complete or block waiting on the result of another thread, but aren't ever preempted. It can be easier to reason about this kind of threading, because until you call bind, there's no possibility of another thread disturbing any state you're working on.

Lwt threads are a great match for Javascript, which doesn't have preemptive threads (although plugins like Google Gears provide them), because they need no special support from the language except closures. Typically in Javascript you write a blocking computation as a series of callbacks. You're doing essentially the same thing with Lwt, but it's packaged up in a clean interface.

Orpc for RPC over HTTP

The reason we care about threads in Javascript is that we want to make a blocking RPC call to the server to retrieve a Sudoku game board, without hanging the browser. We'll use orpc to generate stubs for the client and server. In the client the call returns an Lwt thread, so you need to call bind to get the result. In the server it arrives as an ordinary procedure call.

To use orpc you write down the signature of the RPC interface, in Lwt and Sync forms for the client and server. Orpc checks that the two forms are compatible, and generates the stubs. Here's our interface (proto.ml):

module type Sync =
sig
  val get_board : unit -> int option array array
end

module type Lwt =
sig
  val get_board : unit -> int option array array Lwt.t
end

The get_board function returns a 9x9 array, each cell of which may contain None or Some k where k is 1 to 9. We can't capture all these constraints in the type, but we get more static checking than if we were passing JSON or XML.
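The constraints that don't fit in the type can still be checked at runtime when the board arrives. A minimal sketch, assuming we simply want a boolean answer (valid_cell and valid_board are hypothetical helpers, not part of orpc or the example source):

```ocaml
(* A cell is valid if it is empty or holds a digit from 1 to 9. *)
let valid_cell = function
  | None -> true
  | Some k -> 1 <= k && k <= 9

(* A board is valid if it has 9 rows of 9 valid cells each. *)
let valid_board (board : int option array array) =
  Array.length board = 9
  && Array.for_all
       (fun row -> Array.length row = 9 && Array.for_all valid_cell row)
       board
```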

Generating the board

On the server, we implement a module that matches the Sync signature. (You can see that I didn't actually implement any Sudoku-generating code, but took some fixed examples from Gnome Sudoku.) Then there's some boilerplate to set up a Netplex HTTP server and register the module at the /sudoku path. It's pretty simple. The Proto_js_srv module contains stubs generated by orpc from proto.ml, and Orpc_js_server is part of the orpc library.

Using the board

The client is mostly unchanged from last time. There's a new button, "New game", that makes the RPC call, then fills in the board from the result.

let (>>=) = Lwt.(>>=)

The >>= operator is another name for bind. If you aren't using pa_monad (which we aren't here), it makes a sequence of binds easier to read.

module Server =
  Proto_js_clnt.Lwt(struct
    let with_client f = f (Orpc_js_client.create "/sudoku")
  end)

This sets up the RPC interface, so calls on the Server module become RPC calls to the server. The Proto_js_clnt module contains stubs generated from proto.ml, and Orpc_js_client is part of the orpc library. (In the actual source you'll see that I faked this out in order to host the running example on Google Code--there's no way to run an OCaml server, so I randomly choose a canned response.)

let get_board rows _ =
  ignore
    (Server.get_board () >>= fun board ->
      for i = 0 to 8 do
        for j = 0 to 8 do
          let cell = rows.(i).(j) in
          let style = cell#_get_style in
          style#_set_backgroundColor "#ffffff";
          match board.(i).(j) with
            | None ->
                cell#_set_value "";
                cell#_set_disabled false
            | Some n ->
                cell#_set_value (string_of_int n);
                cell#_set_disabled true
        done
      done;
      Lwt.return ());
  false

This is the event handler for the "New game" button. We call get_board, bind the result, then fill in the board. If there's a number in a cell we disable the input box so the player can't change it. Here's the full code.

Doing AJAX programming with orpc and Lwt really shows off the power of compiling OCaml to Javascript. While Google Web Toolkit has a similar RPC mechanism (that generates stubs from Java interfaces), it's much clumsier to use, because you're still working at the level of callbacks rather than threads. Maybe you could translate Lwt to Java, but it would be painfully verbose without type inference.

This monad stuff will come in handy again next time, when we'll revisit the problem of checking the Sudoku constraints on the board, using froc.

Sudoku in ocamljs, part 1: DOM programming

2009-04-26T22:30:00.000-07:00

Let's make a Sudoku game with ocamljs and the Dom library for programming the browser DOM. Like on the cooking shows, I have prepared the dish we're about to make beforehand; why don't you taste it now? OK, it is not yet Sudoku, lacking the important ingredient of some starting numbers to guide the game--we'll come back to that next time.

module D = Dom
let d = D.document

We begin with some definitions. The Dom module includes class types for much of the standard browser DOM, using the ocamljs facility for interfacing with Javascript objects. Dom.document is the browser document object.

let make_board () =
  let make_input () =
    let input = (d#createElement "input" : D.input) in
    input#setAttribute "type" "text";
    input#_set_size 1;
    input#_set_maxLength 1;
    let style = input#_get_style in
    style#_set_border "none";
    style#_set_padding "0px";
    let enforce_digit () =
      match input#_get_value with
        | "1" | "2" | "3" | "4" | "5"
        | "6" | "7" | "8" | "9" -> ()
        | _ -> input#_set_value "" in
    input#_set_onchange (Ocamljs.jsfun enforce_digit);
    input in

We construct the Sudoku board in several steps. First, we make an input box for each square. Notice that you can call DOM methods (e.g. createElement) with OCaml object syntax. But what is the type of createElement? The type of the object you get back depends on the tag name you pass in; OCaml has no type for that. So createElement is declared to return #element (that is, a subclass of element). If you need only methods from element then you usually don't need to ascribe a more-specific type, but in this case we need an input node. (Static type checking with Javascript objects is therefore only advisory in some cases--if you ascribe the wrong type you can get a runtime error--but still better than nothing.)

We next set some attributes, properties, and styles on the input box. Properties are manipulated with specially-named methods: foo#_get_bar becomes foo.bar in Javascript, and foo#_set_bar baz becomes foo.bar = baz. Finally we add a validation function to enforce that the input box contains at most a single digit. To set the onchange handler, you need to wrap it in Ocamljs.jsfun, because the calling convention of an ocamljs function is different from that of a plain Javascript function (to accommodate partial application and tail recursion).

  let make_td i j input =
    let td = d#createElement "td" in
    let style = td#_get_style in
    style#_set_borderStyle "solid";
    style#_set_borderColor "#000000";
    let widths = function
      | 0 -> 2, 0 | 2 -> 1, 1 | 3 -> 1, 0
      | 5 -> 1, 1 | 6 -> 1, 0 | 8 -> 1, 2
      | _ -> 1, 0 in
    let (top, bottom) = widths i in
    let (left, right) = widths j in
    let px k = string_of_int k ^ "px" in
    style#_set_borderTopWidth (px top);
    style#_set_borderBottomWidth (px bottom);
    style#_set_borderLeftWidth (px left);
    style#_set_borderRightWidth (px right);
    ignore (td#appendChild input);
    td in

Next we make a table cell for each square, containing the input box, with borders according to its position in the grid. Here we don't ascribe a type to the result of createElement since we don't need any td-specific methods.

  let rows =
    Array.init 9 (fun i ->
      Array.init 9 (fun j ->
        make_input ())) in

  let table = d#createElement "table" in
  table#setAttribute "cellpadding" "0px";
  table#setAttribute "cellspacing" "0px";
  let tbody = d#createElement "tbody" in
  ignore (table#appendChild tbody);
  ArrayLabels.iteri rows ~f:(fun i row ->
    let tr = d#createElement "tr" in
    ArrayLabels.iteri row ~f:(fun j cell ->
      let td = make_td i j cell in
      ignore (tr#appendChild td));
    ignore (tbody#appendChild tr));

  (rows, table)

Then we assemble the full board: make a 9 x 9 matrix of input boxes, make a table containing the input boxes, then return the matrix and table. Notice that we freely use the OCaml standard library. Here the tbody is necessary for IE; the cellpadding and cellspacing don't work in IE for some reason that I have not tracked down. This raises an important point: the Dom module is the thinnest possible wrapper over the actual DOM objects, and as such gives you no help with cross-browser compatibility.

let check_board rows _ =
  let error i j =
    let cell = rows.(i).(j) in
    cell#_get_style#_set_backgroundColor "#ff0000" in

  let check_set set =
    let seen = Array.make 9 None in
    ArrayLabels.iter set ~f:(fun (i,j) ->
      let cell = rows.(i).(j) in
      match cell#_get_value with
        | "" -> ()
        | v ->
            let n = int_of_string v in
            match seen.(n - 1) with
              | None ->
                  seen.(n - 1) <- Some (i,j)
              | Some (i',j') ->
                  error i j;
                  error i' j') in

  let check_row i =
    check_set (Array.init 9 (fun j -> (i,j))) in

  let check_column j =
    check_set (Array.init 9 (fun i -> (i,j))) in

  let check_square i j =
    let set = Array.init 9 (fun k ->
      i * 3 + k mod 3, j * 3 + k / 3) in
    check_set set in

  ArrayLabels.iter rows ~f:(fun row ->
    ArrayLabels.iter row ~f:(fun cell ->
      cell#_get_style#_set_backgroundColor "#ffffff"));

  for i = 0 to 8 do check_row i done;
  for j = 0 to 8 do check_column j done;
  for i = 0 to 2 do
    for j = 0 to 2 do
      check_square i j
    done
  done;
  false

Now we define a function to check that the Sudoku constraints are satisfied: that no row, column, or heavy-lined square has more than one occurrence of a digit. If a digit occurs more than once in a set then we color all its occurrences red. The only ocamljs-specific parts here are getting the cell contents (with _get_value) and setting the background color style. However, it's worth noticing the algorithm: we imperatively clear the error states for all cells, then set error states as we check each constraint. I'll revisit this in a later post about functional reactive programming.
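For contrast with the imperative clear-then-mark approach, here is a rough sketch of the same check as a pure function over an int option array array (the representation is an assumption; the code above reads strings from DOM nodes instead). It returns the coordinates of the conflicting cells rather than coloring anything. The names conflicts, all_sets, and check are made up for this sketch.

```ocaml
(* Coordinates of the nine cells in row i, column j, and 3x3 square (i, j). *)
let row i = Array.init 9 (fun j -> (i, j))
let column j = Array.init 9 (fun i -> (i, j))
let square i j = Array.init 9 (fun k -> (i * 3 + k mod 3, j * 3 + k / 3))

(* The cells in [set] whose digit also appears in another cell of [set]. *)
let conflicts board set =
  let cells = Array.to_list set in
  List.filter
    (fun (i, j) ->
      match board.(i).(j) with
      | None -> false
      | Some n ->
          List.exists
            (fun (i', j') -> (i', j') <> (i, j) && board.(i').(j') = Some n)
            cells)
    cells

(* All 27 constraint sets: 9 rows, 9 columns, 9 squares. *)
let all_sets =
  List.concat
    [ List.init 9 row;
      List.init 9 column;
      List.concat (List.init 3 (fun i -> List.init 3 (fun j -> square i j))) ]

(* Sorted, duplicate-free list of every cell involved in some conflict. *)
let check board =
  List.sort_uniq compare (List.concat (List.map (conflicts board) all_sets))
```

Because the result depends only on the board contents, there is no error state to clear first: a cell is in the list exactly when it currently conflicts with something.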

let onload () =
  let (rows, table) = make_board () in
  let check = d#getElementById "check" in
  check#_set_onclick (Ocamljs.jsfun (check_board rows));
  let board = d#getElementById "board" in
  ignore (board#appendChild table)

;;

D.window#_set_onload (Ocamljs.jsfun onload)

Finally we put the pieces together: make the board, insert it into the DOM, call check_board when the Check button is clicked, and call this setup code once the document has been loaded. See the full source for build files.

By writing this in OCaml rather than directly in Javascript, we've gained the assurance of static type checking; we get to use OCaml's syntax, pattern matching, and standard library; we have a for loop that's not broken. On the flip side we have to worry about type ascription and Ocamljs.jsfun. If you don't already think that OCaml is a better language than Javascript, this won't convince you. But perhaps the followup posts, in which I'll show how to use RPC over HTTP with orpc and functional reactive programming with froc, will tip the scales for you.

Monadic functional reactive AJAX in OCaml

2009-04-23T10:28:00.000-07:00

Yesterday I released three related projects which I've been working on for a long time:

ocamljs, a Javascript backend for ocamlc, along with some libraries for web programming
orpc, a tool for generating RPC stubs from OCaml signatures, either ONC RPC for use with Ocamlnet's RPC implementation, or RPC over HTTP for use with ocamljs
froc, a library for functional reactive programming that works with ocamljs

The idea of all this is to build a platform for client-side web programming like Google Web Toolkit (but better, of course :). There is still a lot of work to get there, but already we use ocamljs and orpc for production work at Skydeck. In my next few posts I'll work through some examples using ocamljs, orpc, and froc.

Equeue compared to Lwt

2009-02-09T23:43:00.000-08:00

I feel like taking a break from Camlp4, so in this post I'll take a look at two libraries for asynchronous network programming in OCaml: Equeue and Lwt. Each provides cooperative multithreading and asynchronous access to networking calls; each has protocol implementations built on top of it (e.g. Nethttpd for Equeue and Ocsigen's HTTP implementation for Lwt). So why would you want to use one over the other? Let's start with an overview of each.

Equeue

An Equeue event system comprises a queue of events and a set of event handlers. A running event system just pulls events off the queue and passes them to the event handlers. You can think of a group of related handlers as a thread (the thread is blocked until one of its handlers is called; when the handler returns the thread yields) but there is no particular data structure tying them together.

The Unixqueue module specializes Equeue to the case where the source of events is the Unix select call. It adds the idea of resources, which are operations that may cause an event. For example, the operation Wait_in on some file descriptor can cause the event Input_arrived for that descriptor. A resource also has an associated timeout (the Timeout event fires if the timeout is exceeded). Unixqueue also adds a way to group resources and handlers; a group can be removed from the event system with one call, so everything associated with a thread can be cleaned up at once.
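To fix the idea of a queue of events plus a set of handlers, here is a toy model in plain OCaml. It is not Equeue's API: the event type borrows the names Input_arrived and Timeout but carries a fake integer descriptor, and handlers here return a bool to say whether they consumed the event, which is a simplification made for this sketch.

```ocaml
(* Toy events; the int stands in for a file descriptor. *)
type event = Input_arrived of int | Timeout of int

type event_system = {
  queue : event Queue.t;
  mutable handlers : (event -> bool) list;  (* true = event consumed *)
}

let create () = { queue = Queue.create (); handlers = [] }
let add_event es ev = Queue.add ev es.queue
let add_handler es h = es.handlers <- es.handlers @ [ h ]

(* Pull events off the queue and offer each one to the handlers in turn,
   stopping at the first handler that consumes it. Events that no handler
   wants are dropped. *)
let run es =
  while not (Queue.is_empty es.queue) do
    let ev = Queue.pop es.queue in
    ignore (List.exists (fun h -> h ev) es.handlers)
  done
```

A "thread" in this model is nothing more than the handlers that happen to cooperate on one task, which is the point made above: no data structure ties them together.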

On top of the low-level event queue mechanism, Equeue builds engines, which package up some event handlers and some internal state with a particular interface:

type 't engine_state =
  [ `Working of int
  | `Done of 't
  | `Error of exn
  | `Aborted
  ]

class type [ 't ] engine = object
  method state : 't engine_state
  method abort : unit -> unit
  method request_notification : (unit -> bool) -> unit
  method event_system : Unixqueue.event_system
end

An engine runs for a while, then finishes with some value, fails with an exception, or becomes aborted. Code that's interested in the result of an engine can use request_notification to find out when the state of the engine has changed.

Equeue provides a number of engines for networking tasks (such as connecting to a socket), and also for hooking engines together in various ways. Maybe the most interesting one (when comparing to Lwt at least) is:

class ['a, 'b] seq_engine :
  'a #engine ->
  ('a -> 'b #engine) ->
  ['b] engine

which feeds the result of one engine into a function that creates another engine. Does this look familiar?

Lwt

Lwt provides no equivalent to Equeue's low-level event handling. But an Lwt thread is quite similar to an Equeue engine, in that it runs for a while then finishes successfully with a value or fails with an exception (there is no aborted state). However, the type 'a Lwt.t of threads returning values of type 'a is abstract; to implement your own thread you must build it out of the functions provided by Lwt. Here are some important ones:

val return : 'a -> 'a t
val fail : exn -> 'a t

You create an already-terminated thread with a value or exception with return and fail respectively. (Equeue has epsilon_engine which does essentially the same thing.)

val wait : unit -> 'a t
val wakeup : 'a t -> 'a -> unit
val wakeup_exn : 'a t -> exn -> unit

These functions give you a way to make threads that return only after some event occurs. A thread created with wait is blocked until woken either with a value or an exception. Any threads using its value block until it's woken. But how does a thread use another thread's value?

val bind : 'a t -> ('a -> 'b t) -> 'b t

This function feeds the result of one thread into a function that creates another thread, just like Equeue's seq_engine above. The important thing is that the value may not be available yet. In that case the function you give as the second argument is added to a notification list and called when the value arrives. This is similar to Equeue's request_notification, except that with Lwt notification is entirely under the hood: asking to be notified and getting the value of the thread are the same operation.
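The notification-list idea can be sketched in a few lines. This is a toy model written for this post, not Lwt's actual implementation (which is more careful about tail calls and other details); the state type, finish, and on_finish are made-up names, and only return, fail, wait, wakeup, wakeup_exn, and bind mirror the signatures above.

```ocaml
type 'a state =
  | Done of 'a
  | Failed of exn
  | Sleeping of ('a state -> unit) list  (* waiters, most recent first *)

type 'a t = { mutable state : 'a state }

let return x = { state = Done x }
let fail e = { state = Failed e }
let wait () = { state = Sleeping [] }

(* Move a sleeping thread to its final state and notify the waiters
   in the order they registered. *)
let finish t final =
  match t.state with
  | Sleeping waiters ->
      t.state <- final;
      List.iter (fun w -> w final) (List.rev waiters)
  | Done _ | Failed _ -> invalid_arg "thread already finished"

let wakeup t x = finish t (Done x)
let wakeup_exn t e = finish t (Failed e)

(* Call k on the final state of t: immediately if t has finished,
   otherwise when it is woken. *)
let on_finish t k =
  match t.state with
  | Sleeping waiters -> t.state <- Sleeping (k :: waiters)
  | final -> k final

let bind t f =
  let result = wait () in
  on_finish t (function
    | Done x ->
        (* An exception raised by f becomes a failed thread. *)
        let t' = (try f x with e -> fail e) in
        on_finish t' (fun final -> finish result final)
    | Failed e -> finish result (Failed e)
    | Sleeping _ -> assert false);
  result
```

Binding a thread made with wait runs nothing until wakeup is called; binding an already-finished thread runs the function right away. Either way the caller gets back a thread for the eventual result.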

(Maybe you noticed that the type Lwt.t together with the functions return and bind form a monad. It would appear that the same is true of Equeue's engine, epsilon_engine, and seq_engine, although I haven't checked that they satisfy the monad laws.)

The Lwt_unix module provides a set of Unix I/O functions that match many of the ordinary ones in the Unix module, but return Lwt.t values (i.e. threads). In order to use the value you have to bind the thread, and possibly block until the value arrives.

Comparison

Lwt is a very beautiful library. The monadic interface encourages you to think about interacting threads in terms of values and dependencies, rather than states and callbacks. Lwt code can be very concise, and with the help of pa_monad, it can look pretty much just like straight-line code. Equeue engines require more machinery to implement (in particular, request_notification, although the engine_mixin class helps with that), and this increased overhead makes it less convenient to use threads in a fine-grained way.

Lwt is particularly nice with exception handling. In most cases, if a thread raises an exception it will be converted to a failing thread, rather than escaping the thread machinery (as would happen in an Equeue engine if you don't explicitly catch the exception). Unfortunately there are places this doesn't work (in order to support constant-space tail calls), which can be surprising.

Equeue, on the other hand, gives you much better low-level control. Lwt gives you the monadic equivalent of a blocking threads interface: you get a read call that blocks until data is ready. Equeue separates notification of events from the actual I/O operations, so if you want to do something other than read when data is ready you can. You can also remove a resource, to indicate that you're no longer interested in its events. With Lwt once a thread is waiting to read, it keeps waiting until data is ready or the channel is aborted (using Lwt_unix.abort). This covers the common case where you want to close the connection on a timeout, but more complicated things are harder. In addition, since you always care about timeouts when doing network programming, it's convenient that Equeue builds them into the resource interface.

Equeue may be more efficient in low-level ways: for instance, if you're going to repeatedly read a socket you can leave the resource and handler in the event system; in Lwt every read adds a new action (the Lwt equivalent of a handler). But I bet this almost never matters.

So which one?

Lwt definitely wins on clarity, simplicity, and concision for higher-level coding. Equeue wins if you need low-level control, or possibly if you need the absolute most performance.

Another factor, however, is that Equeue works with the rest of Ocamlnet, and in particular the ONC RPC implementation and the awesome Netplex server framework. For this reason I've adapted Lwt to run on top of Equeue, in the lwt-equeue library that comes with orpc. (I hope to do another orpc release soon with the latest version of lwt-equeue; in the meantime you can try the trunk version.) With lwt-equeue it's straightforward to mix Lwt and Equeue code, so you can use each when it's most appropriate.

(By the way, Jérôme Vouillon's ML Workshop paper on Lwt is really nice; it explains some tricky details of the implementation.)

Next time back to Camlp4.

Reading Camlp4, part 4: consuming OCaml ASTs

2009-01-27T22:09:00.000-08:00

It's easy to think of Camlp4 as just "defmacro on steroids"; that is, just a tool for syntax extension, but it is really a box of independently-useful tools. As we've seen, Camlp4 can be used purely for code generation; in this post I'll describe a tool that uses it purely for code consumption: a (minimal, broken) version of otags:

open Camlp4.PreCast
module M = Camlp4OCamlRevisedParser.Make(Syntax)
module N = Camlp4OCamlParser.Make(Syntax)

We're going to call the OCaml parser directly. These functor applications are used only for their effect (which is to fill in an empty grammar with OCaml cases); ordinarily they would be called as part of Camlp4's dynamic loading process. Recall that the original syntax parser is an extension of the revised parser, so we need both, in this order.

let files = ref []

let rec do_fn fn =
  let st = Stream.of_channel (open_in fn) in
  let str_item = Syntax.parse_implem (Loc.mk fn) st in
  let str_items = Ast.list_of_str_item str_item [] in
  let tags = List.fold_right do_str_item str_items [] in
  files := (fn, tags)::!files

We'll call do_fn for each filename on the command line. The Syntax.parse_implem function takes a Loc.t and a stream, and parses the stream into a str_item. (The initial Loc.t just provides the filename so later locations can refer to it, for error messages etc.) Now, recall that even though we got back a single str_item, it can contain several definitions (collected with StSem). We use Ast.list_of_str_item to get an ordinary list, then accumulate tags into files.

and do_str_item si tags =
  match si with
 (* | <:str_item< let $rec:_$ $bindings$ >> -> *)
    | Ast.StVal (_, _, bindings) ->
        let bindings = Ast.list_of_binding bindings [] in
        List.fold_right do_binding bindings tags
    | _ -> tags

We'll only consider value bindings. The commented-out str_item quotation doesn't work (run it through Camlp4 to see why--I'm not sure where the extra StSem/StNil come from), so we fall back to an explicit constructor. (The rec antiquotation matches a flag controlling whether an StVal is a let rec or just a let; here we don't care.) Now we have an Ast.binding, which again can contain several bindings (collected with BiAnd) so we call Ast.list_of_binding.

and do_binding bi tags =
  match bi with
    | <:binding@loc< $lid:lid$ = $_$ >> ->
      let line = Loc.start_line loc in
      let off = Loc.start_off loc in
      let pre = "let " ^ lid in
      (pre, lid, line, off)::tags
    | _ -> tags

We're going to generate an etags-format file, where each definition consists of a prefix of the line in the source, the tag itself, the line number, and the character offset. If you look in the parser you'll see that the left side of a binding can be any pattern (as you'd expect), but we only handle the case where it's a single identifier; the lid antiquotation extracts it as a string. The line number and character offset are easy to find from the location of the binding (see camlp4/Camlp4/Sig.ml for the Loc functions), which we get with @loc. The prefix is problematic: the location of the binding does not include the let or and part, and anyway what we really want is everything from the beginning of the line. Doable but not so instructive of Camlp4, so we just tack on a "let " prefix (so this doesn't work for bindings introduced with and, or when the source line has different whitespace).

let print_tags files =
  let ch = open_out "TAGS" in
  ListLabels.iter files ~f:(fun (fn, tags) ->
    Printf.fprintf ch "\012\n%s,%d\n" fn 0;
    ListLabels.iter tags ~f:(fun (pre, tag, line, off) ->
      Printf.fprintf ch "%s\127%s\001%d,%d\n" pre tag line off))

Generating the tags file is straightforward, following the description at the bottom of the otags README. (The 0 is supposed to be the length of the tag data, but my Emacs doesn't seem to care.) We put the pieces together with Arg:
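If your editor does care about the length, one way to fill it in is to render a file's tag lines into a buffer first, then write the header with the byte count. A sketch, assuming the length field is the size in bytes of the tag lines that follow the header (the section function is a hypothetical helper, not part of the program above):

```ocaml
(* Render one file's section of a TAGS file as a string. Tags are
   (prefix, tag, line, offset) tuples as in print_tags. *)
let section fn tags =
  let buf = Buffer.create 256 in
  List.iter
    (fun (pre, tag, line, off) ->
      Printf.bprintf buf "%s\127%s\001%d,%d\n" pre tag line off)
    tags;
  let data = Buffer.contents buf in
  (* Header: form feed, newline, then "filename,length". *)
  Printf.sprintf "\012\n%s,%d\n%s" fn (String.length data) data
```

print_tags could then call output_string ch (section fn tags) for each file.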

;;
Arg.parse [] do_fn "otags: fn1 [fn2 ...]";
print_tags !files

and finally, a Makefile:

otags: otags.ml
        ocamlc \
          -pp camlp4of \
          -o otags \
          -I +camlp4 -I +camlp4/Camlp4Parsers \
          dynlink.cma camlp4fulllib.cma otags.ml

We could improve this in many ways (error-handling, patterns, types, etc.); clearly we can't replicate otags in a few dozen lines. But Camlp4 takes care of a lot of the hard work. Next time, maybe, an actual syntax extension.

Reading Camlp4, part 3: quotations in depth

2009-01-22T14:18:00.000-08:00

(I set myself the goal of posting every week, but the latest Skydeck release has kept me busy, and it turned out I didn't understand the following as well as I thought.)

After seeing the examples of Camlp4 quotations in my last post, you may wonder:

what are all the quotations (str_item, ctyp, etc.)?
what are all the antiquotations (uid, `str, etc.)?
which antiquotations are allowed where?

To answer these questions, we're going to look at how quotations are implemented in Camlp4. We'll need to learn a little about Camlp4's extensible parsers, and look at the OCaml parser in Camlp4.

Parsing OCaml

A small complication is that there is more than one concrete syntax for OCaml in Camlp4: the original (i.e. normal OCaml syntax) and revised syntaxes. The original syntax parser is given as an extension of the revised syntax one. So we'll begin in camlp4/Camlp4Parsers/Camlp4OCamlRevisedParser.ml (line 588 in the 3.10.2 source):

    expr:
      [ "top" RIGHTA
        [ (* ... *)
        | "if"; e1 = SELF; "then"; e2 = SELF; "else"; e3 = SELF ->
            <:expr< if $e1$ then $e2$ else $e3$ >>

You can read the parser more or less as a BNF grammar. This code defines a nonterminal expr by giving a bunch of cases. The cases are grouped together into levels, which can be labeled and given an associativity (that's what "top" and RIGHTA are). Levels are used to indicate the precedence of operators, and also to provide hooks into the parser for extending it; for our purpose here you can skip over them.

You can read a case like a pattern match: match the stuff to the left of the arrow, return the stuff to the right. (What's being matched is a stream of tokens from the lexer.) A parser pattern can contain literal strings like "if", backquoted data constructors like `INT (which can carry additional data), nonterminals, and some special keywords like SELF. You can bind variables using ordinary pattern-matching syntax within token literals, and use x = y syntax to bind the result of a call to a nonterminal.

The right side is a piece of AST representing what was parsed, and in most cases it is given as a quotation. This is pretty confusing, because often the left and right sides of a case look very similar, and you can't tell what AST node is produced. However, it gives us lots of examples of tricky quotations, and since we have already seen how to expand quotations we can deal with it. (If you're curious how Camlp4 is written using itself see camlp4/boot.)

Focusing on the if case: the keywords if, then, and else are parsed with an expression after each (at least we know that's the syntax of normal OCaml, and that gives a clue to what SELF means: parse the current nonterminal); the expressions are bound to variables; then the pieces are put together into an ExIfe AST node.

(Some other special keywords you'll see are OPT, which makes the next item optional, and LIST0/LIST1, which parse a list of items separated by the token after SEP. LIST1 means there must be at least one item.)

OCaml allows you to leave off the else part; where is the code for that? Turns out this is not allowed in revised syntax, and the original syntax overrides this part of the parser. Take a look at camlp4/Camlp4Parsers/Camlp4OCamlParser.ml (line 292):

    expr: LEVEL "top"
      [ [ (* ... *)
        | "if"; e1 = SELF; "then"; e2 = expr LEVEL "top";
          "else"; e3 = expr LEVEL "top" ->
            <:expr< if $e1$ then $e2$ else $e3$ >>
        | "if"; e1 = SELF; "then"; e2 = expr LEVEL "top" ->
            <:expr< if $e1$ then $e2$ else () >>

(Notice how the expr definition is qualified with the level in the revised grammar where it should slot in.)
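To see what the two if cases amount to, here is a hand-written recursive-descent version over a toy token type. It is only an illustration: the token and expr types are made up, and it does not use Camlp4's grammar machinery or levels at all. The recursive calls to parse_expr play the role of SELF, and a missing else is filled in with unit, as in the second case above.

```ocaml
type token = IF | THEN | ELSE | INT of int

type expr =
  | Int of int
  | Unit
  | If of expr * expr * expr

(* Parse one expression from the front of the token list, returning the
   expression and the remaining tokens. *)
let rec parse_expr = function
  | IF :: rest ->
      let e1, rest = parse_expr rest in
      (match rest with
       | THEN :: rest ->
           let e2, rest = parse_expr rest in
           (match rest with
            | ELSE :: rest ->
                let e3, rest = parse_expr rest in
                If (e1, e2, e3), rest
            | _ ->
                (* No else branch: behave like "else ()". *)
                If (e1, e2, Unit), rest)
       | _ -> failwith "expected then")
  | INT n :: rest -> Int n, rest
  | _ -> failwith "expected expression"
```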

Quotations and antiquotations

Hopefully that is enough about parsing to muddle through; let's move on to quotations. Here's another piece of the revised parser (line 670)--these are still cases of expr:

  [ `QUOTATION x -> Quotation.expand _loc x Quotation.DynAst.expr_tag

The `QUOTATION token contains a record including the body of the quotation and the tag. The record is passed off to the Quotation module to be expanded. The actual expansion happens in camlp4/Camlp4Parsers/Camlp4QuotationCommon.ml. Looking to the bottom of that file, there are several lines like:

  add_quotation "sig_item" sig_item_quot ME.meta_sig_item MP.meta_sig_item;

This installs a quotation expander for the sig_item tag. The expander parses the quotation starting at the sig_item_quot nonterminal in the parser, then runs the result through the antiquotation expander (see below). (The last two arguments to add_quotation have to do with the context where a quotation appears: inside a pattern you get PaFoo nodes while inside an expression you get ExBar nodes.) So we can answer one of the questions posed at the beginning: what are all the quotation tags? We can see here that there is a quotation for each type in camlp4/Camlp4/Camlp4Ast.partial.ml.

Now let's look at antiquotations, which are more complicated (line 671):

        | `ANTIQUOT ("exp"|""|"anti" as n) s ->
            <:expr< $anti:mk_anti ~c:"expr" n s$ >>

The `ANTIQUOT token contains the tag and the body again (and the parser can choose a case based on the tag). The anti antiquotation creates a special AST node to hold the body of the antiquotation; each type in the AST has a constructor (ExAnt, TyAnt, etc.) for this purpose. The mk_anti function adds another tag, which is not always the same as the one we parsed; the ~c argument adds a suffix giving the context where the antiquotation appeared.

There are two places where antiquotations are interpreted. First, in camlp4/Camlp4Parsers/Camlp4QuotationCommon.ml (line 89):

            [ "`int" -> <:expr< string_of_int $e$ >>

This is one of a bunch of cases in a map over the syntax tree. It handles antiquotations like <:expr< $`int:5$ >>, which turns into an ExInt. You can also see cases here for the anti antiquotations, and some things to do with list antiquotations we haven't seen yet (more on this below).

Things that don't match these cases are handled when the AST is pretty-printed. Let's look at camlp4/Camlp4/Printers/OCaml.ml (line 510):

    | <:expr< $int:s$ >> -> o#numeric f s ""

This case handles antiquotations like <:expr< $int:"5"$ >>. Again, this produces an ExInt, but you give it a string instead of an int.

What we have learned

Teaching a person to fish is fine, unless that person starves while trying to finish their PhD in theoretical pescatology. But I hope that you can see how we might go about answering the remaining questions--what are all the antiquotations, and where are they allowed--by examining all the `ANTIQUOT cases in the parser and puzzling out where they get expanded.

Let's look at a particular example, by way of addressing the comment Nicolas Pouillard (aka Ertai) made on the last post. He points out that the final McOr in of_string can go outside the antiquotation. How could we learn this from the Camlp4 code? Let's find where the antiquotation is expanded, starting at the point where the function keyword is parsed (Camlp4OCamlParser.ml line 299):

  | "function"; a = match_case ->
      <:expr< fun [ $a$ ] >>

(the right side is revised syntax) which uses match_case (line 350):

    match_case:
      [ [ OPT "|"; l = LIST1 match_case0 SEP "|" -> Ast.mcOr_of_list l ] ]

You might think that match_case0 parses a single case, but let's check (Camlp4OCamlRevisedParser.ml line 778):

    match_case0:
      [ [ `ANTIQUOT ("match_case"|"list" as n) s ->
            <:match_case< $anti:mk_anti ~c:"match_case" n s$ >>
        | `ANTIQUOT (""|"anti" as n) s ->
            <:match_case< $anti:mk_anti ~c:"match_case" n s$ >>

We're interested in the second case for the moment: it matches the untagged antiquotation used in of_string. So the whole list of cases comes back from match_case0 as a single node (an McAnt with match_case as its tag), and more cases can be parsed following it.

(Now we can see a justification for a puzzling design decision in the AST: instead of collecting match cases in a list, it collects them with McOr nodes. Many arrangements of McOr nodes correspond to the same list of cases. As the example above shows, this is useful: an antiquotation can return zero, one, or several match cases, and we don't have to worry about splicing them into the list. On the other hand, it makes consuming the AST a little more complicated.)
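Here is a small sketch of that trade-off in plain OCaml, with no Camlp4 dependency. The match_case type below is a simplified stand-in (the real one carries locations, patterns, guards, and expressions), and mc_or_of_list and cases_of are hypothetical names: the first is written in the spirit of Ast.mcOr_of_list, the second is the kind of flattening a consumer of the AST has to do.

```ocaml
(* Simplified stand-in for Camlp4's match_case type (an assumption, not the
   real definition): patterns and bodies are just strings here. *)
type match_case =
  | McNil                            (* no cases at all *)
  | McOr of match_case * match_case  (* two groups of cases joined by "|" *)
  | McArr of string * string         (* a single case: pattern -> body *)

(* Join a list of cases with McOr nodes, nesting to the right. *)
let rec mc_or_of_list = function
  | [] -> McNil
  | [c] -> c
  | c :: cs -> McOr (c, mc_or_of_list cs)

(* Flatten any arrangement of McOr/McNil nodes back into a list of cases. *)
let rec cases_of = function
  | McNil -> []
  | McOr (a, b) -> cases_of a @ cases_of b
  | McArr _ as c -> [c]

let () =
  let a = McArr ("\"Foo\"", "Foo")
  and b = McArr ("\"Bar\"", "Bar")
  and fallback = McArr ("_", "invalid_arg \"bad string\"") in
  (* Splicing a generated group in front of a hand-written case is one McOr. *)
  let spliced = McOr (mc_or_of_list [a; b], fallback) in
  let right_nested = McOr (a, McOr (b, fallback)) in
  let empty_splice = McOr (mc_or_of_list [], fallback) in
  assert (cases_of spliced = cases_of right_nested);
  assert (cases_of spliced = [a; b; fallback]);
  assert (cases_of empty_splice = [fallback])
```

The producer's side is a single McOr no matter how many cases the antiquotation yields; the cost shows up in cases_of, which every consumer needs in some form.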

We can go one step further: if we use the list antiquotation, the first case in match_case0 returns an antiquotation with tag listmatch_case, and we get the following expansion (Camlp4QuotationCommon.ml line 117):

            | "listmatch_case" -> <:expr< Ast.mcOr_of_list $e$ >>

So our final of_string becomes:

let of_string = function
    $list:
      List.map
        (fun c -> <:match_case< $`str:c$ -> $uid:c$ >>)
        cons$
  | _ -> invalid_arg "bad string"

Can we do something similar with the generation of the variant type? Not in the original syntax, as it turns out. In the revised syntax, the arms of a variant are given inside square brackets, so we can say:

type t = [ $list:List.map (fun c -> <:ctyp< $uid:c$ >>) cons$ ]

But in the original syntax, without at least one constructor to make clear that we're defining a variant, there's no context to interpret a list, and this is reflected in the parser, which doesn't allow a list antiquotation there. This kind of problem is apparently why the revised syntax was introduced.

So far I've talked only about generating OCaml code; next time I'll cover how to use Camlp4 to consume OCaml, and build a simple code analysis tool.