In the ocamljs project I wanted to implement the OCaml object system in a way that is interoperable with Javascript objects. Mainly I wanted to be able to call Javascript methods with the OCaml method call syntax, but it is also useful to write objects in OCaml which are callable in the usual way from Javascript.
I spent some time a few months ago figuring out how OCaml objects are put together in order to implement this (it is in the unreleased ocamljs
trunk—new release coming soon I hope). I got a bug report against it the other day, and it turns out I don’t remember much of what I figured out. So I am going to figure it out again, and write it down, here in this very blog post!
Objects are implemented mostly in the CamlinternalOO
library module, with a few compiler primitives for method invocation. The compiler generates CamlinternalOO
calls to construct classes and objects. Our main tool for figuring out what is going on is to write a test program, dump out its lambda code with -dlambda
, and read the CamlinternalOO
source to see what it means. I will explain functions from camlinternalOO.ml but not embed them in the post, so you may want it available for reference.
I have hand-translated (apologies for any errors) the lambda code back to pseudo-OCaml to make it more readable. The compiler-generated code works directly with the OCaml heap representation, and generally doesn’t fit into the OCaml type system. Where the heap representation can be translated back to an OCaml value I do that; otherwise I write blocks with array notation, and atoms with integers. Finally I have used OO
as an abbreviation for CamlinternalOO
.
Here is a first test program, defining an immediate object:
let p =
object
val mutable x = 0
method get_x = x
method move d = x <- x + d
end
And this is what it compiles to:
let shared = [|"move";"get_x"|]
let p =
let clas = OO.create_table shared in
let obj_init =
let ids = OO.new_methods_variables clas shared [|"x"|] in
let move = ids.(0) in
let get_x = ids.(1) in
let x = ids.(2) in
OO.set_methods clas [|
get_x; OO.GetVar; x;
move; (fun self d -> self.(x) <- self.(x) + d);
|];
(fun env ->
let self = OO.create_object_opt 0 clas in
self.(x) <- 0;
self) in
OO.init_class clas;
obj_init 0
An object has a class, created with create_table
and filled in with new_methods_variables
, set_methods
, and init_class
; the object itself is created by calling create_object_opt
with the class as argument, then initializing the instance variable.
A table (the value representing a class) has the following fields (and some others we won’t cover):
type table = {
mutable size: int;
mutable methods: closure array;
mutable methods_by_name: meths;
mutable methods_by_label: labs;
mutable vars: vars;
}
Each instance variable has a slot (its index in the block which represents the object); vars
maps variable names to slots. The size
field records the total number of slots (including internal slots, see below).
Each public method has a label, computed by hashing the method name. The methods
field (used for method dispatch) holds each method of the class, with the label of the method at the following index (the type is misleading). Each method then has a slot (the index in methods
of the method function); methods_by_name
maps method names to slots, and the confusingly-named methods_by_label
marks slots to whether it is occupied by a public method.
The create_table
call assigns slots to methods, fills in the method labels in methods
, and sets up methods_by_name
and methods_by_label
. The new_methods_variables
call returns the slot of each public method and each instance variable in a block (which is unpacked into local variables).
The set_methods
call sets up the method functions in methods
. Its argument is a block containing alternating method slots and method descriptions (the description can take more than one item in the block). For some methods (e.g. move
) the description is just an OCaml function (here you can see that self
is passed as the first argument). For some the description is given by a value of the variant OO.impl
along with some other arguments. For get_x
it is GetVar
followed by the slot for x
. The actual function that gets the instance variable is generated by set_methods
. As far as I understand it, the point of this is to reduce object code size by factoring out the common code from frequently occurring methods.
Finally create_object_opt
allocates a block of clas.size
, then fills in the first slot with the methods
array of the class and the second with the object’s unique ID. (We will see below what the _opt
part is about.)
A public method call:
p#get_x
compiles to:
send p 291546447
where send
is a built-in lambda term. The number is the method label. To understand how the method is applied we have to go a little deeper. In bytegen.ml
there is a case for Lsend
which generates the Kgetpubmet
bytecode instruction to find the method function; the function is then applied like any other function. Next we look to the GETPUBMET
case in interp.c
to see how public methods are looked up in the methods
block (stored in the first field of the object).
A couple details about methods
we didn’t cover before: The first field contains the number of public methods. The second contains a bitmask used for method caching—briefly, it is enough bits to store offsets into methods
. The rest of the block is method functions and labels as above, padded out so that the range of an offset masked by the bitmask does not overflow the block.
Returning to GETPUBMET
, we first check to see if the method cache for this call site is valid. The method cache is an extra word at each call site which stores an offset into methods
(but may be garbage—masking it takes care of this). If the method label at this offset matches the label we’re looking for, the associated method function is returned. Otherwise, we binary search methods
to find the method label (methods are sorted in label order in transclass.ml
), then store the offset in the cache and return the associated method function.
A class definition:
class point =
object
val mutable x = 0
method get_x = x
method move d = x <- x + d
end
let p = new point
compiles to:
let shared = [|"move";"get_x"|]
let point =
let point_init clas =
let ids = OO.new_methods_variables clas shared [|"x"|] in
let move = ids.(0) in
let get_x = ids.(1) in
let x = ids.(2) in
OO.set_methods clas [|
get_x; OO.GetVar; x;
move; (fun self d -> self.(x) <- self.(x) + d);
|];
(fun env self ->
let self = OO.create_object_opt self clas in
self.(x) <- 0;
self) in
OO.make_class shared point_init
let p = (point.(0) 0)
This is similar to the immediate object code, except that the class constructor takes the class table as an argument rather than constructing it itself, and the object constructor takes self
as an argument. We will see that class and object constructors are each chained up the inheritance hierarchy, and the tables / objects are passed up the chain. The make_class
call calls create_table
and init_class
in the same way we saw in the immediate object case, and returns a tuple, of which the first component is the object constructor. So the new
invocation calls the constructor.
A subclass definition:
class point = ... (* as before *)
class point_sub =
object
inherit point
val mutable y = 0
method get_y = y
end
compiles to:
let point = ... (* as before *)
let point_sub =
let point_sub_init clas =
let ids =
OO.new_methods_variables clas [|"get_y"|] [|"y"|] in
let get_y = ids.(0) in
let y = ids.(1) in
let inh =
OO.inherits
clas [|"x"|] [||] [|"get_x";"move"|] point true in
let obj_init = inh.(0) in
OO.set_methods clas [| get_y; GetVar; y |];
(fun env self ->
let self' = OO.create_object_opt self clas in
obj_init self';
self'.(y) <- 0;
OO.run_initializers_opt self self' clas) in
OO.make_class [|"move";"get_x";"get_y"|] point_sub_init
The subclass is connected to its superclass through inherits
, which calls the superclass constructor on the subclass (filling in methods
with the superclass methods) and returns the superclass object constructor (and some other stuff). In the subclass object constructor, the superclass object constructor is called. (This is why the object is created optionally—the class on which new
is invoked actually allocates the object; further superclass constructors just initialize instance variables.) In addition, we run any initializers, since some superclass may have them.
A class with a self-call:
class point =
object (s)
val mutable x = 0
method get_x = x
method get_x5 = s#get_x + 5
end
becomes:
let point =
let point_init clas =
... (* as before *)
OO.set_methods clas [|
get_x; OO.GetVar; x;
get_x5; (fun self -> sendself self get_x + 5);
|]
...
Here sendself
is a form of Lsend
for self-calls, where we know the method slot at compile time. Instead of generating the Kgetpubmet
bytecode, it generates Kgetmethod
, which just does an array reference to find the method.
A class with a super-call:
class point = ... (* as before *)
class point_sub =
object
inherit point as super
method move1 n = super#move n
end
becomes:
let point = ... (* as before *)
let point_sub =
let point_sub_init clas =
...
let inh =
OO.inherits
clas [|"x"|] [||] [|"get_x";"move"|] point true in
let move = inh.(3) in
...
OO.set_methods clas [|
move1; (fun self n -> move self n)
|];
...
In this case, we are able to look up the actual function for the super-call in the class constructor (returned from inherits
), so the invocation is just a function application rather than a slot dereference.
I don’t totally understand why we don’t know the function for self calls. I think it is because the superclass constructor runs before the subclass constructor, so the slot is assigned (this happens before the class constructors are called) but the function hasn’t been filled in yet. Still it seems like the knot could somehow be tied at class construction time to avoid a runtime slot dereference.
ocamljs implementationThe main design goal is that we be able to call methods on ordinary Javascript objects with the OCaml method call syntax, simply by declaring a class type giving the signature of the object. So if you want to work with the browser DOM you can say:
class type document =
object
method getElementById : string -> #element
... (* and so on *)
end
for some appropriate element
type (see src/dom/dom.mli
in ocamljs
for a full definition), and say:
document#getElementById "content"
to make the call.
These are always public method calls, so they use the Lsend
lambda form. We don’t want to do method label dispatch, since Javascript already has dispatch by name, so first off we need to carry the name rather than the label in Lsend
.
We have seen how self
is passed as the first argument when methods are invoked. We can’t do that for an arbitrary Javascript function, but the function might use this
, so we need to be sure that this
points to the object.
There is no way to know at compile time whether a particular method invocation is on a regular Javascript object or an OCaml object. Maybe we could mark OCaml objects somehow and do a check at runtime, but I decided to stick with a single calling convention. So whatever OCaml objects compile to, they have to support the convention for regular Javascript objects—foo#bar
compiles to foo.bar
, with this
set to foo
.
As we have seen, self-calls are compiled to a slot lookup rather than a name lookup, so we also need to support indexing into methods
.
So here’s the design: an OCaml object is represented by a Javascript object, with numbered slots containing the instance variables. There is a constructor for each class, with prototype
set up so each method is accessible by name, and the whole methods
block is accessible in a special field, so we can call by slot. (Since we don’t need method labels, methods
just holds functions.)
The calling convention passes self
in this
, so we bind a local self
variable to this
on entry to each method. It doesn’t work to say this
everywhere instead of self
, because this
in Javascript is a bit fragile. In particular, if you define and apply a local function (ocamljs
does this frequently), this
is null rather than the lexically-visible this
.
For sendself
we look up the function by slot in the special methods field. Finally, for super-calls, we know the function at class construction time. In this case the function is applied directly, but we need to take care to treat it as a method application rather than an ordinary function call, since the calling convention is different.
The OCaml compiler turns super-calls into function applications very early in compilation (during typechecking in typecore.ml
). There is no difference in calling convention for regular OCaml, so it doesn’t matter that later phases don’t know that these function applications are super-calls. But in our case we have to carry this information forward to the point where we generate Javascript (in jsgen.ml
). It is a little tricky without changing the “typedtree” intermediate language.
I had put in a hack to mark these applications with a special extra argument, and it worked fine for my test program, where the method had no arguments. I didn’t think through or test the case where the method has arguments though. I was able to fix it (I think!) with a different hack: super calls are compiled to self calls (that is, to Texp_send
with Tmeth_val
) but the identifier in Tmeth_val
is marked with an unused bit to indicate that it binds a function rather than a slot, so we don’t need to dereference it.
Appendix: other features
It is interesting to see how the various features of the object system are implemented, but maybe not that interesting, so here they are as an appendix.
Constructor parametersA class definition with a constructor parameter:
class point x_init =
object
val mutable x = x_init
... (* as before *)
end
compiles to:
let point =
let point_init clas =
... (* as before *)
(fun env self x_init ->
OO.create_object_opt self clas;
self.(x) <- x_init;
self) in
... (* as before *)
So the constructor parameter in the surface syntax just turns into a constructor parameter internally. (There is a slightly funny interaction between constructor parameters and let
-bound expressions after class
but before object
: if there is no constructor parameter the let
is evaluated at class construction, but if there is a parameter it is evaluated at object construction, whether or not it depends on the parameter.)
A class definition with a virtual method:
class virtual abs_point =
object
method virtual move : int -> unit
end
compiles to:
let abs_point = [|
0;
(fun clas ->
let move = OO.get_method_label clas "move" in
(fun env self ->
OO.create_object_opt self clas);
0; 0
|]
Since a virtual class can’t be instantiated, there’s no need to create the class table with make_class
; we just return the tuple that represents the class, containing the class and object constructor. (I don’t understand the call to get_method_label
, since its value is unused; possibly it is called for its side effect, which is to register the method in the class table if it does not already exist.)
A subclass implementing the virtual method inherits from the virtual class in the usual way.
A class declaration with a virtual instance variable:
class virtual abs_point2 =
object
val mutable virtual x : int
method move d = x <- x + d
end
becomes:
let abs_point = [|
0;
(fun clas
let ids =
OO.new_methods_variables [|"move"|] [|"x"|] in
... (* as before *));
0; 0
|]
Again, a subclass providing the instance variable inherits from the virtual class in the usual way. By the time new_methods_variables
is called in the superclass, the subclass has registered a slot for the variable.
A class definition with a private method:
class point =
object
val mutable x = 0
method get_x = x
method private move d = x <- x + d
end
compiles to:
let point =
let point_init clas =
let ids =
OO.new_methods_variables clas [|"move";"get_x"|] [|"x"|] in
... (* as before *)
OO.make_class [|"get_x"|] point_init
Everything is the same except that the private method is not listed in the public methods of the class. Since a private method is callable only from code in which the class of the object is statically known, there is no need for dispatch or a method label. The private method functions are stored in methods
after the public methods and method labels.
If we expose a private method in a subclass:
class point = ... (* as before *)
class point_sub =
object
inherit point
method virtual move : _
end
we get:
let point = ... (* as before *)
let point_sub =
let point_sub_init clas = ... (* as before *)
OO.make_class [|"move";"get_x"|] point_sub_init
Putting "move"
in the call to make_class
registers it as a public method, so later, when set_method
is called for move
in the superclass constructor, it puts the method and its label in methods
for dispatch.
Very interesting post, as always! I believe that calls to self are indirect because the method can be overridden in a subclass. In other words, self calls are dispatched based on the dynamic type of the object, while super calls are dispatched based on the static superclass of the class.
ReplyDeleteThat made sense to me too, but: since the methods block has methods sorted by label, two subclasses may have the same method at different slots (depending on what other public methods they add). So the slot is not known at compile time.
ReplyDeleteIn fact, the class table is constructed anew for each subclass. Slots are assigned first (in create_table), then methods are filled in by the class constructors (looking up slots with new_methods_variables). The method bodies of the superclass are closed over whatever slots are correct for the particular subclass.
Here's another explanation: backpatching the actual function once it is known must use an indirection at runtime (well, short of patching the code itself), so the index into the methods block is not more expensive, and is simpler.