Message-ID: <19990902010939.12962@pauillac.inria.fr> Date: Thu, 2 Sep 1999 01:09:39 +0200 From: Jerome Vouillon To: chet@watson.ibm.com, Jerome Vouillon Cc: caml-list@inria.fr, blo.b@infonie.fr Subject: Re: Efficency in OCaml On Wed, Sep 01, 1999 at 02:40:21PM -0400, chet@watson.ibm.com wrote: > Is there a description of the Ocaml object and > "virtual-function-table" format? I have not written any description but I can explain it rapidly here. Objective Caml uses a scheme which is also used in the Gnu Objective C compiler. Each object holds a table containing its methods (the table is shared among all objets of a same class). A unique integer is assigned at run-time to each method name of a program. This integer is used as an index in this table to find the method. As the table is rather sparse, it is implemented as a two-level array (an array of arrays of functions). So, a method call "object#m e1 ... en" is compile in something that looks like "object.(0).(idx mod N).(idx / N) objet e1 ... en" where idx is the integer associated to the method name "m". This scheme is very flexible and rather efficient. It would be hard to improve it actually. Indeed, one will always need one indirection to know what class the object belongs to and another indirection to retrieve the method. So one could only gain one indirection (the arithmetics is more or less free, as it can be done in parallel). Actually, there is yet another indirection, not shown here, to retrieve a pointer to the code from the method closure. This indirection could probably be optimize a bit (by doing it in parallel with the retrieval of the closure). A great part of the cost of a method call also comes from the fact that calling an unknown function is more expansive than calling a know function : there is a check to see if the function is called with less arguments than it needs, more arguments or exactly the right number of arguments. The only way to optimize this would be to make a global analyze of the program during the compilation. This analyze would determine whether at a given method invocation location "x#m" all methods that could possibly be called are called with the right number of arguments, and would also check whether the method invocation could be replaced by a function call (that is, whether only one method can be called from this location). In order to provide more opportunities for optimization, one would also probably need to do some partial evaluation. Implementing all this would require a lot of work... > Also, well, I think there's been some recent work on analyzing the > path-length of O-O code, and the conclusion has been that in fact, > methods do just "call one other" a lot. > > That is, while C code is characterized by lots of tests and longer > path-lengths per function-body, C++ (and Java, *especially* -- geez, > it seems like Java code is all method-calls!) code tends to be short > code-bodies, with branches implemented effectively by calling virtual > functions. OCaml is first a functional langage so I think (I hope) that people do not use objects as heavily as in Java, and for instance use pattern matching rather than only method calls for implementing branches. -- Jérôme Date: Sat, 4 Sep 1999 17:26:33 +0300 (EET DST) From: Nicolas Ollinger To: Jerome Vouillon cc: caml-list@inria.fr Subject: Re: Efficency in OCaml Message-ID: On Thu, 2 Sep 1999, Jerome Vouillon wrote: > On Wed, Sep 01, 1999 at 02:40:21PM -0400, chet@watson.ibm.com wrote: > > Is there a description of the Ocaml object and > > "virtual-function-table" format? (snip description of buckets use) I played a little with objects representation in OCaml 2.xx. As far as I understand, at least in bytecode, class instances are represented as boxed values tagged object_tag with n+2 fields : then first field is the method array array described by Jerome in last mail, the second field seems to be a unique id associate to the object, other fields are used for instance variables in the order of declaration, inherited variables first. As the method array array is unique for each class, it can be used to identify the class (notice that classes are represented as global variables). I'm intrigued by this second field, what is the use of this id ? Where is the necessity to identify uniquely every object ? Concerning marshaling of objects, a simple solution is to use a function like: let crunch o = let r = Obj.dup (Obj.repr o) in let idclass = compute_id (Obj.field r 0) in Obj.set_field r 1 idclass; Obj.set_tag r Obj.marshaled_object_tag r;; with compute_id a function that deduce a unique class id of the object. Then unmarshaling is just doing the inverse operation. Of course, if you want to share objects between different programs then you must add some informations about the module in which the class is declared, and so one. Any comment ? N. -- Message-ID: <19990915143924.26118@pauillac.inria.fr> Date: Wed, 15 Sep 1999 14:39:24 +0200 From: Jerome Vouillon To: Hendrik Tews , caml-list@inria.fr Subject: Re: Efficency in OCaml On Fri, Sep 10, 1999 at 05:19:39PM +0200, Hendrik Tews wrote: > Each object holds a table containing its methods (the table is shared > among all objets of a same class). A unique integer is assigned at > run-time to each method name of a program. This integer is used as an > index in this table to find the method. As the table is rather sparse, > it is implemented as a two-level array (an array of arrays of > functions). So, a method call > "object#m e1 ... en" > is compile in something that looks like > "object.(0).(idx mod N).(idx / N) objet e1 ... en" > where idx is the integer associated to the method name "m". > > Sorry, I don't understand this. How can the compiler know idx, if > it is not known until run-time? idx is a variable which is bound at run-time at the beginning of the toplevel module containing the method invocation. -- Jérôme