There's a not that well known trick to implement heterogeneous, type-safe containers in OCaml (and friends: MLton uses them to decorate ASTs in different passes), known as property lists. Every once in a while I see people asking for how to do such a thing, so I'll show here a reasonably efficient implementation of mine here in order to refer to it later.

Property lists work very much like hash tables, allowing you to associate a value to a key. Unlike regular hash tables, though, they allow you to associate values of arbitrary types, possibly of different types for each key.

The key idea is to turn the key into a value that holds the type of the corresponding value. The signature of a property list is

module type PROPERTY_LIST =
sig
  type t

  val create : unit -> t
  val new_property : unit -> (t -> 'a -> unit) * (t -> 'a option)
end

A property is but a pair of functions which when given a property list either set or retrieve the property value. (The get function returns an option type because the value might not have been set.)

A module P with that signature would be used this way:

let get t (set, get) = get t
let set t (set, get) x = set t x

let name = P.new_property
let age = P.new_property

let table = P.create ()

(* ... *)

set table name "J.R. Hacker"

(* ... *)

print_endline (match get table name with None -> "<anon>" | Some x -> x)

Implementation

There are two ways to implement property lists: by using polymorphic exceptions, and via a reference (this is not thread-safe, but can be easily made so). The former are not available in OCaml, so we have to do with the latter. Without further ado, here's my implementation, which differs from the others you'll find in the ocaml-list archives in that set/get are constant-time operations, regardless of the number of elements in the property list:

module MappedList : PROPERTY_LIST =
struct
  type t = (int, unit -> unit) Hashtbl.t

  let create () = Hashtbl.create 13

  let new_id : unit -> int =
    let id = ref 0 in
      fun () -> incr id; !id

  let new_property () =
    let id = new_id () in
    let v = ref None in

    let set t x =
      Hashtbl.replace t id (fun () -> v := Some x) in

    let get t =
      try
        (Hashtbl.find t id) ();
        match !v with
            Some x as s -> v := None; s
          | None -> None
      with Not_found -> None
    in
      (set, get)
end

If you don't see why the above works, this "classic" (and inefficient, as said above) implementation might be easier to understand:

module Basic : PROPERTY_LIST =
struct
  type t = { mutable funcs : (unit -> unit) list; }
  let create () = { funcs = [] }

  let new_property () =
    let v = ref None in

    let set t x =
      t.funcs <- (fun () -> v := Some x) :: t.funcs in

    let get t =
      let rec aux = function
          [] -> None
        | f::tl ->
            f ();
            match !v with
                None -> aux tl
              | Some x -> v := None; Some x
      in aux t.funcs
    in
      (set, get)
end

Comments

  1. In the Hashtbl version, is the "v := None;" in "get" really useful ?

    bluestorm, 13 February 2009 at 14:57#
  2. You're right, it doesn't really serve any purpose, since (Hashtbl.find t id) () will overwrite the ref unconditionally with Some _ before its value is matched against.

    mfp, 13 February 2009 at 15:21#
  3. There is yet another trick, using weak maps, where the roles of key and values are reversed. The property is the map and the 'property list' is the key which is used to store/retrieve the value.

    type property_list = int;
    let create () = .. (* returns a new unique int *)
    let Prop = Weak.Make ( .. )
    
    let get pl property = Prop.find property pl
    let set pl property value = Prop.add property pl value
    

    The trick is that the type is associated not with the container but with the key/property.

    Vasile, 14 February 2009 at 09:42#
  4. Now I see that you seem to use Markdown for the comments. A warning would have been appropriate.

    Vasile, 14 February 2009 at 09:44#
  5. Since you're using Weak, isn't it possible for the values to disappear when they're collected by the GC (if there are no other references to them, that is)?

    Sorry for the formatting troubles, I'm in fact using a simplified Markdown I have yet to document. Code blocks are done with

    {{
      type property_list = int;
      ...
    }}
    

    I've fixed the formatting of your previous comment.

    mfp, 14 February 2009 at 11:25#
  6. hmm Weak.set's internal implementation works similarly to caml_modify's remembered set handling: it essentially maintains a list of roots that is traversed on each minor GC run. Therefore, it seems that large or numerous property lists could make the GC (and hence the whole program) much slower.

    mfp, 14 February 2009 at 11:36#
  7. Now, that I'm thinking about it, the Weak part is optional, but may be handy.

    I've done a similar thing in Java, which I use at work, and the reason of picking the Weak maps was that I did want the properties to shrink when the "containers" are gone.

    Well, I guess this has at least some entertainment value.

    Vasile, 14 February 2009 at 12:30#
  8. There is something puzzling me. It seems to me you must hold a pointer to every entry. It also seems to me you can't traverse or iterate the property list. So i can't understand how you benefit this 'container'.

    SpiceGuid, 14 February 2009 at 20:38#
  9. Vasile, do you have any pointer on the performance characteristics of Java's WeakHashMap? Does it put the GC under pressure?

    SpiceGuid: you hold a pointer to every property, which can be used in different property lists. The canonical example is MLton's use: you have a tree whose nodes must be decorated in multiple passes. Instead of making the nodes polymorphic over the N types used in the different phases, hardcoding them in a record or using a tuple (which means that all the passes break whenever you add a field), you can just attach a property list to each node. That is, the property list lets you code the different passes independently without them having to know about each other's annotations (and their types).

    You can find here an explanation of the problem property lists solve in the context of an XML tree.

    mfp, 15 February 2009 at 12:00#
  10. Ok, i now understand, what you want to implement is not a container but a monomorphic universal type (to be used as the contents of a structured data).

    Your implementation should not be concerned what the actual container is.

    Here is how to do that, code by Alain Frisch from Yaron Minsky blog :

    module Univ = struct
      type t = (unit -> unit) * (unit -> unit)
     
      let embed () =
        let r = ref None in
        let put x = (fun () -> r := Some x), (fun () -> r := None) in
        let get (f, g) = f (); let res = !r in g (); res in
        put, get
    end
    

    Now you can make a Univ.t list or any other structured container.

    SpiceGuid, 15 February 2009 at 14:42#