There's a not that well known trick to implement heterogeneous, type-safe containers in OCaml (and friends: MLton uses them to decorate ASTs in different passes), known as property lists. Every once in a while I see people asking for how to do such a thing, so I'll show here a reasonably efficient implementation of mine here in order to refer to it later.
Property lists work very much like hash tables, allowing you to associate a value to a key. Unlike regular hash tables, though, they allow you to associate values of arbitrary types, possibly of different types for each key.
The key idea is to turn the key into a value that holds the type of the corresponding value. The signature of a property list is
module type PROPERTY_LIST = sig type t val create : unit -> t val new_property : unit -> (t -> 'a -> unit) * (t -> 'a option) end
A property is but a pair of functions which when given a property list either set or retrieve the property value. (The get function returns an option type because the value might not have been set.)
A module P with that signature would be used this way:
let get t (set, get) = get t let set t (set, get) x = set t x let name = P.new_property let age = P.new_property let table = P.create () (* ... *) set table name "J.R. Hacker" (* ... *) print_endline (match get table name with None -> "<anon>" | Some x -> x)
Implementation
There are two ways to implement property lists: by using polymorphic exceptions, and via a reference (this is not thread-safe, but can be easily made so). The former are not available in OCaml, so we have to do with the latter. Without further ado, here's my implementation, which differs from the others you'll find in the ocaml-list archives in that set/get are constant-time operations, regardless of the number of elements in the property list:
module MappedList : PROPERTY_LIST =
struct
type t = (int, unit -> unit) Hashtbl.t
let create () = Hashtbl.create 13
let new_id : unit -> int =
let id = ref 0 in
fun () -> incr id; !id
let new_property () =
let id = new_id () in
let v = ref None in
let set t x =
Hashtbl.replace t id (fun () -> v := Some x) in
let get t =
try
(Hashtbl.find t id) ();
match !v with
Some x as s -> v := None; s
| None -> None
with Not_found -> None
in
(set, get)
end
If you don't see why the above works, this "classic" (and inefficient, as said above) implementation might be easier to understand:
module Basic : PROPERTY_LIST =
struct
type t = { mutable funcs : (unit -> unit) list; }
let create () = { funcs = [] }
let new_property () =
let v = ref None in
let set t x =
t.funcs <- (fun () -> v := Some x) :: t.funcs in
let get t =
let rec aux = function
[] -> None
| f::tl ->
f ();
match !v with
None -> aux tl
| Some x -> v := None; Some x
in aux t.funcs
in
(set, get)
end
Comments
In the Hashtbl version, is the "v := None;" in "get" really useful ?
You're right, it doesn't really serve any purpose, since
(Hashtbl.find t id) ()will overwrite the ref unconditionally withSome _before its value is matched against.There is yet another trick, using weak maps, where the roles of key and values are reversed. The property is the map and the 'property list' is the key which is used to store/retrieve the value.
The trick is that the type is associated not with the container but with the key/property.
Now I see that you seem to use Markdown for the comments. A warning would have been appropriate.
Since you're using Weak, isn't it possible for the values to disappear when they're collected by the GC (if there are no other references to them, that is)?
Sorry for the formatting troubles, I'm in fact using a simplified Markdown I have yet to document. Code blocks are done with
{{ type property_list = int; ... }}I've fixed the formatting of your previous comment.
hmm
Weak.set's internal implementation works similarly tocaml_modify's remembered set handling: it essentially maintains a list of roots that is traversed on each minor GC run. Therefore, it seems that large or numerous property lists could make the GC (and hence the whole program) much slower.Now, that I'm thinking about it, the Weak part is optional, but may be handy.
I've done a similar thing in Java, which I use at work, and the reason of picking the Weak maps was that I did want the properties to shrink when the "containers" are gone.
Well, I guess this has at least some entertainment value.
There is something puzzling me. It seems to me you must hold a pointer to every entry. It also seems to me you can't traverse or iterate the property list. So i can't understand how you benefit this 'container'.
Vasile, do you have any pointer on the performance characteristics of Java's WeakHashMap? Does it put the GC under pressure?
SpiceGuid: you hold a pointer to every property, which can be used in different property lists. The canonical example is MLton's use: you have a tree whose nodes must be decorated in multiple passes. Instead of making the nodes polymorphic over the N types used in the different phases, hardcoding them in a record or using a tuple (which means that all the passes break whenever you add a field), you can just attach a property list to each node. That is, the property list lets you code the different passes independently without them having to know about each other's annotations (and their types).
You can find here an explanation of the problem property lists solve in the context of an XML tree.
Ok, i now understand, what you want to implement is not a container but a monomorphic universal type (to be used as the contents of a structured data).
Your implementation should not be concerned what the actual container is.
Here is how to do that, code by Alain Frisch from Yaron Minsky blog :
module Univ = struct type t = (unit -> unit) * (unit -> unit) let embed () = let r = ref None in let put x = (fun () -> r := Some x), (fun () -> r := None) in let get (f, g) = f (); let res = !r in g (); res in put, get endNow you can make a Univ.t list or any other structured container.