Matías Giovannini has been implementing purely functional data structures based on Bentley and Sedgewick's ternary search trees (TST) and pitching them against OCaml's imperative hash table (Hashtbl). In the interval between the initial results and his showing his first version (called Trie_map from now on), I made a functional TST ("Ternary") which compares favorably to both Trie_map and the revised implementation ("Trie_map_mod").

I also resuscitated a hash table implementation of mine that uses open addressing with double hashing (in contrast to Hashtbl's external chaining) and exhibits vastly improved performance.

I'll show the benchmark results first, to be followed by some code and analysis.

Benchmarks

I benchmark the following finite map implementations:

All the finite maps are compared by

  1. building a map with 98568 English words, which are added in (a) randomized and (b) lexicographic order

  2. measuring the minimal lookup cost with constant lookups (so that everything is in the L1 cache)

  3. timing successful lookups by looking for the 98568 words

    1. in the same (random) order the elements were added

    2. in randomized order

    3. in lexicographic order

    4. in reverse lexicographic order

    ... against the maps built in randomized and lexicographic order

  4. timing searches against the previously generated maps (randomized and lexicographic order) using

    1. a larger set of English words (217625, 45.3% hit rate)

    2. a set of Spanish words (86016 words, 1.3% hit rate)

    ... in randomized and lexicographic order

For the sake of exhaustiveness, I measure the iteration and functor overhead and detract it from the lookup timings. The total size of the structures (including the size of the strings) is also measured. All the measurements are repeated 20 times, and the best time is retained. The benchmarks are performed in a single program execution to achieve steady state (major heap resized to required size). The set of words used in the map is large enough to ensure the structure (taking 5.7MB to over 12MB) doesn't fit in the 2MB L2 cache.

The full results can be found here (also includes hash table benchmark with integer keys).

This table shows the results for a map built in randomized order (refer to the full output for those in lexicographic order). Speed is measured in ops/sec, and size in bytes ("Find (hit, orig)" means that the lookups are performed in the same random order as the insertions).

Insertion (randomized order) and lookup

Data structure Size Load factor Insertion (rand) Find (constant) Find (hit, rand) Find (hit, orig) Find (hit, sorted) Find (45.3%, rand) Find (1.3%, rand) Find (1.3% sorted)
FastHashtbl 7325544 0.376 1509558 16701222 2669067 3554728 3958575 2347248 3243197 5133702
Hashtbl 5752664 1105131 5617227 2136562 2296294 2791035 1623434 1731465 2229713
Hashtbl_mod 7325528 0.376 639994 7373058 2711193 2963744 3893034 2249186 2926475 4403349
Hashtbl_hval 8114072 0.376 675892 10457458 2769387 3056772 4153404 2435018 3436621 5542555
Ternary 11894440 179599 14739138 1484213 1556592 4576735 1567019 2419841 6466320
Trie_map 99834 11870073 985476 1024257 3022102 1085412 1797166 5763597
Trie_map_mod 12417768 164725 8476101 1284442 1340733 3246878 1417483 2151898 4811496
Map 6805440 167046 1168963 674209 671368 1026645 556788 633682 986727

Implementation notes

The source code for all the containers and the benchmarks can be obtained here.

I modified Hashtbl first by changing its resizing policy so that the load factor is the same as in Fasthashtbl (Hashtbl_mod), and second by caching the hash value in the nodes (Hashtbl_hval), which makes successful lookups faster when there are collisions (since the key comparison can be bypassed when the hash values differ) and helps a lot with unsuccessful ones.

Fasthashtbl, Hashtbl_mod and Hashtbl_hval only allow the load factors to grow to 50% before resizing the underlying array to twice its former size; the average load factor is thus 37.5%. Hashtbl, on the other hand, allows it to grow to 2.0, which, theory predicts, would require 13.4 probes per unsuccessful search, and 4.6 per successful one (Knuth, TAOCP Vol 3, Section 6.4, page 524). For a 37.5% load factor, Hashtbl, Hashtbl_mod and Hashtbl_hval will require 1.1 and 1.2, and Fasthashtbl (which uses open addressing with double hashing) will need 1.6 and 1.25 --- double hashing is about as good for successful lookups but worse at unsuccessful ones.

Trie_map_mod and Ternary are very similar: the only differences are that the latter doesn't have a separate Leaf node type, and its code has undergone a few manual optimizations explained below.

Some observations

Functional vs. imperative

Hash tables

Functional structures (maps and TSTs)


Comments

  1. Testing...

    \[C'_{C}=1+\frac{e^{2\alpha}-1-2\alpha}{4} \]
    mfp, 01 July 2009 at 17:27#
  2. $$C_{C}=1+\frac{e^{2\alpha}-1-2\alpha}{8\alpha} + \frac{\alpha}{4}$$; as for double hashing, $$C'_{D}=\frac{1}{1-\alpha}$$ and $$C_{D}=-\frac{\ln (1-\alpha)}{\alpha}$$.

    mfp, 01 July 2009 at 17:32#