Matías Giovannini has been implementing purely functional data structures based on Bentley and Sedgewick's ternary search trees (TST) and pitching them against OCaml's imperative hash table (Hashtbl). In the interval between the initial results and his showing his first version (called Trie_map from now on), I made a functional TST ("Ternary") which compares favorably to both Trie_map and the revised implementation ("Trie_map_mod").
I also resuscitated a hash table implementation of mine that uses open addressing with double hashing (in contrast to Hashtbl's external chaining) and exhibits vastly improved performance.
I'll show the benchmark results first, to be followed by some code and analysis.
Benchmarks
I benchmark the following finite map implementations:
imperative:
Fasthashtbl: hash table with open addressing and double hashing
Hashtbl: the hash table from INRIA's stdlib (external chaining)
Hashtbl_mod: Hashtbl with more aggressive resizing (lower load factor)
Hashtbl_hval: Hashtbl_mod with caching of the hash value
functional:
Trie_map: TST with coalesced constructor for nodes with and without values
Trie_map_mod: TST with different constructors for leaves and inner nodes with/without a value
Ternary: TST with separate constructor for nodes with and without values but no leaf constructor
Map: the Map implementation from INRIA's stdlib (AVL tree)
All the finite maps are compared by
building a map with 98568 English words, which are added in (a) randomized and (b) lexicographic order
measuring the minimal lookup cost with constant lookups (so that everything is in the L1 cache)
timing successful lookups by looking for the 98568 words
in the same (random) order the elements were added
in randomized order
in lexicographic order
in reverse lexicographic order
... against the maps built in randomized and lexicographic order
timing searches against the previously generated maps (randomized and lexicographic order) using
a larger set of English words (217625, 45.3% hit rate)
a set of Spanish words (86016 words, 1.3% hit rate)
... in randomized and lexicographic order
For the sake of exhaustiveness, I measure the iteration and functor overhead and detract it from the lookup timings. The total size of the structures (including the size of the strings) is also measured. All the measurements are repeated 20 times, and the best time is retained. The benchmarks are performed in a single program execution to achieve steady state (major heap resized to required size). The set of words used in the map is large enough to ensure the structure (taking 5.7MB to over 12MB) doesn't fit in the 2MB L2 cache.
The full results can be found here (also includes hash table benchmark with integer keys).
This table shows the results for a map built in randomized order (refer to the full output for those in lexicographic order). Speed is measured in ops/sec, and size in bytes ("Find (hit, orig)" means that the lookups are performed in the same random order as the insertions).
Insertion (randomized order) and lookup
| Data structure | Size | Load factor | Insertion (rand) | Find (constant) | Find (hit, rand) | Find (hit, orig) | Find (hit, sorted) | Find (45.3%, rand) | Find (1.3%, rand) | Find (1.3% sorted) |
|---|---|---|---|---|---|---|---|---|---|---|
| FastHashtbl | 7325544 | 0.376 | 1509558 | 16701222 | 2669067 | 3554728 | 3958575 | 2347248 | 3243197 | 5133702 |
| Hashtbl | 5752664 | 1105131 | 5617227 | 2136562 | 2296294 | 2791035 | 1623434 | 1731465 | 2229713 | |
| Hashtbl_mod | 7325528 | 0.376 | 639994 | 7373058 | 2711193 | 2963744 | 3893034 | 2249186 | 2926475 | 4403349 |
| Hashtbl_hval | 8114072 | 0.376 | 675892 | 10457458 | 2769387 | 3056772 | 4153404 | 2435018 | 3436621 | 5542555 |
| Ternary | 11894440 | 179599 | 14739138 | 1484213 | 1556592 | 4576735 | 1567019 | 2419841 | 6466320 | |
| Trie_map | 99834 | 11870073 | 985476 | 1024257 | 3022102 | 1085412 | 1797166 | 5763597 | ||
| Trie_map_mod | 12417768 | 164725 | 8476101 | 1284442 | 1340733 | 3246878 | 1417483 | 2151898 | 4811496 | |
| Map | 6805440 | 167046 | 1168963 | 674209 | 671368 | 1026645 | 556788 | 633682 | 986727 |
Implementation notes
The source code for all the containers and the benchmarks can be obtained here.
I modified Hashtbl first by changing its resizing policy so that the load factor is the same as in Fasthashtbl (Hashtbl_mod), and second by caching the hash value in the nodes (Hashtbl_hval), which makes successful lookups faster when there are collisions (since the key comparison can be bypassed when the hash values differ) and helps a lot with unsuccessful ones.
Fasthashtbl, Hashtbl_mod and Hashtbl_hval only allow the load factors to grow to 50% before resizing the underlying array to twice its former size; the average load factor is thus 37.5%. Hashtbl, on the other hand, allows it to grow to 2.0, which, theory predicts, would require 13.4 probes per unsuccessful search, and 4.6 per successful one (Knuth, TAOCP Vol 3, Section 6.4, page 524). For a 37.5% load factor, Hashtbl, Hashtbl_mod and Hashtbl_hval will require 1.1 and 1.2, and Fasthashtbl (which uses open addressing with double hashing) will need 1.6 and 1.25 --- double hashing is about as good for successful lookups but worse at unsuccessful ones.
Trie_map_mod and Ternary are very similar: the only differences are that the latter doesn't have a separate Leaf node type, and its code has undergone a few manual optimizations explained below.
Some observations
Functional vs. imperative
Insertion is much costlier in the functional implementations (4 to 15 times slower) for random insertion orders. (It's much faster with lexicographic order (nearly 400%), but searches become twice slower then, as the trees are too unbalanced (the left child is always empty)).
TSTs are particularly competitive when performing lookups in (reverse, not shown here) lexicographic order (so the part of the tree being walked is in L1).
TSTs are good at unsuccesful lookups (because there's no need to examine the whole key to hash/compare it). Since the keys are quite short here, they don't have too large an advantage against more efficient hash tables like Fasthashtbl, though.
Map doesn't take much more than Hashtbl and uses less space than hash tables with low load factors: Map needs 6 words per element, and Hashtbl 4 words per item, plus one for the bucket list array (= at least 4.5 words/element, for load factor max = 2).
Hash tables
Interestingly, for all the hash tables, lookups in lexicographic order are faster. This is because they all use the same, very simple, hash function (
hash := hash * beta + char), so strings of the same length which only differ in the last few chars hash to similar values; the probe for the next lookup is therefore likely to be in L1.Fasthashtbl is faster than Hashtbl and Hashtbl_mod at insertions because it caches the hash value (so there's no need to compute it again when resizing). It beats Hashtbl_hval because it allocates less: each entry takes 5 words for the latter, 4 for the latter.
In the Hashtbl variants, new values are prepended to the bucket list, which explains why Fasthashtbl is relatively much better when performing the lookups in the same order as the insertion: the first key to be inserted takes the "direct" position indicated by the hash value, and those that collide are placed in the next ones according to the open addressing scheme. The first lookup is thus faster. In successive colliding searches, the first entries are more likely to be in cache, too (this also holds for external chaining).
Both Fasthashtbl and Hashtbl_hval perform faster unsuccessful lookups than Hashtbl & Hashtbl_mod because they can compare the hash value before the keys.
Functional structures (maps and TSTs)
Map takes by far the least space, but is slower at everything but insertion than the TSTs
Ternary is faster than Trie_map_mod because I have done some manual lambda lifting in order to avoid closure allocations and the extra constructor for leaves used by Trie_map_mod makes the inner loop slower at little gain in space. (In fact, I'd rejected a variant of Ternary with the extra constructor since it proved slower before Trie_map_mod was published.)
Comments
Testing...
\[C'_{C}=1+\frac{e^{2\alpha}-1-2\alpha}{4} \]$$C_{C}=1+\frac{e^{2\alpha}-1-2\alpha}{8\alpha} + \frac{\alpha}{4}$$; as for double hashing, $$C'_{D}=\frac{1}{1-\alpha}$$ and $$C_{D}=-\frac{\ln (1-\alpha)}{\alpha}$$.