Bounded space memoization, another new twist
Everybody knows memoization by now; there's the old Hash trick and a "memoize" module in RAA. But as old as it is, you can always add a new twist. Like making it bounded-space.
You've most probably seen (or coded) "memoize" several times. Most implementations use a hash to store the results that will grow very quickly if lots of different argument values are given to the corresponding method. A bounded-space "memoize", on the other hand, will not.
Classic
For the sake of comparison, here's a plain "memoize". It differs from the one in the RAA in that
- it works at the Module level
- the memoized method can accept a block
- it doesn't use define_method, but rather an old-fashioned module_eval (for speed and block support in Ruby < 1.9)
class Module def memoize(method) # alias_method is faster than define_method + old.bind(self).call alias_method "__memoized__#{method}", method module_eval <<-EOF def #{method}(*a, &b) # assumes the block won't change the result if the args are the same (@__memoized_#{method}_cache ||= {})[a] ||= __memoized__#{method}(*a, &b) end EOF end end
Bounded-space implementation
The bounded-space memoize is built on top of a BoundedSpaceCache that uses a clever heuristic to decide which value is to be kept in the cache.
This graph shows the execution time of fib(100) for different cache sizes:

As you can see, significant speed increases can be obtained using caches with as few as 30 or 40 entries*1: an unmemoized fib(30) takes over 2 seconds...
Here's the code:
class BoundedSpaceCache def initialize(size) @size = size clear end def get(*a, &b) idx = a.hash % @keys.size k = @keys[idx] if k != :dummy && k.eql?(a) @counts[idx] += 1 ret = @vals[idx] else ret = b.call c = @counts[idx] if c == 0 # first time or invalidated @keys[idx], @vals[idx], @counts[idx] = a, ret, 1 else @counts[idx] -= 1 end end ret end def clear @keys = Array.new(@size, :dummy) @vals = @keys.dup @counts = Array.new(@size, 0) end def resize(newsize) @size = newsize clear end end class Module def memoize(method, cache_size = 1000) alias_method "__memoized__#{method}", method module_eval <<-EOF def #{method}(*a, &b) @__memoized_#{method}_cache ||= BoundedSpaceCache.new(#{cache_size}) @__memoized_#{method}_cache.get(a){ __memoized__#{method}(*a, &b) } end EOF end end
(adapted from Alain Frisch)
Here's some code to benchmark the new "memoize":
def fib(n) if n < 2 n else fib(n-1) + fib(n-2) end end require 'benchmark' #puts "Normal: %f" % Benchmark.measure{ p fib(30) }.real class << self; memoize :fib end ITER = 40 fib(1) # creates the internal cache object cache = nil # grab the cache obj ObjectSpace.each_object(BoundedSpaceCache){|cache| } 11.step(200, 2) do |cache_size| # change bounded-space cache size cache.resize(cache_size) total = 0 ITER.times do total += Benchmark.measure{ fib(100) }.total cache.clear end puts "%d %f" % [cache_size, total / ITER] end
Interesting heuristic - twifkak (2007-02-25 (Sun) 22:05:25)
Why use that over MRU?
mfp 2007-02-26 (Mon) 09:15:28
(assumed you meant LRU) That heuristic will keep a value when it is used in over 50% of the calls (since it was recorded). It favors frequency over recency, and will perform better than LRU when
- there's a "dominant argument" and
- you have occasional successions of calls with different arguments (enough to displace the main value under the LRU policy, but <50% of the total)
In particular, it will work better than a LRU that keeps only one value (per hash key), which would be the exact equivalent ;-)
mfp 2007-02-26 (Mon) 12:39:46
(btw. fib = very favorable to LRU, since you only need the 2 last values...)
*1 needless to say, I flushed the cache before each method call
Keyword(s):[blog] [ruby] [frontpage] [memoize] [hash] [bounded] [space]
References: