eigenclass logo
MAIN  Index  Search  Changes  PageRank  Login

Bounded space memoization, another new twist

Everybody knows memoization by now; there's the old Hash trick and a "memoize" module in RAA. But as old as it is, you can always add a new twist. Like making it bounded-space.

You've most probably seen (or coded) "memoize" several times. Most implementations use a hash to store the results that will grow very quickly if lots of different argument values are given to the corresponding method. A bounded-space "memoize", on the other hand, will not.

Classic

For the sake of comparison, here's a plain "memoize". It differs from the one in the RAA in that

  • it works at the Module level
  • the memoized method can accept a block
  • it doesn't use define_method, but rather an old-fashioned module_eval (for speed and block support in Ruby < 1.9)
class Module
  def memoize(method)
    # alias_method is faster than define_method + old.bind(self).call
    alias_method "__memoized__#{method}", method
    module_eval <<-EOF
      def #{method}(*a, &b)
        # assumes the block won't change the result if the args are the same
        (@__memoized_#{method}_cache ||= {})[a] ||= __memoized__#{method}(*a, &b)
      end
    EOF
  end
end

Bounded-space implementation

The bounded-space memoize is built on top of a BoundedSpaceCache that uses a clever heuristic to decide which value is to be kept in the cache.

This graph shows the execution time of fib(100) for different cache sizes: /hiki/bounded_space_memoization/memoization.png

As you can see, significant speed increases can be obtained using caches with as few as 30 or 40 entries*1: an unmemoized fib(30) takes over 2 seconds...

Here's the code:

class BoundedSpaceCache
  def initialize(size)
    @size = size
    clear
  end

  def get(*a, &b)
    idx = a.hash % @keys.size
    k = @keys[idx]
    if k != :dummy && k.eql?(a)
      @counts[idx] += 1
      ret = @vals[idx]
    else
      ret = b.call
      c = @counts[idx]
      if c == 0 # first time or invalidated
        @keys[idx], @vals[idx], @counts[idx] = a, ret, 1
      else
        @counts[idx] -= 1
      end
    end

    ret
  end

  def clear
    @keys = Array.new(@size, :dummy)
    @vals = @keys.dup
    @counts = Array.new(@size, 0)
  end
  
  def resize(newsize)
    @size = newsize
    clear
  end
end

class Module
  def memoize(method, cache_size = 1000)
    alias_method "__memoized__#{method}", method
    module_eval <<-EOF
      def #{method}(*a, &b)
        @__memoized_#{method}_cache ||= BoundedSpaceCache.new(#{cache_size})

        @__memoized_#{method}_cache.get(a){ __memoized__#{method}(*a, &b) }
      end
    EOF
  end
end

(adapted from Alain Frisch)

Here's some code to benchmark the new "memoize":

def fib(n)
   if n < 2
     n
   else
     fib(n-1) + fib(n-2)
   end
end

require 'benchmark'
#puts "Normal: %f" % Benchmark.measure{ p fib(30) }.real

class << self; memoize :fib end
ITER = 40
fib(1) # creates the internal cache object
cache = nil
# grab the cache obj
ObjectSpace.each_object(BoundedSpaceCache){|cache| }
11.step(200, 2) do |cache_size|
  # change bounded-space cache size
  cache.resize(cache_size)
  total = 0
  ITER.times do
    total += Benchmark.measure{ fib(100) }.total
    cache.clear
  end
  puts "%d %f" % [cache_size, total / ITER]
end



Interesting heuristic - twifkak (2007-02-25 (Sun) 22:05:25)

Why use that over MRU?

mfp 2007-02-26 (Mon) 09:15:28

(assumed you meant LRU) That heuristic will keep a value when it is used in over 50% of the calls (since it was recorded). It favors frequency over recency, and will perform better than LRU when

  • there's a "dominant argument" and
  • you have occasional successions of calls with different arguments (enough to displace the main value under the LRU policy, but <50% of the total)

In particular, it will work better than a LRU that keeps only one value (per hash key), which would be the exact equivalent ;-)

mfp 2007-02-26 (Mon) 12:39:46

(btw. fib = very favorable to LRU, since you only need the 2 last values...)


Last modified:2007/02/19 02:49:51
Keyword(s):[blog] [ruby] [frontpage] [memoize] [hash] [bounded] [space]
References:

*1 needless to say, I flushed the cache before each method call