eigenclass logo
MAIN  Index  Search  Changes  PageRank  Login

WeakHash, WeakRefs and lambda evilness

James Edward Gray II asked in ruby-talk:176543 for

a Hash-like structure, using WeakRef, so that the key value pairs can be garbage collected as needed. I only need to support [] () and []=(), not the whole range of Hash functions. If the pair has been collected, I just need a simple nil returned.

It was 1am so instead of writing a full solution, I just pointed out some closure evilness in his WeakCache. He had written a WeakCache where only values were "weak" (i.e. could be reclaimed by the GC at any time), and was using a finalizer to remove the corresponding key from the @cache hash:

   def []=( key, value )
      ObjectSpace.define_finalizer(value, lambda { @cache.delete(key) })

That closure captures the whole environment, so the value would never be GCed: the finalizer itself would point to it!

In the derived WeakCache I posted, I forgot to consider that several keys could refer to the same value, as reminded by Jeffrey Schwab in ruby-talk:176720. So the code would now look like (ruby-talk:176778)

# Exercise left to the reader: add thread-safety
class WeakCache
  attr_reader :cache
  def initialize( cache = Hash.new )
    @cache = cache
    @rev_cache = Hash.new{|h,k| h[k] = {}}
    @reclaim_lambda = lambda do |value_id| 
      if @rev_cache.has_key? value_id
        @rev_cache[value_id].each_key{|key| @cache.delete key}
        @rev_cache.delete value_id
      end
    end
  end

  def []( key )
    value_id = @cache[key]
    return ObjectSpace._id2ref(value_id) unless value_id.nil?
    nil
  end

  def []=( key, value )
    @rev_cache[value.object_id][key] = true
    @cache[key] = value.object_id
    ObjectSpace.define_finalizer(value, @reclaim_lambda)
  end
end

Thinking about the semantics and improvements towards a better WeakHash

In that WeakCache, only values are weak: keys will not perish spontaneously. Robert Klemme presented the following code, where both the keys and the values could be GCed:

RKWeakHash = DelegateClass(Hash) 
class RKWeakHash
  def []=(key,val) # !> method redefined; discarding old []=
    __getobj__[WeakRef.new(key)] = WeakRef.new(val)
  end
  
  def [](key) # !> method redefined; discarding old []
    __getobj__[WeakRef.new(key)]
  end

  def each(&b) # !> method redefined; discarding old each
    __getobj__.each do |k,v|
      b[k.__getobj__, v.__getobj__] unless k.__getobj__.nil?
    end
    self
  end
  
  def cleanup
    delete_if {|k,v| k.__getobj__.nil?}
  end
end

I'm normally wary of things like DelegateClass, Singleton, WeakRef, Tempfile... because it's so easy to run into horrible performance*1 issues. I anticipated that Robert Klemme's solution would be very slow, and wrote a SimpleWeakHash with relaxed semantics:

  • the key->value associations can disappear at any time
  • all existent associations are wiped out when the GC kicks in

In particular, it removes the tacit requirement that an association remain alive as long as there is an external reference (i.e. outside the hash) to either the key or the value. The two advantages we get in exchange are:

  • increased speed (we'll see how much later)
  • no special code to keep SimpleWeakHash invariants: we can expose all the methods from Hash, unlike WeakCache

class SimpleWeakHash
  def initialize
    set_internal_hash
  end
  
  def [](key)
    __get_hash__[key]
  end

  def []=(key, value)
    __get_hash__[key] = value
  end

  def method_missing(meth, *args, &block)
    __get_hash__.send(meth, *args, &block)
  end

  private
  def __get_hash__
    old_critical = Thread.critical
    Thread.critical = true
    @valid or set_internal_hash
    return ObjectSpace._id2ref(@hash_id)
  ensure
    Thread.critical = old_critical
  end

  def set_internal_hash
    hash = {}
    @hash_id = hash.object_id
    @valid = true
    ObjectSpace.define_finalizer(hash, lambda{@valid = false})
    hash = nil
  end
end

Note that the #[] and #[]= are redundant, but anyway they're so short...

Implementing WeakHash


I finally decided to implement a WeakHash where an association is removed as soon as either the key or the value are GCed, and if possible faster than RKWeakHash. Several ways to do it come to mind; I chose to clone the key and use that copy in an internal hash referring to the object_id of the corresponding value.

class WeakHash
  attr_reader :cache
  def initialize( cache = Hash.new )
    @cache = cache
    @key_map = {}
    @rev_cache = Hash.new{|h,k| h[k] = {}}
    @reclaim_value = lambda do |value_id| 
      if @rev_cache.has_key? value_id
        @rev_cache[value_id].each_key{|key| @cache.delete key}
        @rev_cache.delete value_id
      end
    end
    @reclaim_key = lambda do |key_id|
      if @key_map.has_key? key_id
        @cache.delete @key_map[key_id]
      end
    end
  end

   def []( key )
     value_id = @cache[key]
     return ObjectSpace._id2ref(value_id) unless value_id.nil?
     nil
   end

   def []=( key, value )
     case key
     when Fixnum, Symbol, true, false
       key2 = key
     else
       key2 = key.dup
     end
     @rev_cache[value.object_id][key2] = true
     @cache[key2] = value.object_id
     @key_map[key.object_id] = key2

     ObjectSpace.define_finalizer(value, @reclaim_value)
     ObjectSpace.define_finalizer(key, @reclaim_key)
   end
end

Performance

I compared RKWeakHash, the fixed WeakCache, SimpleWeakHash and WeakHash:

require 'benchmark'
Benchmark.bmbm(30) do |bm|
  wh1 = RKWeakHash.new({})
  wh2 = WeakHash.new
  wh3 = SimpleWeakHash.new
  wc = WeakCache.new
  bm.report("RKWeakHash init"){ 20.times{|i| wh1[i.to_s] = i.to_s} }
  bm.report("WeakHash init"){ 20.times{|i| wh2[i.to_s] = i.to_s} }
  bm.report("SimpleWeakHash init"){ 20.times{|i| wh3[i.to_s] = i.to_s} }
  bm.report("WeakCache init"){ 20.times{|i| wc[i.to_s] = i.to_s} }
  vals = (0..100).inject({}){|s,x| s[x] = x; s}
  bm.report("RKWeakHash lookup") do 
    wh1.update(vals)
    vals.each{|i| wh1[i]}
  end
  bm.report("SimpleWeakHash lookup") do 
    wh3.update(vals)
    vals.each{|i| wh3[i]}
  end
end

The results are in line with my expectations:

>> Rehearsal -----------------------------------------------------------------
>> RKWeakHash init                 1.320000   0.030000   1.350000 (  1.408259)
>> WeakHash init                   0.000000   0.000000   0.000000 (  0.000425)
>> SimpleWeakHash init             0.000000   0.000000   0.000000 (  0.000155)
>> WeakCache init                  0.000000   0.000000   0.000000 (  0.000242)
>> RKWeakHash lookup               1.620000   0.010000   1.630000 (  1.762650)
>> SimpleWeakHash lookup           0.000000   0.000000   0.000000 (  0.000634)
>> -------------------------------------------------------- total: 2.980000sec
>> 
>>                                     user     system      total        real
>> RKWeakHash init                 0.830000   0.010000   0.840000 (  0.971664)
>> WeakHash init                   0.000000   0.000000   0.000000 (  0.000464)
>> SimpleWeakHash init             0.000000   0.000000   0.000000 (  0.000197)
>> WeakCache init                  0.000000   0.000000   0.000000 (  0.000266)
>> RKWeakHash lookup               1.720000   0.010000   1.730000 (  1.903419)
>> SimpleWeakHash lookup           0.000000   0.000000   0.000000 (  0.000801)

If anything, I was a bit surprised by RKWeakHash's exceedingly bad performance; I knew it was going to be slow, but... One could blame the benchmarks, since they are triggering very frequent GC runs, killing the WeakRefs in the process, but anything else wouldn't be realistic.


re: Lua envy . - hgs (2006-01-25 (Wed) 11:36:29)

(There's some wierdness going on with adding my comment after your reply. If I hit return I get a network error. Also the box is only one line in Firefox 1.5)

Lua tables are a very flexible data structure, and would be nice to have in Ruby. See: http://www.lua.org/pil/11.html and the following pages. For the basics on accessing them see: http://www.lua.org/pil/2.5.html

One thing I really like about them is the ability to leave trailing commas, which you can't do in ruby:

 x = [1,2,3,]

is illegal in ruby, but the reason lua allows it is for generating tables automatically. Passing a table to a function is a way of using Lua as a data language, see: http://www.lua.org/pil/5.3.html and for an example in practice: http://www.lua.org/pil/10.1.html

Lua envy. - hgs (2006-01-25 (Wed) 03:17:57)

Not the first time I've wished that Ruby would "mug" Lua and steal something from it [:-)]:

http://www.lua.org/pil/17.html

http://www.lua.org/pil/17.1.html

Before it was %bxy -- see the bottom of

http://www.lua.org/pil/20.2.html


mfp 2006-01-25 (Wed) 04:39:25

%bxy ++ ! 鬼車 can also do it with named backreferences but it isn't as convenient. I see the above WeakHash implementation works like Lua's tables in "kv" mode; WeakCache would be "v". Do you know of anything else worth being stolen (instead of just reinvented, stealing is faster:) ? (and plz don't say 'the implementation of the language itself' :-P)

Last modified:2006/01/24 13:47:32
Keyword(s):[blog] [ruby] [weakhash] [weakref] [ObjectSpace] [_id2ref]
References:[Ruby] [On GC and finalizers in Ruby, corrected weak hash table implementations]

*1 a bit of NIH too