WeakHash, WeakRefs and lambda evilness
James Edward Gray II asked in ruby-talk:176543 for
a Hash-like structure, using WeakRef, so that the key value pairs can be garbage collected as needed. I only need to support [] () and []=(), not the whole range of Hash functions. If the pair has been collected, I just need a simple nil returned.
It was 1am so instead of writing a full solution, I just pointed out some closure evilness in his WeakCache. He had written a WeakCache where only values were "weak" (i.e. could be reclaimed by the GC at any time), and was using a finalizer to remove the corresponding key from the @cache hash:
def []=( key, value ) ObjectSpace.define_finalizer(value, lambda { @cache.delete(key) })
That closure captures the whole environment, so the value would never be GCed: the finalizer itself would point to it!
In the derived WeakCache I posted, I forgot to consider that several keys could refer to the same value, as reminded by Jeffrey Schwab in ruby-talk:176720. So the code would now look like (ruby-talk:176778)
# Exercise left to the reader: add thread-safety class WeakCache attr_reader :cache def initialize( cache = Hash.new ) @cache = cache @rev_cache = Hash.new{|h,k| h[k] = {}} @reclaim_lambda = lambda do |value_id| if @rev_cache.has_key? value_id @rev_cache[value_id].each_key{|key| @cache.delete key} @rev_cache.delete value_id end end end def []( key ) value_id = @cache[key] return ObjectSpace._id2ref(value_id) unless value_id.nil? nil end def []=( key, value ) @rev_cache[value.object_id][key] = true @cache[key] = value.object_id ObjectSpace.define_finalizer(value, @reclaim_lambda) end end
Thinking about the semantics and improvements towards a better WeakHash
In that WeakCache, only values are weak: keys will not perish spontaneously. Robert Klemme presented the following code, where both the keys and the values could be GCed:
RKWeakHash = DelegateClass(Hash) class RKWeakHash def []=(key,val) # !> method redefined; discarding old []= __getobj__[WeakRef.new(key)] = WeakRef.new(val) end def [](key) # !> method redefined; discarding old [] __getobj__[WeakRef.new(key)] end def each(&b) # !> method redefined; discarding old each __getobj__.each do |k,v| b[k.__getobj__, v.__getobj__] unless k.__getobj__.nil? end self end def cleanup delete_if {|k,v| k.__getobj__.nil?} end end
I'm normally wary of things like DelegateClass, Singleton, WeakRef, Tempfile... because it's so easy to run into horrible performance*1 issues. I anticipated that Robert Klemme's solution would be very slow, and wrote a SimpleWeakHash with relaxed semantics:
- the key->value associations can disappear at any time
- all existent associations are wiped out when the GC kicks in
In particular, it removes the tacit requirement that an association remain alive as long as there is an external reference (i.e. outside the hash) to either the key or the value. The two advantages we get in exchange are:
- increased speed (we'll see how much later)
- no special code to keep SimpleWeakHash invariants: we can expose all the methods from Hash, unlike WeakCache
class SimpleWeakHash def initialize set_internal_hash end def [](key) __get_hash__[key] end def []=(key, value) __get_hash__[key] = value end def method_missing(meth, *args, &block) __get_hash__.send(meth, *args, &block) end private def __get_hash__ old_critical = Thread.critical Thread.critical = true @valid or set_internal_hash return ObjectSpace._id2ref(@hash_id) ensure Thread.critical = old_critical end def set_internal_hash hash = {} @hash_id = hash.object_id @valid = true ObjectSpace.define_finalizer(hash, lambda{@valid = false}) hash = nil end end
Note that the #[] and #[]= are redundant, but anyway they're so short...
Implementing WeakHash
I finally decided to implement a WeakHash where an association is removed as soon as either the key or the value are GCed, and if possible faster than RKWeakHash. Several ways to do it come to mind; I chose to clone the key and use that copy in an internal hash referring to the object_id of the corresponding value.
class WeakHash attr_reader :cache def initialize( cache = Hash.new ) @cache = cache @key_map = {} @rev_cache = Hash.new{|h,k| h[k] = {}} @reclaim_value = lambda do |value_id| if @rev_cache.has_key? value_id @rev_cache[value_id].each_key{|key| @cache.delete key} @rev_cache.delete value_id end end @reclaim_key = lambda do |key_id| if @key_map.has_key? key_id @cache.delete @key_map[key_id] end end end def []( key ) value_id = @cache[key] return ObjectSpace._id2ref(value_id) unless value_id.nil? nil end def []=( key, value ) case key when Fixnum, Symbol, true, false key2 = key else key2 = key.dup end @rev_cache[value.object_id][key2] = true @cache[key2] = value.object_id @key_map[key.object_id] = key2 ObjectSpace.define_finalizer(value, @reclaim_value) ObjectSpace.define_finalizer(key, @reclaim_key) end end
Performance
I compared RKWeakHash, the fixed WeakCache, SimpleWeakHash and WeakHash:
require 'benchmark' Benchmark.bmbm(30) do |bm| wh1 = RKWeakHash.new({}) wh2 = WeakHash.new wh3 = SimpleWeakHash.new wc = WeakCache.new bm.report("RKWeakHash init"){ 20.times{|i| wh1[i.to_s] = i.to_s} } bm.report("WeakHash init"){ 20.times{|i| wh2[i.to_s] = i.to_s} } bm.report("SimpleWeakHash init"){ 20.times{|i| wh3[i.to_s] = i.to_s} } bm.report("WeakCache init"){ 20.times{|i| wc[i.to_s] = i.to_s} } vals = (0..100).inject({}){|s,x| s[x] = x; s} bm.report("RKWeakHash lookup") do wh1.update(vals) vals.each{|i| wh1[i]} end bm.report("SimpleWeakHash lookup") do wh3.update(vals) vals.each{|i| wh3[i]} end end
The results are in line with my expectations:
>> Rehearsal ----------------------------------------------------------------- >> RKWeakHash init 1.320000 0.030000 1.350000 ( 1.408259) >> WeakHash init 0.000000 0.000000 0.000000 ( 0.000425) >> SimpleWeakHash init 0.000000 0.000000 0.000000 ( 0.000155) >> WeakCache init 0.000000 0.000000 0.000000 ( 0.000242) >> RKWeakHash lookup 1.620000 0.010000 1.630000 ( 1.762650) >> SimpleWeakHash lookup 0.000000 0.000000 0.000000 ( 0.000634) >> -------------------------------------------------------- total: 2.980000sec >> >> user system total real >> RKWeakHash init 0.830000 0.010000 0.840000 ( 0.971664) >> WeakHash init 0.000000 0.000000 0.000000 ( 0.000464) >> SimpleWeakHash init 0.000000 0.000000 0.000000 ( 0.000197) >> WeakCache init 0.000000 0.000000 0.000000 ( 0.000266) >> RKWeakHash lookup 1.720000 0.010000 1.730000 ( 1.903419) >> SimpleWeakHash lookup 0.000000 0.000000 0.000000 ( 0.000801)
If anything, I was a bit surprised by RKWeakHash's exceedingly bad performance; I knew it was going to be slow, but... One could blame the benchmarks, since they are triggering very frequent GC runs, killing the WeakRefs in the process, but anything else wouldn't be realistic.
re: Lua envy . - hgs (2006-01-25 (Wed) 11:36:29)
(There's some wierdness going on with adding my comment after your reply. If I hit return I get a network error. Also the box is only one line in Firefox 1.5)
Lua tables are a very flexible data structure, and would be nice to have in Ruby. See: http://www.lua.org/pil/11.html and the following pages. For the basics on accessing them see: http://www.lua.org/pil/2.5.html
One thing I really like about them is the ability to leave trailing commas, which you can't do in ruby:
x = [1,2,3,]
is illegal in ruby, but the reason lua allows it is for generating tables automatically. Passing a table to a function is a way of using Lua as a data language, see: http://www.lua.org/pil/5.3.html and for an example in practice: http://www.lua.org/pil/10.1.html
Lua envy. - hgs (2006-01-25 (Wed) 03:17:57)
Not the first time I've wished that Ruby would "mug" Lua and steal something from it [:-)]:
http://www.lua.org/pil/17.html
http://www.lua.org/pil/17.1.html
Before it was %bxy -- see the bottom of
http://www.lua.org/pil/20.2.html
mfp 2006-01-25 (Wed) 04:39:25
%bxy ++ ! 鬼車 can also do it with named backreferences but it isn't as convenient. I see the above WeakHash implementation works like Lua's tables in "kv" mode; WeakCache would be "v". Do you know of anything else worth being stolen (instead of just reinvented, stealing is faster:) ? (and plz don't say 'the implementation of the language itself' :-P)
*1 a bit of NIH too
- 50 http://blog.methodmissing.com/2007/5/10/rails-memory-usage-case-study
- 36 http://www.artima.com/forums/flat.jsp?forum=123&thread=145802
- 23 http://www.ruby-forum.com/topic/52595
- 21 http://www.artima.com/buzz/community.jsp?forum=123
- 9 http://chneukirchen.org/anarchaia
- 8 http://zdavatz.wordpress.com/2007/06/21/more-on-ruby-memory-leak
- 7 http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/176820
- 6 http://anarchaia.org/archive/2006/01.html
- 5 http://zdavatz.wordpress.com/category/memory-leak
- 4 http://planetruby.0x42.net
Keyword(s):[blog] [ruby] [weakhash] [weakref] [ObjectSpace] [_id2ref]
References:[Ruby] [On GC and finalizers in Ruby, corrected weak hash table implementations]