Recursive data structures, #hash and #eql?
In ruby-talk:167185, I showed a way to use hashes as keys in hashes without degrading the efficiency of the hash (too much). It involved creating objects on the fly with the appropriate #eql? and #hash methods, as we often do with hashes.
Some time later, I realized it would fail horribly (SystemStackError) if the hash contained itself (or one of its elements did, transitively). And then I asked myself, "am I really doing it that much worse than Ruby?". We know that the semantics of Hash#{eql?,hash} are such that "recursive hashes" are fine, but what about their cousin, Array?
[RUBY_VERSION, RUBY_RELEASE_DATE] # => ["1.8.3", "2005-09-24"]
W = lambda{|x|x<<x}
W[[]].hash # =>
# ~> -:3:in `hash': stack level too deep (SystemStackError)
# ~> from -:3
"Mmmm surely 1.9 will know better":
[RUBY_VERSION, RUBY_RELEASE_DATE] # => ["1.9.0", "2005-11-03"]
W = lambda{|x|x<<x}
W[[]].hash # => 2
W[[]].eql? W[[]] # =>
# ~> -:4:in `eql?': stack level too deep (SystemStackError)
# ~> from -:4
Alright, so Hash#hash got wiser, but eql? is still doing a recursive element-by-element comparison.
Just to make sure, let's verify that Hash#{eql?,hash} can handle recursion (by not handling it at all, that is...)
W2 = lambda{|x| x[x] = x}
W2[{}].hash # => -605020064
W2[{}].hash # => -605020114
W2[{}].eql? W2[{}] # => false
W2[{}] == W2[{}] # => false
{} == {} # => true
{1=>{}} == {1=>{}} # => true
{ {}=>1 } == { {}=>1 } # => false
The two last lines differ because
{} == {} # => true
{}.eql?({}) # => false
We finally arrive to our small challenge:
Can we create an Array#eql? method able to withstand recursive data structures consisting of arrays?
The default definition (rb_ary_eql) is essentially equivalent to:
class Array
def eql?(o) # !> method redefined; discarding old eql?
return true if o.object_id == object_id
return false unless Array === o
return false if o.size != size
size.times{|i| return false unless self[i].eql? o[i] }
true
end
end
Let's verify that is behaves as badly as the default one. The handy lambda we've been using is called W because it's reminiscent of the Omega combinator:
W = lambda{|x|x<<x}
W[[]].eql? W[[]] # =>
# ~> -:3:in `eql?': stack level too deep (SystemStackError)
# ~> from -:7:in `eql?'
# ~> from -:7:in `eql?'
# ~> from -:7:in `eql?'
# ~> from -:7:in `eql?'
# ~> from -:7:in `eql?'
# ~> from -:7:in `eql?'
# ~> from -:7:in `eql?'
# ~> from -:7:in `eql?'
# ~> ... 4477 levels...
# ~> from -:7:in `eql?'
# ~> from -:7:in `eql?'
# ~> from -:7:in `eql?'
# ~> from -:16
Great, this shows we have something to play with.
A solution
Something to look at
Notation
We can create fairly complex recursive structures using W, introduced above. Can you picture the following one?
W[ W[ W[[1]] + W[[2]] + [3] ] + [4] ]
Maybe not; let's try with a simpler one, like
W[[] + W[[]]]
Array#inspect represents it as
W[[1] + W[[]]] # => [1, [[...]], [...]]
which is not very telling. I'll use the following notation:
W[[1] + W[[]]] ----> &1[1, &2[&2], &1]
were &<NUMBER>[...] means that the array has got ID <NUMBER>, and &<NUMBER> is a reference to a previously defined array. Therefore, an array that only contains itself is &1[&1]. If we push another empty array, it becomes &1[&1, &2[]].
You might want to try to implement this before reading the next section; hacking should be more fun than just reading :)
Implementation
Let's implement a method returning the representation according to the above notation.
class Array
def rec_display(counter = [0], mapping = {})
counter[0] += 1
mapping[object_id] = counter[0]
"&#{counter}[" + map do |x|
if Array === x
(i = mapping[x.object_id] and "&#{i}") || x.rec_display(counter, mapping)
else
x.inspect
end
end.join(", ") + "]"
end
end
[1,2,3].rec_display # => "&1[1, 2, 3]"
a = [1,2,3]
[a, a] # => [[1, 2, 3], [1, 2, 3]]
[a, a].rec_display # => "&1[&2[1, 2, 3], &2]"
W = lambda{|x|x<<x}
W[[]].rec_display # => "&1[&1]"
W[W[[]]].rec_display # => "&1[&1, &1]"
W[W[[]] + W[[]]].rec_display # => "&1[&2[&2], &3[&3], &1]"
a = W[ W[ W[[1]] + W[[2]] + [3] ] + [4] ]
a.rec_display # => "&1[1, &2[1, &2], 2, &3[2, &3], 3, &4[1, &2, 2, &3, 3, &4], 4, &1]"
W[[1] + W[[]]].rec_display # => "&1[1, &2[&2], &1]"
Why do we need counter[0]; doesn't just counter suffice? Assume we didn't and consider the following case:
[[1,[]], []]
The array has got two elements, [1,[]] and []. During the top-level call to #map, [1, []] is tagged (in mapping) with counter=2, and the second element would normally be tagged with counter=3 (counter = 1 is reserved to the outer array). But [1, []], being itself an array, will also be represented with rec_display: we get a recursive call to rec_display with the values of counter and mapping at that point in time, i.e.
x.rec_display(2, somemapping)
where x = [1, []] and somemapping has the following associations:
[[1,[]], []].object_id => 1 [1,[]].object_id => 2
The second element of the outer array ([]) hasn't been processed yet, so it's not yet in the mapping.
Now, let's see what happens inside the rec_display(2, somemapping) call. self is [1,[]], so the second element will also be recorded in the mapping, with the association
[].object_id => 3
When we're back from the recursive call, it's about time to add the mapping for the last array, [], the second element of 1, [, []]. This is when we see the need for reference semantics for counter: if we passed it by value when calling rec_display(2, somemapping), at this point counter would still be 2, so the last element would be tagged with ID=3. We'd obtain
&1[&2[1,&3[]], &3]
which differs from the correct one
&1[&2[1,&3[]], &4[]]
An implementation of the recursion-safe Array#eql?
We can define Array#eql? using the same technique we employed to display a recursive array. a.eql?(b) should hold whenever a.rec_display == b.rec_display, but we cannot use that as a definition because it would only work if all the elements inside the array can be meaningfully represented with #inspect.
Here's the code, without further explanations:
class Array
def eql?(o, counter = [0], my_rec = {}, his_rec = {}) # !> method redefined; discarding old eql?
return true if o.object_id == object_id
return false unless Array === o
return false if o.size != size
counter[0] += 1
my_rec[object_id] = his_rec[o.object_id] = counter[0]
size.times do |i|
x, y = self[i], o[i]
xid, yid = x.object_id, y.object_id
if Array === x and Array === y
recursive = my_rec[xid] || his_rec[yid]
if recursive && my_rec[xid] != his_rec[yid]
return false
elsif recursive # match
next
else # not recursive, just compare normally
return false unless x.eql?(y, counter, my_rec, his_rec)
end
elsif Array === x
# the following eql? call would return false since y isn't an array:
#return false unless x.eql?(y, counter, my_rec, his_rec)
return false
else
# general case
return false unless x.eql? y
end
end
true
end
end
Let's see if it works the way it should... We'll try some evil omega expressions and see if our eql? is up to the task.
W[[]].eql? W[[]] # => true
a, b = [1,2].map{W[W[[]] + W[[]]]}
a # => [[[...]], [[...]], [...]]
a.eql? b # => true
a.rec_display # => "&1[&2[&2], &3[&3], &1]"
b.rec_display # => "&1[&2[&2], &3[&3], &1]"
Let's try something else...
c = []; c << c; d = (c + c); d << d a # => [[[...]], [[...]], [...]] d # => [[[...]], [[...]], [...]]
This is where the default Array#inspect is of very little help; these arrays look identical. However
a.eql? d # => false a.rec_display # => "&1[&2[&2], &3[&3], &1]" d.rec_display # => "&1[&2[&2], &2, &1]"
We can see that this was not enough to deceive our little method.
- 70 http://www.artima.com/forums/flat.jsp?forum=123&thread=138069
- 19 http://chneukirchen.org/anarchaia
- 19 http://www.artima.com/buzz/community.jsp?forum=123
- 7 http://anarchaia.org/archive/2005/11.html
- 5 http://www.anarchaia.org
- 4 http://anarchaia.org
- 3 http://chneukirchen.org/anarchaia/archive/2005/11/25.html
- 2 http://www.artima.com/forums/flat.jsp?forum=123&thread=138373
- 2 http://search.yahoo.com/search?p=display recursive array&ei=UTF-8&fr=yfp-t-501&x=wrt
- 1 http://search.yahoo.com/search?p=EQL&fr=yfp-t-501&toggle=1&cop=mss&ei=UTF-8
Keyword(s):[blog] [ruby] [hash] [eql] [recursive]
References:[Pattern matching over Ruby objects] [Ruby] [Structural equivalence of Ruby objects]