eigenclass logo
MAIN  Index  Search  Changes  PageRank  Login

Recursive data structures, #hash and #eql?

In ruby-talk:167185, I showed a way to use hashes as keys in hashes without degrading the efficiency of the hash (too much). It involved creating objects on the fly with the appropriate #eql? and #hash methods, as we often do with hashes.

Some time later, I realized it would fail horribly (SystemStackError) if the hash contained itself (or one of its elements did, transitively). And then I asked myself, "am I really doing it that much worse than Ruby?". We know that the semantics of Hash#{eql?,hash} are such that "recursive hashes" are fine, but what about their cousin, Array?

 [RUBY_VERSION, RUBY_RELEASE_DATE]  # => ["1.8.3", "2005-09-24"]
 W = lambda{|x|x<<x}
 W[[]].hash                         # => 
 # ~> -:3:in `hash': stack level too deep (SystemStackError)
 # ~> 	from -:3

"Mmmm surely 1.9 will know better":

 [RUBY_VERSION, RUBY_RELEASE_DATE]  # => ["1.9.0", "2005-11-03"]
 W = lambda{|x|x<<x}
 W[[]].hash                         # => 2
 W[[]].eql? W[[]]                   # => 
 # ~> -:4:in `eql?': stack level too deep (SystemStackError)
 # ~> 	from -:4

Alright, so Hash#hash got wiser, but eql? is still doing a recursive element-by-element comparison.

Just to make sure, let's verify that Hash#{eql?,hash} can handle recursion (by not handling it at all, that is...)

 W2 = lambda{|x| x[x] = x}
 W2[{}].hash                        # => -605020064
 W2[{}].hash                        # => -605020114
 W2[{}].eql? W2[{}]                 # => false
 W2[{}] == W2[{}]                   # => false
 {} == {}                           # => true
 {1=>{}} == {1=>{}}                 # => true
 { {}=>1 } == { {}=>1 }             # => false

The two last lines differ because

 {} == {}                           # => true
 {}.eql?({})                        # => false
 

We finally arrive to our small challenge:

Can we create an Array#eql? method able to withstand recursive data structures consisting of arrays?

The default definition (rb_ary_eql) is essentially equivalent to:

 class Array
   def eql?(o) # !> method redefined; discarding old eql?
     return true if o.object_id == object_id
     return false unless Array === o
     return false if o.size != size
     
     size.times{|i| return false unless self[i].eql? o[i] }
     true
   end
 end

Let's verify that is behaves as badly as the default one. The handy lambda we've been using is called W because it's reminiscent of the Omega combinator:

 W = lambda{|x|x<<x}
 W[[]].eql? W[[]]                   # => 
                           
 # ~> -:3:in `eql?': stack level too deep (SystemStackError)
 # ~> 	from -:7:in `eql?'
 # ~> 	from -:7:in `eql?'
 # ~> 	from -:7:in `eql?'
 # ~> 	from -:7:in `eql?'
 # ~> 	from -:7:in `eql?'
 # ~> 	from -:7:in `eql?'
 # ~> 	from -:7:in `eql?'
 # ~> 	from -:7:in `eql?'
 # ~> 	 ... 4477 levels...
 # ~> 	from -:7:in `eql?'
 # ~> 	from -:7:in `eql?'
 # ~> 	from -:7:in `eql?'
 # ~> 	from -:16

Great, this shows we have something to play with.


A solution

Something to look at

Notation

We can create fairly complex recursive structures using W, introduced above. Can you picture the following one?

 W[ W[ W[[1]] + W[[2]] + [3] ] + [4] ]

Maybe not; let's try with a simpler one, like

 W[[] + W[[]]]

Array#inspect represents it as

 W[[1] + W[[]]]                        # => [1, [[...]], [...]]

which is not very telling. I'll use the following notation:

 W[[1] + W[[]]]          ---->         &1[1, &2[&2], &1]

were &<NUMBER>[...] means that the array has got ID <NUMBER>, and &<NUMBER> is a reference to a previously defined array. Therefore, an array that only contains itself is &1[&1]. If we push another empty array, it becomes &1[&1, &2[]].

You might want to try to implement this before reading the next section; hacking should be more fun than just reading :)

Implementation

Let's implement a method returning the representation according to the above notation.

 
class Array
  def rec_display(counter = [0], mapping = {})
    counter[0] += 1
    mapping[object_id] = counter[0]
    "&#{counter}[" + map do |x|
      if Array === x
        (i = mapping[x.object_id] and "&#{i}") || x.rec_display(counter, mapping)
      else
        x.inspect
      end
    end.join(", ") + "]"
  end
end

[1,2,3].rec_display                    # => "&1[1, 2, 3]"
a = [1,2,3]
[a, a]                                 # => [[1, 2, 3], [1, 2, 3]]
[a, a].rec_display                     # => "&1[&2[1, 2, 3], &2]"
 
W = lambda{|x|x<<x}  
W[[]].rec_display                      # => "&1[&1]"
W[W[[]]].rec_display                   # => "&1[&1, &1]"
W[W[[]] + W[[]]].rec_display           # => "&1[&2[&2], &3[&3], &1]"
a = W[ W[ W[[1]] + W[[2]] + [3] ] + [4] ]
a.rec_display                          # => "&1[1, &2[1, &2], 2, &3[2, &3], 3, &4[1, &2, 2, &3, 3, &4], 4, &1]"
W[[1] + W[[]]].rec_display             # => "&1[1, &2[&2], &1]"

Why do we need counter[0]; doesn't just counter suffice? Assume we didn't and consider the following case:

[[1,[]], []]

The array has got two elements, [1,[]] and []. During the top-level call to #map, [1, []] is tagged (in mapping) with counter=2, and the second element would normally be tagged with counter=3 (counter = 1 is reserved to the outer array). But [1, []], being itself an array, will also be represented with rec_display: we get a recursive call to rec_display with the values of counter and mapping at that point in time, i.e.

 x.rec_display(2, somemapping)   

where x = [1, []] and somemapping has the following associations:

 [[1,[]], []].object_id => 1
 [1,[]].object_id       => 2

The second element of the outer array ([]) hasn't been processed yet, so it's not yet in the mapping.

Now, let's see what happens inside the rec_display(2, somemapping) call. self is [1,[]], so the second element will also be recorded in the mapping, with the association

   [].object_id         => 3

When we're back from the recursive call, it's about time to add the mapping for the last array, [], the second element of 1, [, []]. This is when we see the need for reference semantics for counter: if we passed it by value when calling rec_display(2, somemapping), at this point counter would still be 2, so the last element would be tagged with ID=3. We'd obtain

 &1[&2[1,&3[]], &3]

which differs from the correct one

 &1[&2[1,&3[]], &4[]]

An implementation of the recursion-safe Array#eql?

We can define Array#eql? using the same technique we employed to display a recursive array. a.eql?(b) should hold whenever a.rec_display == b.rec_display, but we cannot use that as a definition because it would only work if all the elements inside the array can be meaningfully represented with #inspect.

Here's the code, without further explanations:

class Array
  def eql?(o, counter = [0], my_rec = {}, his_rec = {}) # !> method redefined; discarding old eql?
    return true if o.object_id == object_id
    return false unless Array === o
    return false if o.size != size
    
    counter[0] += 1
    my_rec[object_id] = his_rec[o.object_id] = counter[0]
    
    size.times do |i|
      x, y = self[i], o[i]
      xid, yid = x.object_id, y.object_id
      if Array === x and Array === y
        recursive = my_rec[xid] || his_rec[yid]
        if recursive && my_rec[xid] != his_rec[yid]
          return false
        elsif recursive # match
          next
        else # not recursive, just compare normally
          return false unless x.eql?(y, counter, my_rec, his_rec)
        end
      elsif Array === x
        # the following eql? call would return false since y isn't an array:
        #return false unless x.eql?(y, counter, my_rec, his_rec)
        return false
      else
        # general case
        return false unless x.eql? y
      end
    end
    
    true
  end
end

Let's see if it works the way it should... We'll try some evil omega expressions and see if our eql? is up to the task.

W[[]].eql? W[[]]                   # => true
a, b = [1,2].map{W[W[[]] + W[[]]]}
a                                  # => [[[...]], [[...]], [...]]
a.eql? b                           # => true
a.rec_display                      # => "&1[&2[&2], &3[&3], &1]"
b.rec_display                      # => "&1[&2[&2], &3[&3], &1]"

Let's try something else...

c = []; c << c; d = (c + c); d << d
a                                  # => [[[...]], [[...]], [...]]
d                                  # => [[[...]], [[...]], [...]]

This is where the default Array#inspect is of very little help; these arrays look identical. However

a.eql? d                           # => false
a.rec_display                      # => "&1[&2[&2], &3[&3], &1]"
d.rec_display                      # => "&1[&2[&2], &2, &1]"

We can see that this was not enough to deceive our little method.


Last modified:2005/11/25 02:21:14
Keyword(s):[blog] [ruby] [hash] [eql] [recursive]
References:[Pattern matching over Ruby objects] [Ruby] [Structural equivalence of Ruby objects]