eigenclass logo
MAIN  Index  Search  Changes  PageRank  Login

Pattern matching over Ruby objects

I've been writing about recursive structures and how to check whether two objects are similar for a while and I finally got to the code I wanted: recursion-safe "pattern matching" over Ruby objects. This begins to feel more useful, and if anything, it looks better.

Examples

We need a small class to show how this sort of pattern matching works, something as simple as this will do:

class Node
  attr_accessor :a, :b
  attr_reader :value
  def initialize(value, a,b)
    @a, @b = a, b
    @value = value
  end
end

Let's proceed to the first pattern: we will match any Node object and capture its instance variables (@value, @a and @b):

o = lambda{ Object.new }
pattern = Pattern.new do |p|
  Node.new(p.capture(:thevalue){ o[] }, 
           p.capture(:left){ o[] }, p.capture(:right){ o[] })
end

md = pattern.match Node.new("some value", nil, Node.new("something else", nil, nil))
md[:thevalue]                                      # => "some value"
md[:left]                                          # => nil
md[:right]                                         # => #<Node:0xb7ddaec0 @value="something else", @b=nil, @a=nil>
#regexpish
md[0]                                              # => #<Node:0xb7ddaeac @value="some value", @b=#<Node:0xb7ddaec0 @value="something else", @b=nil, @a=nil>, @a=nil>
md[1]                                              # => "some value"
md[2]                                              # => nil
md[3]                                              # => #<Node:0xb7ddaec0 @value="something else", @b=nil, @a=nil>
a, b, c = md.captures
a                                                  # => "some value"
b                                                  # => nil
c                                                  # => #<Node:0xb7ddaec0 @value="something else", @b=nil, @a=nil>

I can specify which values must be captured with

p.capture(:capture_name){  value to be matched and captured }

References to named captures

It's been really easy so far. Time to move to something a bit more interesting; I'll add a number of constraints:

  • the value held by the node must be a String
  • no left-child (nil)
  • the right-child must be a node with no children and with the same value as the parent

The last condition can be specified with a reference to a previous capture, e.g.

 Pattern.new{|p|   [p.capture(:val){ Object.new }, Object.new, p.capture(:val)] }

matches all arrays such that the first element is the same as the last. So let's do it:

pattern = Pattern.new do |p|
  Node.new(p.capture(:value){""}, nil, Node.new(p.capture(:value), nil, nil)) 
end
# matches a Node with a String value and whose 'right-child' is a node with the
# very same value and no children


md = pattern.match Node.new("some value", nil, Node.new("some value", nil, nil))
# different values (in the object_id sense), no match
md                                                 # => nil


s = "whatever"
md = pattern.match Node.new(s, nil, Node.new(s, nil, nil))
md[:value]                                         # => "whatever"

Recursive captures

No way I could forget recursive stuff, right? Here's another pattern, matching a node whose value is an array with three elements of classes String, Fixnum and String (or derived, as usual), and referring to itself:

pattern = Pattern.new do |p|
  p.capture(:all){ Node.new([p.capture(:l){ "" }, p.capture(:middle){ 1 }, ""], nil, p.capture(:all)) }
end

n = Node.new(["foo", 2, "whatever"], nil, Node.new(nil, nil, nil))
pattern.match(n)                                   # => nil
n.b = n
md = pattern.match(n)
md[:l]                                             # => "foo"
md[:middle]                                        # => 2


The code

It's based on the code I used to test whether two objects are "structurally equivalent". By creating special Capture objects with suitable #struct_eql? methods, I can record the matches while #struct_eql? proceeds.

pattern_matching.rb

# "Pattern matching over Ruby objects"
# Copyright (C) 2005 Mauricio Fernandez <mfp@acm.org>
# Code subject to the terms of the Ruby license.
   class Object
     def struct_object_id
       object_id
     end
     
     def immediate?
       [FalseClass, TrueClass, NilClass, Fixnum, Symbol].any?{|x| x === self}
     end
  
     def immediate_class
       self.class
     end
   end
   
   class Array
     def struct_eql?(o, counter = [0], my_rec = {}, his_rec = {}, helper = {})
       return true if o.struct_object_id == struct_object_id
       return false unless self.class === o
       return false if o.size != size
       
       counter[0] += 1
       my_rec[struct_object_id] = his_rec[o.struct_object_id] = counter[0]
       
       size.times do |i|
         x, y = self[i], o[i]
         if [x,y].any?{|object| object.immediate? }
           return false unless x.struct_eql?(y, counter, my_rec, his_rec, helper)
           next
         end
         xid, yid = x.struct_object_id, y.struct_object_id
         if Array === x and Array === y
           if my_rec[xid] || his_rec[yid]
             if my_rec[xid] != his_rec[yid]
               return false
             else # equal
               next
             end
           else
             return false unless x.struct_eql?(y, counter, my_rec, his_rec, helper)
           end
         elsif Array === x # y is not an array...
           return false
         else
           # general case
           return false unless x.struct_eql?(y, counter, my_rec, his_rec, helper)
         end
       end
       
       true
     end
   end
  
   class Object
     def struct_eql?(other, counter = [0], my_rec = {}, his_rec = {}, helper = {})
       return true if other.struct_object_id == struct_object_id
       return false unless self.class === other
       return true if immediate?
       return false unless instance_variables.sort == other.instance_variables.sort
       
       counter[0] += 1
       my_rec[struct_object_id] = his_rec[other.struct_object_id] = counter[0]
  
       instance_variables.map do |iv|
         x, y = [self, other].map{|obj| obj.instance_variable_get(iv)}
         xid, yid = [x,y].map{|obj| obj.struct_object_id }
         xtag, ytag = my_rec[xid], his_rec[yid]
         if xtag || ytag
           if xtag == ytag
             next
           else
             return false
           end
         end
  
         counter[0] += 1
         my_rec[xid] = his_rec[yid] = counter[0]
  
         return false unless x.struct_eql?(y, counter, my_rec, his_rec, helper)
       end
       true
     end
   end
   
   # Very little checking atm., left for later
   class Hash
     def struct_eql?(other, counter = [0], my_rec = {}, his_rec = {}, helper = {})
       return false unless self.class === other
       return false unless self.size == other.size
  
       true
     end
   end
  
   class Pattern
     Capture = Struct.new(:model) do 
       attr_reader :value
  
       def struct_object_id; model.object_id end
       def struct_eql?(other, counter = [0], my_rec = {}, his_rec = {}, helper = {})
  
         unless [TrueClass, FalseClass].any?{|x| x == self.model.immediate_class}
           xid, yid = [self.model, other].map{|o| o.struct_object_id}
           xtag, ytag = my_rec[xid], his_rec[yid]
           if xtag || ytag
             if xtag == ytag
               @value = other
               return true
             else
               return false
             end
           end
  
           counter[0] += 1
           my_rec[model.struct_object_id] = his_rec[other.struct_object_id] = counter[0]
         end
  
         @value = other
         model.struct_eql?(other, counter, my_rec, his_rec, helper)
       end
     end
  
     class BooleanMatcher 
       @id = "BooleanMatcher_0"
       class << self; def new_id; @id = @id.succ end end
       def initialize; @id = self.class.new_id; end
       def immediate?; true end
       def immediate_class
         TrueClass
       end
       def struct_object_id; @id end
       def struct_eql?(other, counter = [0], my_rec = {}, his_rec = {}, helper = {})
         return false unless [FalseClass, TrueClass].any?{|x| x === other}
         unless helper.has_key?(struct_object_id)
           helper[struct_object_id] = other
         end
         return false if helper[struct_object_id] != other
         true
       end
     end
  
     def BOOLEAN; BooleanMatcher.new end
  
     MatchedCapture = Struct.new(:name, :index, :value)
     
     class MatchData
       attr_reader :captures, :value
  
       def initialize(value, captures)
         @name_hash = {}
         @idx_hash = {}
         @captures = captures.clone.sort_by{|c| c.index}.map{|c| c.value}
         captures.each do |capture|
           @name_hash[capture.name] = @idx_hash[capture.index] = capture
         end
         @value = value
       end
  
       def [](idx_or_name)
         return @value if idx_or_name == 0
           
         if @idx_hash.has_key? idx_or_name
           @idx_hash[idx_or_name].value
         elsif @name_hash.has_key? idx_or_name
           @name_hash[idx_or_name].value
         else
           nil
         end
       end
     end
  
     def initialize
       @captures = {}
       @pattern = yield self
     end
  
     def capture(name, &block)
       raise ArgumentError, "Repeated capture" if @captures.has_key?(name) && block_given?
       if @captures.has_key?(name)
         @captures[name].last
       else
         capture = create_capture(nil)
         @captures[name] = [@captures.size + 1, capture]
         capture.model = yield
         capture
       end
     end
  
     def match(other)
       r = @pattern.struct_eql?(other)
       if r
         matched_captures = @captures.map do |name, (idx, capture)|
           MatchedCapture.new(name, idx, capture.value)
         end
         return MatchData.new(other, matched_captures)
       else
         nil
       end
     end
  
     private
     def create_capture(model)
       Capture.new(model)
     end
   end

Related work

Reg

The first thing I was reminded of when writing the above was Caleb Clausen's Reg:

Reg is a library for pattern matching in ruby data structures. Reg provides Regexp-like match and match-and-replace for all data structures (particularly Arrays, Objects, and Hashes), not just Strings.

It is more powerful than the pattern matching I presented in the sense that it can handle regular languages: my code is missing alternation and Kleene's closure. I'd need a backtracking engine to handle them.

On the other hand, it doesn't do backreferences yet, which the above code handles just fine in a bit over 200 lines, and I don't know how it would behave with recursive data structures.

types.rb

Another somewhat related thing would be Eivind Eklund's types.rb. That was meant to be used in a very different way though:

This is an implementation of type checking for Ruby. The implementation is fairly powerful, and allows the definition of types more or less arbitrarily. There is just one tiny problem with this: I've found it to only get in the way in practice.

Using types.rb, one could specify the valid argument types for a method. The author himself found it nocive enough to warn potential users:

If you want to play with adding extra type checking to Ruby, I recommend this module. However, I do not recommend adding extra type checking (at least not based on strict checking of method parameters.)

However, I wonder if a different use of that (testing) could prove useful. Maybe the vocabulary of the sort of "pattern language"*1 I created could be extended using types.rb's assertions to create more useful patterns.

Further examples

Here are some other examples, just to clarify the semantics, especially regarding immediate values such as true/false.

 pattern = Pattern.new{|p| [true, "", true]}
 pattern.match([true, "", true]).nil?            # => false
 pattern.match([false, "", false]).nil?          # => true

 pattern = Pattern.new{|p| [1, "", 1]}
 pattern.match([1, "", 1]).nil?                  # => false
 pattern.match([2, "", 2]).nil?                  # => false

 pattern = Pattern.new{|p| [p.capture(:bool){p.BOOLEAN}, "", true, p.capture(:bool)] }
 pattern.match([false, "foo", true, false]).nil?     # => false
 pattern.match([false, "foo", false, false]).nil?    # => true
 pattern.match([false, "foo", true, true]).nil?      # => true
 pattern.match([true, "foo", true, true]).nil?       # => false
 pattern.match([false, "foo", true]).nil?            # => true

 
 K1 = Class.new do
   def initialize(a,b); @a, @b = a, b end
 end
 pattern = Pattern.new do |p|
   p.capture(:all){ K1.new(p.capture(:internal){Object.new}, K1.new(p.capture(:internal), :b)) }
 end

 md = pattern.match K1.new(:a, K1.new(:a,:b))
 md[:internal]                                     # => :a
 md[:all]                                          # => #<K1:0xb7dd621c @b=#<K1:0xb7dd6244 @b=:b, @a=:a>, @a=:a>

 md = pattern.match K1.new(:a, K1.new(:b, :a))
 md.nil?                                           # => true
 
 pattern = Pattern.new do |p|
   p.capture(:all){ K1.new(1, K1.new(K1.new(nil, 1), :b)) }
 end
 pattern.match( K1.new(1, K1.new(K1.new(nil, 2), :a)) ).nil?    # => true
 pattern.match( K1.new(1, K1.new(K1.new(nil, 1), :a)) ).nil?    # => false
 
 Pattern.new{ [1,2,3,1] }.match [3, 5, 6]          # => nil
 md = Pattern.new do |p|
   p.capture(:all){ [1,2,p.capture(:inside){[1,2]} ] }
 end.match [2,3,[2,3]]
 md[1]                                             # => [2, 3, [2, 3]]
 md[2]                                             # => [2, 3]
 md.captures
 
 X = Struct.new(:foo)
 pattern = Pattern.new do |p|
   [1, p.capture(:name){ "" }, {1,1}, p.capture(:value){ X.new(1) } ]
 end
 md = pattern.match([3, "this is my name", {1,2}, X.new(2)])
 md[1]                                             # => "this is my name"
 md[2]                                             # => #<struct X foo=2>

 rec_array_pattern = Pattern.new do |p|
   p.capture(:array){ [1, p.capture(:array), 1 ] }
 end

 rec_array_pattern.match([1, 2, 3])                # => nil
 rec_array_pattern.match([1, [], 3])               # => nil
 a = []
 a << 1 << a << 1
 a                                                 # => [1, [...], 1]
 md = rec_array_pattern.match(a)
 md[1]                                             # => [1, [...], 1]
 md[:array]                                        # => [1, [...], 1]
 b = []; b << b
 rec_array_pattern.match([1,b,3])                 # => nil
 
 rec_array_pattern2 = Pattern.new do |p|
   p.capture(:array){ [1, p.capture(:array), p.capture(:last){1} ] }
 end

 md = rec_array_pattern2.match(a)
 md[:last]                                         # => 1
 md[:array]                                        # => [1, [...], 1]

 class Node2
   attr_accessor :children, :value
   def initialize(value, *children)
     @value = value
     @children = children
   end
   def <<(x)
     @children << x
   end
 end

 dg_pattern = Pattern.new do |p|
   p.capture(:toplevel) do 
     Node2.new(p.capture(:name){""}, Node2.new(1, p.capture(:toplevel)) )
   end
 end
 a = Node2.new("foo")
 a << Node2.new(2, a)

 md = dg_pattern.match(a)
 md[:name]                                         # => "foo"
 md[:toplevel]                                     # => #<Node2:0xb7dd091c @children=[#<Node2:0xb7dd0908 @children=[#<Node2:0xb7dd091c ...>], @value=2>], @value="foo">
 md = dg_pattern.match(Node2.new("foo", Node2.new(2)))    # => nil


杭州千斤顶 2006-07-29 (Sat) 01:16:30

时尚充气千斤顶使用方式打破了常规的液压式千斤顶烦琐,我们是生产千斤顶的企业。

Last modified:2005/11/30 09:37:34
Keyword(s):[blog] [ruby] [pattern] [matching] [recursive]
References:[Ruby]

*1 currently limited to p.BOOLEAN and p.capture