eigenclass logo
MAIN  Index  Search  Changes  PageRank  Login

Call graphs to analyze code dependencies, or just because.

Here's one of the intermediate results on the path to extending the rcov code coverage tool for Ruby: recording call graph information, so we can analyze the dependencies between our modules. The following figure is quite expressive:

/hiki/callgraphs/sample_call_graph.png

The diagram tells at once what's happening below:

class Bar
  def foo
    1+1
  end

  def bar(x)
    10.times{|i| x.bar(i) }
  end
end

class Test
  def initialize
    @bar = Bar.new
  end

  def test
    a = 1
    b = 2
  
    @bar.foo
  end

  def test2
    @bar.bar(self)
  end

  def bar(x)
    if x > 5
      @bar.foo
    else
      test
    end
  end
end

t = Test.new
t.test
t.test2

Recording the call graph

This is fairly easy to implement: set_trace_func can be used to record the appropriate events (call, c-call, return and c-return). We just need to keep track of the caller when we return from a method. The basic structure would look like

deps = Hash.new{|h,k| h[k] = Hash.new{|h2,k2| h2[k2] = [] } }
call_stack = []

set_trace_func lambda { |event, file, line, id, binding, classname|
  case event
  when 'call', 'c-call'
    call_stack << [classname.to_s, id]
  when 'return', 'c-return'
    curr_class, curr_meth = call_stack.pop
    prev_class, prev_meth = call_stack.last || ["TOP", "TOP"]
    # slow, don't care
    deps[prev_class][prev_meth] |= [[curr_class, curr_meth]]
  end
}

Refinement

There are several problems in the naïve code shown above:

  • set_trace_func is slow and we can only have one per process
  • the call graph is huge because it records lots of uninteresting calls like Fixnum#+
  • (related to the latter) the call graph can be confusing because we'll have lots of calls originating in methods taking a block, like Enumerable#each. Also, it'd be better to skip methods like Class.new so we don't lose control flow information when different paths coalesce.

There's only one way to address the first issue: using the event_hook subsystem of the (C) API, the same way rcov's rcovrt does to achieve the 200X speedup compared to other code coverage tools. I'll end up doing that once I decide how to add this all to rcov.

As for the two others, they can be solved in pure Ruby: method calls from uninteresting classes can be ignored if you are careful enough to ensure the stack doesn't underflow.

Once the code we're interested in has been executed, the call graph information will be stored in a structure resembling

{"Test"=>
  {:bar=>[["Test", :test], ["Bar", :foo]],
   :test2=>[["Bar", :bar]],
   :test=>[["Bar", :foo]]},
 "Bar"=>{:bar=>[["Test", :bar]]},
 "TOP"=>{""=>[["Test", :initialize], ["Test", :test], ["Test", :test2]]}}

The interpretation is quite simple: Test#bar calls Test#test and Bar#foo, Test#test2 calls Bar#bar, etc...

It should be rather easy to process that data further to merge methods across a functional unit or extract interaction patterns, but for the time being a mere graphical representation is OK.

Code

You can use the following snippet as follows:

ruby -record-call-graph whatever.rb

will create an OUTPUT.dot file that can be processed with Graphviz; that information will also be dumped in textual form to stderr.

ecord-call-graph.rb

# Copyright (c) 2006 Mauricio Fernandez <mfp@acm.org>
# Use and modification subject to Ruby's license.

deps = Hash.new{|h,k| h[k] = Hash.new{|h2,k2| h2[k2] = [] } }
call_stack = [["TOP", "", 2**30]]
IGNORE = %w[Array Bignum Comparable Config Class Dir Enumerable ENV Fixnum Float Hash 
                    IO Integer Kernel OptionParser Module Object Range Regexp String 
                    Time].map{|x| Regexp.new(x)} + [/REXML::/]
set_trace_func lambda { |event, file, line, id, binding, classname|
  case event
  when 'call', 'c-call'
    if IGNORE.any?{|re| re =~ classname.to_s }
      call_stack.last[2] += 1
    else
      call_stack << [classname.to_s, id, 1]
    end
  when 'return', 'c-return'
    curr_class, curr_meth, counter = call_stack.last
    if counter == 1
      call_stack.pop
      prev_class, prev_meth, = call_stack.last
    # slow, don't care
      deps[prev_class][prev_meth] |= [[curr_class, curr_meth]]
    else
      call_stack.last[2] -= 1
    end
  end
}

label_escape = lambda do |x|
  s = x.to_s
  pairs = {/\|/, '\|', /\</, '\<', /\>/, '\>'}
  pairs.each{|pre, post| s.gsub!(pre, post)}

  s
end

END{ 
  set_trace_func(nil)
  File.open("OUTPUT.dot", "w") do |f| 
    f.puts <<EOF
digraph call_graph {
    ratio = fill;
    edge [weight = 3];
EOF
    all_classes = Hash.new{|h,k| h[k] = {} }
    deps.each do |klass, src_meths|
      src_meths.each do |src_meth, dest_meths| 
        all_classes[klass][src_meth] = true
        dest_meths.each do |dst_class, dst_meth|
          all_classes[dst_class][dst_meth] = true
        end
      end
    end
    all_classes.each do |klass, meths|
      f.puts %!subgraph "cluster_#{klass}" { 
               fontcolor = red; color = blue; label = "#{label_escape[klass]}";!
      meths.keys.each do |m| 
        label = label_escape[m]
        f.puts %!"#{klass}##{m}"; ! # [label="#{label}"];  ! 
      end
      f.puts "}"
    end
    deps.each do |src_class, meth_map|
      meth_map.each do |src_meth, dest_map|
        dest_map.each do |dest_class, dest_meth|
          f.puts %!"#{src_class}##{src_meth}" -> "#{dest_class}##{dest_meth}"; !
        end
      end
    end

    f.puts "}"

    require 'pp'
    PP.pp(deps, $stderr)
  end
}

I have included a number of classes to be ignored in the IGNORE array, but the list is far from exhaustive: anyway, some more work will be needed to ensure the call graph is reasonable (coalescing, dynamic tracing activation, etc.).


Last modified:2006/03/14 09:33:15
Keyword(s):[blog] [ruby] [call] [graph] [rcov] [set_trace_func]
References: