
More about naming conventions

I patched Ruby 1.8.4 to gather statistics on how identifiers are named. My local build now emits this information at parse time when given the -t option:

 $ ruby -t -c -e "def a(x); c = x.size; def x.bar; 1 end; c + 1 end"
 [tokinfo] method def: a
 [tokinfo] tIDENTIFIER assignment for lvar: c
 [tokinfo] singleton method def: bar
 Syntax OK

These are the statistics for Ruby 1.8.4's stdlib:

type        average length   std dev   count
symbol                 7.0       4.3    3214
ivar                   8.6       3.9    1915
smethod                9.6       4.9    1001
gvar                   9.1       4.3     377
cname                  8.3       4.8    2124
cvar                  11.4       4.4     103
dvar_curr              3.5       2.7    1158
constant              10.9       5.0    1660
dvar                   4.7       3.7      49
method                10.5       5.6    7566
lvar                   4.3       2.9    3427

I just added some code to the parser and some support functions in order to capture:

  • uses of symbols (symbol)
  • method definitions (method)
  • singleton method definitions in the form def object.meth (smethod)
  • class/module names (cname)
  • plain constant names (constant)
  • block-local variables assigned to inside a nested block (dvar)
  • block-local variables (dvar_curr)
  • local variables (lvar)
  • instance variables (ivar)
  • global variables (gvar)
  • class variables (cvar)

The difference between dvar and dvar_curr is fairly subtle; in the following example, x is a dvar_curr:

%w[a b c].each{|x| puts x * 2}

"curr" means that the variable was defined in the current block.

When a block reuses a variable defined in an enclosing block, it is considered a dvar:

%w[aasdad dfdsb sdfsfc].map do |x|
  max = 0
  x.each_byte{|b| max = b if b > max }
  max
end

Here max, as seen from the block passed to each_byte, is a dvar.
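
The scoping behaviour behind that distinction can be checked directly. The following is a minimal sketch for a current Ruby interpreter, not the patched 1.8.4 build; the method names max_bytes and block_local_leaks? are made up for illustration, and the dvar/dvar_curr labels in the comments follow the definitions given above.

```ruby
# Largest byte value of each word. `max` is first assigned in the outer
# block, so the inner block's assignment updates that same variable.
def max_bytes(words)
  words.map do |word|   # word: defined by this block (dvar_curr)
    max = 0             # max: first assigned in this block (dvar_curr)
    # Seen from the inner block, max belongs to the enclosing block (dvar).
    word.each_byte { |b| max = b if b > max }
    max
  end
end

# A variable first assigned inside a block does not exist outside it.
def block_local_leaks?
  [1].each { |n| tmp = n }
  defined?(tmp) ? true : false
end

p max_bytes(%w[abc xyz])
p block_local_leaks?
```

If the inner block created its own max instead of reusing the enclosing one, max_bytes would return 0 for every word.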

Comparison across Ruby versions


For most categories, the average identifier length has either increased slightly or stayed about the same from one Ruby version to the next. The increase can be attributed in part to the growth of the standard library: more specialized libraries deal with more complex concepts, which often call for longer names.

[Figure: average constant name length (constant.png)]
[Figure: average method name length (method.png)]

Some code

As usual, here's the code I used to generate the stats. The most interesting bits are probably these:

  allstats.each do |key, val|
    tlength, tlength2, ntoks = val.inject([0,0,0]) do |(s,s2,t),(name,count)|
      [s + name.size * count, s2 + (name.size ** 2 * count), t + count]
    end
    avg = 1.0 * tlength / ntoks
    stddev = Math.sqrt(1.0 * tlength2 / ntoks - avg ** 2)
    ret[key] = [avg, stddev, val.size]
  end

  def_proc = lambda{|h,k| h[k] = 0}
  stats = {}
   
  MATCHER.each do |key, val|
    stats[key] = Hash.new(&def_proc)
  end
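
To make the arithmetic concrete: the first excerpt weights each name's length by its number of occurrences and computes the standard deviation as sqrt(E[len^2] - E[len]^2), while the third value it stores is the number of distinct names. Below is a small self-contained sketch of the same computation on made-up data. The helper name length_stats is hypothetical, the clamp to zero is an addition that guards against rounding, and Enumerable#sum assumes Ruby 2.4 or later.

```ruby
# Hypothetical helper (not part of the original script). Takes a hash of
# name => occurrence count and returns
# [average length, standard deviation, number of distinct names].
def length_stats(counts)
  total   = counts.values.sum.to_f
  mean    = counts.sum { |name, n| name.size * n } / total
  mean_sq = counts.sum { |name, n| name.size**2 * n } / total
  # Clamp at zero so rounding error can never produce a negative variance.
  variance = [mean_sq - mean**2, 0.0].max
  [mean, Math.sqrt(variance), counts.size]
end

# Counting hash whose missing keys start at 0, as in the second excerpt.
counts = Hash.new { |h, k| h[k] = 0 }
%w[a bb bb dddd].each { |name| counts[name] += 1 }

avg, stddev, distinct = length_stats(counts)
puts "%4.2f  %4.2f  (%d)" % [avg, stddev, distinct]
```

Here the lengths 1, 2, 2 and 4 give an average of 9/4 = 2.25 and a mean square of 25/4 = 6.25, so the variance is 6.25 - 2.25^2 = 1.1875.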

Full code

MATCHER = {
  :symbol => ["symbol: (\\S+)"],
  :method => ["method def: (\\S+)"],
  :smethod => ["singleton method def: (\\S+)"],
  :cname => ["tCONSTANT for cname: (\\S+)"],
  :dvar => ["tIDENTIFIER assignment for dvar: (\\S+)"],
  :dvar_curr => ["tIDENTIFIER assignment for dvar_curr: (\\S+)"],
  :lvar => ["tIDENTIFIER assignment for lvar: (\\S+)"],
  :ivar => ["tIDENTIFIER assignment for ivar: (\\S+)"], 
  :gvar => ["tIDENTIFIER assignment for gvar: (\\S+)"],
  :cvar => ["tIDENTIFIER assignment for cvar: (\\S+)"],
  :constant => ["tIDENTIFIER assignment for constant: (\\S+)"] 
}
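# Runs the patched interpreter on filename and returns, for each category in
# MATCHER, a hash mapping identifier => number of occurrences.
def naming_statistics(filename)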
  def_proc = lambda{|h,k| h[k] = 0}
  stats = {}
   
  MATCHER.each do |key, val|
    stats[key] = Hash.new(&def_proc)
  end
  IO.popen("ruby -t -c #{filename} 2>&1") do |f|
    f.each do |line|
      next unless /^\[tokinfo\] / =~ line
      MATCHER.each do |key, re_arr|
        md = nil
        if re_arr.any?{|x| md = Regexp.new("^\\[tokinfo\\] #{x}").match(line) }
          stats[key][md[1]] += 1
          break
        end
      end
    end
  end
  stats
end

require 'find'
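# Aggregates naming_statistics over every .rb file under dirname and returns
# category => [average length, std dev, number of distinct names].
def dir_stats(dirname)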
  allstats = Hash.new{|h,k| h[k] = Hash.new{|h2,k2| h2[k2] = 0} }
  Dir.chdir(dirname) do
    Find.find(".") do |fname|
      next unless /\.rb$/ =~ fname
      stats = naming_statistics(fname)
      stats.each_pair{|k,v| v.each_pair{|name,count| allstats[k][name] += count} }
    end
  end
  ret = {}
  allstats.each do |key, val|
    tlength, tlength2, ntoks = val.inject([0,0,0]) do |(s,s2,t),(name,count)|
      [s + name.size * count, s2 + (name.size ** 2 * count), t + count]
    end
    avg = 1.0 * tlength / ntoks
    stddev = Math.sqrt(1.0 * tlength2 / ntoks - avg ** 2)
    ret[key] = [avg, stddev, val.size]
  end

  ret
end

%w[/tmp/ruby-1.4.6/ /tmp/ruby-1.6.8 /tmp/ruby-1.8.0 /tmp/ruby-1.8.4].each do |dirname|
  stats = dir_stats(dirname)
  puts "=" * 20
  puts dirname
  puts "=" * 20
  stats.each_pair do |k, r|
    puts "%10s  %4.1f  %4.1f  (%5d)" % [k, *r]
  end
end
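
Since the script depends on the patched interpreter, the parsing step can also be tried in isolation on the sample output shown at the top of this page. This is only a sketch for a current Ruby: PATTERNS holds three of the MATCHER entries rewritten as precompiled regexps, and tally and SAMPLE are names made up for this example.

```ruby
# Three of the MATCHER patterns, precompiled and anchored on the
# "[tokinfo] " prefix the patched parser prints.
PATTERNS = {
  :method  => /^\[tokinfo\] method def: (\S+)/,
  :smethod => /^\[tokinfo\] singleton method def: (\S+)/,
  :lvar    => /^\[tokinfo\] tIDENTIFIER assignment for lvar: (\S+)/
}

# Returns category => { identifier => occurrences } for the given lines.
def tally(lines)
  stats = Hash.new { |h, k| h[k] = Hash.new(0) }
  lines.each do |line|
    PATTERNS.each do |kind, re|
      md = re.match(line) or next
      stats[kind][md[1]] += 1
      break                     # at most one category per line
    end
  end
  stats
end

# Sample output copied from the example at the top of the page.
SAMPLE = <<EOS
[tokinfo] method def: a
[tokinfo] tIDENTIFIER assignment for lvar: c
[tokinfo] singleton method def: bar
Syntax OK
EOS

stats = tally(SAMPLE.lines)
p stats
```

Lines without the [tokinfo] prefix, such as "Syntax OK", match none of the patterns and are ignored.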


Last modified:2005/12/25 07:01:40
Keyword(s):[blog] [ruby] [naming] [convention] [1.8.4]
References:[Ruby]