More about naming conventions
I patched Ruby 1.8.4 to gather some statistics on identifier naming conventions. My local build now emits this information at parse time:
```
$ ruby -t -c -e "def a(x); c = x.size; def x.bar; 1 end; c + 1 end"
[tokinfo] method def: a
[tokinfo] tIDENTIFIER assignment for lvar: c
[tokinfo] singleton method def: bar
Syntax OK
```
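Every `[tokinfo]` line has the shape `<description>: <name>`. As a small illustration, the lines can be split with a single regular expression. The helper name `tokinfo_pairs` is hypothetical, and the input is just the output shown above:

```ruby
# Output copied from the command above.
SAMPLE_OUTPUT = <<EOS
[tokinfo] method def: a
[tokinfo] tIDENTIFIER assignment for lvar: c
[tokinfo] singleton method def: bar
Syntax OK
EOS

# Hypothetical helper: split each "[tokinfo] <description>: <name>" line
# into [description, name]; lines without the prefix are skipped.
def tokinfo_pairs(text)
  text.each_line.map { |line|
    md = /^\[tokinfo\] (.+): (\S+)$/.match(line)
    md && [md[1], md[2]]
  }.compact
end

p tokinfo_pairs(SAMPLE_OUTPUT)
```

This prints the three (description, name) pairs. "Syntax OK" is dropped because it lacks the `[tokinfo]` prefix.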
These are the statistics for Ruby 1.8.4's stdlib:
| type | average length (chars) | std dev (chars) | distinct names |
|---|---|---|---|
| symbol | 7.0 | 4.3 | 3214 |
| ivar | 8.6 | 3.9 | 1915 |
| smethod | 9.6 | 4.9 | 1001 |
| gvar | 9.1 | 4.3 | 377 |
| cname | 8.3 | 4.8 | 2124 |
| cvar | 11.4 | 4.4 | 103 |
| dvar_curr | 3.5 | 2.7 | 1158 |
| constant | 10.9 | 5.0 | 1660 |
| dvar | 4.7 | 3.7 | 49 |
| method | 10.5 | 5.6 | 7566 |
| lvar | 4.3 | 2.9 | 3427 |
I added some code to the parser, plus a few support functions, to capture:
- uses of symbols (symbol)
- method definitions (method)
- singleton method definitions in the form def object.meth (smethod)
- class/module names (cname)
- plain constant names (constant)
- block-local variables assigned to inside a nested block (dvar)
- block-local variables (dvar_curr)
- local variables (lvar)
- instance variables (ivar)
- global variables (gvar)
- class variables (cvar)
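To make the categories more concrete, here is a small made-up snippet containing an identifier of each kind except dvar, which involves nested blocks. The labels in the comments are my reading of the list above, not output from the patched interpreter, so treat them as assumptions. All names (`Naming`, `Tally`, `$runs`, and so on) are hypothetical:

```ruby
# Hypothetical snippet; comment labels assume the categories map onto
# these Ruby constructs as described in the list above.
$runs = 0                          # gvar: global variable assignment

module Naming                      # cname: module name
  DEFAULT_SEP = ", "               # constant: constant assignment

  class Tally                      # cname: class name
    @@created = 0                  # cvar: class variable assignment

    def self.created               # smethod: def object.meth form
      @@created
    end

    def initialize                 # method: method definition
      @seen = Hash.new(0)          # ivar: instance variable assignment
      @@created += 1
    end

    def add(words)                 # method
      added = 0                    # lvar: local variable assignment
      words.each do |w|            # w: dvar_curr (block parameter)
        key = w.to_sym             # key: dvar_curr (first assigned in this block)
        @seen[key] += 1
        added += 1
      end
      $runs += 1
      @seen[:total] = added        # :total is a symbol literal (symbol)
      added
    end
  end
end
```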
The difference between dvar and dvar_curr is fairly subtle; in the following example, x is a dvar_curr:
```ruby
%w[a b c].each{|x| puts x * 2}
```
"curr" means that the variable was defined in the current block.
When a block assigns to a variable defined in an enclosing block, that variable is considered a dvar:
```ruby
%w[aasdad dfdsb sdfsfc].map do |x|
  max = 0
  x.each_byte{|b| max = b if b > max }
  max
end
```
Here max, as seen from the block passed to each_byte, is a dvar.
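The distinction follows Ruby's block scoping rules. A minimal sketch (the method name `block_scope_demo` is hypothetical) showing that an assignment inside a nested block updates a variable the enclosing block already defined, while a variable first assigned in the nested block stays local to it:

```ruby
# Hypothetical helper illustrating block scoping.
def block_scope_demo(str)
  result = nil
  [str].each do |s|
    total = 0                  # first assigned in this block
    s.each_byte do |b|
      total += b               # reassigns the enclosing block's variable
      inner_only = b           # first assigned in the nested block
    end
    # total kept the nested block's updates; inner_only is not a
    # local variable at this point, so defined? returns nil.
    result = [total, defined?(inner_only)]
  end
  result
end

p block_scope_demo("ab")   # [195, nil]
```

The bytes of "ab" are 97 and 98, so `total` ends up as 195 in the enclosing block.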
Comparison across Ruby versions
Across versions, the average identifier length seems to have either increased slightly or stayed roughly the same in most categories. The increase can be partly attributed to the growth of the standard library: more specialized libraries deal with more complex concepts, which often call for longer names.
Figure: Average constant name length
Figure: Average method name length
Some code
As usual, here's the code I used to generate the stats. The most interesting bits are probably these two:
```ruby
allstats.each do |key, val|
  tlength, tlength2, ntoks = val.inject([0,0,0]) do |(s,s2,t),(name,count)|
    [s + name.size * count, s2 + (name.size ** 2 * count), t + count]
  end
  avg = 1.0 * tlength / ntoks
  stddev = Math.sqrt(1.0 * tlength2 / ntoks - avg ** 2)
  ret[key] = [avg, stddev, val.size]
end
```
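The code above weights each distinct name by its number of occurrences: the mean is sum(len × count) / sum(count), and the standard deviation uses the one-pass identity variance = sum(len² × count) / sum(count) − mean². The third value stored, `val.size`, is the number of distinct names in the category, not the number of occurrences. A standalone sketch (the method name `length_stats` is hypothetical) that returns both counts; clamping the variance at zero is my addition, not part of the original:

```ruby
# Hypothetical helper: counts maps each identifier to its number of
# occurrences, like the per-category hashes in the script.
def length_stats(counts)
  sum = sum_sq = occurrences = 0
  counts.each do |name, count|
    sum         += name.size * count
    sum_sq      += name.size ** 2 * count
    occurrences += count
  end
  avg = sum.to_f / occurrences
  # E[len^2] - avg^2 can dip slightly below zero through rounding,
  # so clamp it before taking the square root.
  variance = [sum_sq.to_f / occurrences - avg ** 2, 0.0].max
  { :avg => avg, :stddev => Math.sqrt(variance),
    :distinct => counts.size, :occurrences => occurrences }
end

s = length_stats("ab" => 1, "abcd" => 1)   # avg 3.0, stddev 1.0
t = length_stats("x" => 3)                 # 1 distinct name, 3 occurrences
```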
```ruby
def_proc = lambda{|h,k| h[k] = 0}
stats = {}
MATCHER.each do |key, val|
  stats[key] = Hash.new(&def_proc)
end
```
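`def_proc` makes every per-category hash default missing keys to 0, so `stats[key][name] += 1` can count occurrences without checking first. `Hash.new(0)` would behave the same for `+=`; the difference, shown in this small sketch, is that the block form stores the key on a plain read while the default-value form does not:

```ruby
with_block   = Hash.new { |h, k| h[k] = 0 }   # same behaviour as Hash.new(&def_proc)
with_default = Hash.new(0)                    # default value, nothing stored on read

# Counting with += works identically for both.
[with_block, with_default].each { |h| h["foo"] += 1 }

# A bare read differs: only the block form inserts the missing key.
with_block["bar"]
with_default["bar"]

p with_block.size     # 2
p with_default.size   # 1
```

Since the script only ever updates the hashes with `+=`, either form gives the same counts there.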
Full code
```ruby
# Patterns (as strings) for each kind of [tokinfo] line emitted by the patched ruby.
MATCHER = {
  :symbol    => ["symbol: (\\S+)"],
  :method    => ["method def: (\\S+)"],
  :smethod   => ["singleton method def: (\\S+)"],
  :cname     => ["tCONSTANT for cname: (\\S+)"],
  :dvar      => ["tIDENTIFIER assignment for dvar: (\\S+)"],
  :dvar_curr => ["tIDENTIFIER assignment for dvar_curr: (\\S+)"],
  :lvar      => ["tIDENTIFIER assignment for lvar: (\\S+)"],
  :ivar      => ["tIDENTIFIER assignment for ivar: (\\S+)"],
  :gvar      => ["tIDENTIFIER assignment for gvar: (\\S+)"],
  :cvar      => ["tIDENTIFIER assignment for cvar: (\\S+)"],
  :constant  => ["tIDENTIFIER assignment for constant: (\\S+)"]
}

# Parse one file and count, per category, how often each name occurs.
# Assumes the `ruby` found on the PATH is the patched build.
def naming_statistics(filename)
  def_proc = lambda{|h,k| h[k] = 0}
  stats = {}
  MATCHER.each do |key, val|
    stats[key] = Hash.new(&def_proc)
  end
  IO.popen("ruby -t -c #{filename} 2>&1") do |f|
    f.each do |line|
      next unless /^\[tokinfo\] / =~ line
      MATCHER.each do |key, re_arr|
        md = nil
        if re_arr.any?{|x| md = Regexp.new("^\\[tokinfo\\] #{x}").match(line) }
          stats[key][md[1]] += 1
          break
        end
      end
    end
  end
  stats
end

require 'find'

# Aggregate the counts over every .rb file below dirname and return
# category => [average length, std dev, number of distinct names].
def dir_stats(dirname)
  allstats = Hash.new{|h,k| h[k] = Hash.new{|h2,k2| h2[k2] = 0} }
  Dir.chdir(dirname) do
    Find.find(".") do |fname|
      next unless /\.rb$/ =~ fname
      stats = naming_statistics(fname)
      stats.each_pair{|k,v| v.each_pair{|name,count| allstats[k][name] += count} }
    end
  end
  ret = {}
  allstats.each do |key, val|
    tlength, tlength2, ntoks = val.inject([0,0,0]) do |(s,s2,t),(name,count)|
      [s + name.size * count, s2 + (name.size ** 2 * count), t + count]
    end
    avg = 1.0 * tlength / ntoks
    stddev = Math.sqrt(1.0 * tlength2 / ntoks - avg ** 2)
    ret[key] = [avg, stddev, val.size]
  end
  ret
end

%w[/tmp/ruby-1.4.6/ /tmp/ruby-1.6.8 /tmp/ruby-1.8.0 /tmp/ruby-1.8.4].each do |dirname|
  stats = dir_stats(dirname)
  puts "=" * 20
  puts dirname
  puts "=" * 20
  stats.each_pair do |k, r|
    puts "%10s %4.1f %4.1f (%5d)" % [k, *r]
  end
end
```
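`naming_statistics` builds a new `Regexp` for every category on every output line. A minimal variant, assuming the same `[tokinfo]` output format, compiles the patterns once and reads from any IO object, which also makes it testable without the patched interpreter. The method name `count_tokinfo` and the constant `TOKINFO_PATTERNS` are hypothetical, and only three of the `MATCHER` patterns are repeated:

```ruby
require 'stringio'

# Three of the MATCHER patterns, compiled once with the [tokinfo] prefix.
TOKINFO_PATTERNS = {
  :method  => "method def: (\\S+)",
  :smethod => "singleton method def: (\\S+)",
  :lvar    => "tIDENTIFIER assignment for lvar: (\\S+)"
}.map { |key, src| [key, Regexp.new("^\\[tokinfo\\] #{src}")] }

# Hypothetical variant of naming_statistics: counts occurrences of each
# name per category, reading lines from any IO (pipe, file, StringIO).
def count_tokinfo(io)
  stats = Hash.new { |h, k| h[k] = Hash.new(0) }
  io.each_line do |line|
    TOKINFO_PATTERNS.each do |key, re|
      if (md = re.match(line))
        stats[key][md[1]] += 1
        break
      end
    end
  end
  stats
end

sample = StringIO.new(<<EOS)
[tokinfo] method def: a
[tokinfo] tIDENTIFIER assignment for lvar: c
[tokinfo] singleton method def: bar
Syntax OK
EOS
counts = count_tokinfo(sample)
```

With the patched interpreter it could be fed from the same pipe, e.g. `IO.popen("ruby -t -c #{filename} 2>&1") { |f| count_tokinfo(f) }`. The anchored patterns are mutually exclusive, so the order in which they are tried does not matter.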