
CPAN vs. RAA: costs

I tried to improve my estimate of RAA's cost by running the script shown below against 11% of the archive (by project count). That subset would cost around $20M (600,000 lines of code), which puts the total cost of the RAA under $191 million. I then compared it to a revised version of the 2004 estimate of CPAN's cost, which comes out substantially lower than the original figure.

The final figure is somewhat biased because I didn't pick the projects randomly (so the remainder should be smaller on average), but it still serves as an upper bound.

Comparison with CPAN

The cost of CPAN was estimated at under $677 million in 2004. That analysis was flawed because it treated all of CPAN as a single project with 15.5 million LOC, which inflates the result: the effort equation of the basic COCOMO model, \( \mathit{man\_months} = 2.4 \cdot \mathit{KLOC}^{1.05} \), grows faster than linearly with program size.
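A minimal Ruby sketch of that effort equation (basic COCOMO, organic mode, coefficients 2.4 and 1.05 as quoted above). As a sanity check it reproduces the 215.7 man-months listed for rails-1.0.0 (72560 LOC) in the table at the end; the method name `man_months` is just an illustrative choice.

```ruby
# Basic COCOMO effort (organic mode): man-months as a function of
# program size in thousands of lines of code. 2.4 and 1.05 are the
# coefficients quoted in the text.
def man_months(kloc)
  2.4 * kloc**1.05
end

# rails-1.0.0 has 72560 lines of code according to the table.
puts man_months(72_560 / 1000.0).round(1) # => 215.7

# Superlinearity: doubling the size more than doubles the effort.
puts man_months(20.0) > 2 * man_months(10.0) # => true
```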

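The error introduced by treating all of CPAN as a single project is at most

\[ \frac{2.4\,(P L)^{1.05}}{2.4\,P\,L^{1.05}} = P^{0.05} \]

where \(P\) is the number of projects and \(L\) the average project size in KLOC. The bound is reached when all projects have the same size; any spread in sizes makes the ratio smaller (though never below 1), because for a fixed total size the sum of the individual efforts grows as the sizes become more unequal.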
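The bound can be checked numerically. The sketch below uses made-up sizes (not CPAN data) and a hypothetical helper `lumping_ratio` that compares the effort of lumping \(P\) projects into one with the sum of their individual efforts: with equal sizes the ratio is exactly \(P^{0.05}\), and with a skewed distribution it falls between 1 and \(P^{0.05}\), which is why the corrected CPAN figure is a range rather than a single number.

```ruby
# Basic COCOMO effort (organic mode) in man-months for a size in KLOC.
def man_months(kloc)
  2.4 * kloc**1.05
end

# Ratio between the effort of treating all projects as one and the sum
# of their individual efforts (the overestimate caused by lumping).
def lumping_ratio(sizes_kloc)
  lumped = man_months(sizes_kloc.sum)
  split  = sizes_kloc.sum { |kloc| man_months(kloc) }
  lumped / split
end

p_count = 100
equal   = Array.new(p_count, 5.0)               # 100 projects of 5 KLOC each
skewed  = [400.0] + Array.new(p_count - 1, 1.0) # one big project, 99 tiny ones

puts lumping_ratio(equal)  # same as P**0.05
puts p_count**0.05
puts lumping_ratio(skewed) # between 1 and P**0.05
```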

Unfortunately, I couldn't find any size statistics for CPAN, so I took 5000 modules as a very conservative estimate of its size in 2004 (it is close to 10000 modules now); the smaller the number of projects, the smaller the bias in the original analysis. With that number, the 2004 result was inflated by a factor of at most \(5000^{0.05} \approx 1.53\), i.e. by at most 53%, which puts CPAN's cost in 2004 between $442M and $677M, depending on the size distribution of CPAN's modules.
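The arithmetic behind those two figures, as a short Ruby check; the module count of 5000 is the assumption stated in the paragraph above, not a measured value.

```ruby
cpan_2004 = 677.0 # $M, 2004 estimate treating CPAN as a single project
projects  = 5000  # assumed (conservative) number of CPAN modules

# Upper bound on the overestimate factor: P**0.05.
max_factor = projects**0.05

puts format("inflated by at most %.0f%%", (max_factor - 1) * 100) # => inflated by at most 53%
puts format("CPAN 2004: $%.0fM to $%.0fM", cpan_2004 / max_factor, cpan_2004) # => CPAN 2004: $442M to $677M
```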

RAA's cost in 2006 dollars is under $191M; let's call it $100 million, assuming that the 89% I didn't analyze is smaller on average than the 11% I did. Inflation is well within the error margin of CPAN's cost, so there's no need to convert that figure into 2006 dollars. So the final, quotable result is

CPAN would cost around 5 times more than RAA according to the COCOMO basic model.
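For reference, here is how "around 5 times" follows from the numbers above: with RAA at the assumed $100M, CPAN's range of $442M to $677M gives a ratio between about 4.4 and 6.8. The same sketch also shows how far the ratio drops if RAA were at its $191M upper bound instead.

```ruby
cpan_low, cpan_high = 442.0, 677.0 # $M, CPAN's 2004 cost range from above
raa_assumed = 100.0                # $M, the rounded-down RAA figure
raa_bound   = 191.0                # $M, upper bound extrapolated from the 11% sample

[raa_assumed, raa_bound].each do |raa|
  puts format("RAA = $%.0fM: CPAN costs %.1f to %.1f times as much",
              raa, cpan_low / raa, cpan_high / raa)
end
```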

The cost of the interpreters

Surprisingly, ruby (including its standard library) costs more than the corresponding perl distribution: $20M vs. $15M, due to Ruby's richer standard library. By the way, since perl is hosted on CPAN and ruby isn't in the RAA, those $20M could be added to the $100M used in the analysis above...

Counting lines of code

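The original CPAN estimate was done with SLOCCount. I also used it for perl and ruby (the interpreters plus their standard libraries), but wrote a small script for the RAA subset. It reads one project directory per line (from standard input or from the files given as arguments), counts the .rb, .c and .h files plus any script with a ruby shebang line, and prints per-project and aggregate statistics: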

require 'find'

# Count the source files and their lines under one project directory.
def stats_for_dir(dname)
  nfiles = lines = 0
  Find.find(dname) do |fname|
    next unless File.file?(fname)
    # Skip VCS metadata (.svn, _darcs) and bundled installer scripts.
    next if %w[svn darcs setup.rb install.rb].any? { |x| fname.include?(x) }
    # Count .rb/.c/.h files, plus any script whose first line is a ruby shebang.
    # The first line is read in binary mode so that non-UTF-8 files don't make
    # the regexp match raise an encoding error.
    if fname =~ /\.(rb|c|h)\z/ || File.open(fname, "rb") { |f| f.gets =~ /\A#!.*ruby/ }
      $stderr.puts(" " * 10 + fname)
      nfiles += 1
      File.foreach(fname) { lines += 1 }
    end
  end
  puts "%-35s  %3d  %5d " % [dname, nfiles, lines]
  $stderr.puts "%-35s  %3d  %5d " % [dname, nfiles, lines]
  [nfiles, lines]
end

# One project directory per line, from stdin or from the files given as arguments.
all_stats = {}
ARGF.each do |dname|
  dir = dname.chomp
  next if dir.empty?
  all_stats[dir] = stats_for_dir(dir)
end
abort "no directories given" if all_stats.empty?

# Totals, means and population standard deviations: sqrt(E[x^2] - E[x]^2).
total_files = total_locs = 0
files_sq = locs_sq = 0
all_stats.each do |_name, (files, locs)|
  total_files += files
  total_locs += locs
  files_sq += files**2
  locs_sq += locs**2
end
n = all_stats.size
avg_files, avg_locs = total_files.fdiv(n), total_locs.fdiv(n)
stddev_files = Math.sqrt(files_sq.fdiv(n) - avg_files**2)
stddev_locs = Math.sqrt(locs_sq.fdiv(n) - avg_locs**2)

$stderr.puts <<EOF % [total_files, avg_files, stddev_files, total_locs, avg_locs, stddev_locs]


#{n} libs/apps analyzed

       Total    Avg        stddev
Files: %-6d   %-6.1f     %-5f
LoCs:  %-6d   %-6d     %-5f
EOF

My stats:

        Total    Avg   stddev
Files:   4052   26.5   67.588345
LoCs:  607552   3970   8983.547211

Cost estimate:

  • effort: 1650.3 man-months
  • cost: $20,511,561
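These totals appear to be sums of per-project estimates: applying the formula to all 607552 lines as a single project would give about \(2.4 \cdot 607.552^{1.05} \approx 2009\) man-months rather than 1650.3, which is the same lumping bias discussed for CPAN. Below is a sketch of the per-project computation on three rows of the table that follows. The dollars-per-man-month rate is an assumption back-computed from the table (e.g. $2681480 / 215.7 for rails-1.0.0), since the text doesn't state the salary figures used.

```ruby
# Basic COCOMO effort (organic mode) in man-months for a size in KLOC.
def man_months(kloc)
  2.4 * kloc**1.05
end

# ASSUMPTION: dollars per man-month, back-computed from the table
# (rails-1.0.0: $2681480 / 215.7 man-months); not stated in the text.
RATE = 12_430

# Three rows of the table: project name => lines of code.
sample = {
  "rails-1.0.0"   => 72_560,
  "rb-gsl-1.5.2"  => 58_838,
  "FXRuby-1.0.29" => 48_168,
}

# Estimate each project separately, then sum (as the totals above appear to do).
per_project = sample.transform_values { |loc| man_months(loc / 1000.0) }
split_total = per_project.values.sum

# Estimate the three projects as if they were a single one.
lumped = man_months(sample.values.sum / 1000.0)

per_project.each do |name, mm|
  puts format("%-14s %6.1f man-months  $%d", name, mm, (mm * RATE).round)
end
puts format("per-project sum: %.1f man-months, lumped: %.1f", split_total, lumped)
```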

Rails is the largest project, and hence the most expensive one, at $2.7M. rb-gsl is (very unexpectedly) quite a close second ($2.2M)...

Name Files LOCs Cost($) Man-months
BlueCloth-1.0.0 9 3958 126471 10.2
FXRuby-1.0.29 449 48168 1743970 140.3
Getopt 19 3474 110284 8.9
Linguistics-1.02 15 6781 222587 17.9
PluginFactory-1.0.0 6 449 12867 1.0
PrettyException-0.9.3 2 1092 32717 2.6
RHDL-0.4.3 23 2274 70676 5.7
RedCloth-3.0.0 4 1403 42565 3.4
Ruby-HashSlice-1.03 2 184 5043 0.4
RubyInline-3.1.0 7 1215 36597 2.9
SpeedReader-0.5 16 1597 48765 3.9
Test-Unit-Mock-0.03 4 1264 38148 3.1
aeditor-1.9 24 10900 366387 29.5
aes 4 931 27671 2.2
amrita-1.0.2 82 11175 376099 30.3
ansicolor-0.0.3 3 177 4841 0.4
archive-tar-minitar-0.5.1 5 2456 76627 6.2
arrayfields-3.4.0 3 692 20265 1.6
aspectr-0-3-5 4 644 18791 1.5
bdb-0.5.4 52 17930 617876 49.7
bdbxml-0.5.2 30 2018 62346 5.0
bitset-0.6.2 5 1746 53553 4.3
bloom 3 233 6461 0.5
borges-1.1.0 157 9953 333038 26.8
breakpoint 6 687 20111 1.6
builder-1.2.2 9 1017 30361 2.4
bz2-0.2.2 7 2563 80136 6.4
cache-0.1.0 1 362 10263 0.8
captcha-0.1.2 4 567 16440 1.3
cast_256 3 549 15892 1.3
cgikit-1.2.1 61 11405 384231 30.9
chun 1 577 16744 1.3
copland-1.0.0 117 11339 381896 30.7
copland-lib-0.1.0 21 1479 44989 3.6
copland-remote-0.1.0 18 1676 51301 4.1
copland-webrick-0.1.0 16 1433 43521 3.5
criteria-1.1a 11 1101 33000 2.7
crosscase 8 1238 37324 3.0
crypt-fog-0.1.0 3 121 3247 0.3
crypt-isaac_0.9 1 165 4497 0.4
cstemplate-0.5.1 2 904 26829 2.2
dbdbd-0.2.2 5 569 16501 1.3
dbus-0.1.10 25 4062 129962 10.5
dev-utils-1.0.1 10 1181 35522 2.9
diff-0.4 7 525 15163 1.2
diff-lcs-1.1.2 11 2950 92887 7.5
directorywatcher 1 245 6811 0.5
djb-netstrings-ruby-0.1.0 2 110 2938 0.2
dpklib-1.0.6 133 8441 280127 22.5
drbfire-0-1-0 4 505 14557 1.2
entryCache-1.1 5 344 9728 0.8
extensions-0.6.0 35 3491 110851 8.9
extmath-2.3 2 1425 43266 3.5
flattenx-0.1.0 3 158 4297 0.3
flexmock-0.0.3 4 260 7250 0.6
formvalidator-0.1.3 9 1595 48701 3.9
fsdb-0.4 29 3430 108818 8.8
gemfinder-1.9.6 16 1364 41323 3.3
gurgitate-mail-1.4.1 7 621 18087 1.5
hobix-0.3 24 3571 113519 9.1
html-parser-19990912p2 4 1098 32905 2.6
htmltokenizer 1 259 7221 0.6
ikko-0.1 1 273 7631 0.6
instiki-0.9.1 33 3746 119368 9.6
interface-0.1.0 6 211 5822 0.5
iowa_0.9.2 49 5336 173068 13.9
iterator-0.8 16 2532 79118 6.4
jabber4r-0.6.0 12 2827 88824 7.1
kansas_0.2 16 2231 69273 5.6
keyedlist 2 311 8750 0.7
kirbybase-1.6 3 1215 36597 2.9
lafcadio-0.4.0 132 6972 229175 18.4
libgnucap-ruby-0.1 3 277 7749 0.6
libxml-0.3.4 56 6493 212671 17.1
lingua-0.5 5 443 12687 1.0
log4r-1.0.5 46 2924 92027 7.4
madeleine-0.6.1 18 3068 96792 7.8
mahoro-0.1 5 425 12146 1.0
math-const-1.0.1 2 275 7690 0.6
metatags-1.0 8 484 13922 1.1
midilib-0.8.3 20 2794 87736 7.1
mime-types-1.13.1 3 1635 49984 4.0
mw-template-0.9.1 13 2269 70512 5.7
narray-0.5.7p4 52 12743 431695 34.7
needle-1.2.0 65 6072 198216 15.9
needle-extras-1.0.0 10 621 18087 1.5
net-sftp-0.5.0 73 5200 168440 13.6
net-ssh-0.6.0 121 14267 486062 39.1
nora-0.0.20041021 46 5842 190340 15.3
objectgraph-1.0.1 2 232 6432 0.5
objectpool-0.2.0 4 306 8603 0.7
patch 0 0 0 0.0
permutation 3 730 21435 1.7
pqa-1.3 2 814 24032 1.9
proclib 2 262 7309 0.6
purple-0.5.1 61 32342 1147883 92.4
racc 16 5051 163376 13.1
raggle-0.3.2 7 5705 185656 14.9
rails-1.0.0 581 72560 2681480 215.7
rake-0.4.15 31 3874 123654 9.9
rb-gsl-1.5.2 332 58838 2151707 173.1
rb2html-1.1 7 707 20726 1.7
rbmhshow-0.4.1 16 2389 74433 6.0
rbprof 2 578 16775 1.3
rbtree-0.1.2 5 3865 123352 9.9
rcov-0.2.0 3 1805 55455 4.5
regexp-engine-0.12 30 8125 269126 21.7
rgl-0.2.2 25 3089 97488 7.8
rice-0.0.0.2 18 2147 66537 5.4
rlimit-1.0 3 117 3135 0.3
rubilicious-0.1.0 4 618 17996 1.4
ruby-aes-1.8.0 7 1064 31836 2.6
ruby-bsearch-1.5 3 202 5562 0.4
ruby-crypt-random-1.3 4 423 12086 1.0
ruby-dict-0.9.2 2 870 25771 2.1
ruby-gettext-package-0.8.0 34 2197 68165 5.5
ruby-goto 2 68 1773 0.1
ruby-htmltools 15 2458 76692 6.2
ruby-libneural 6 517 14921 1.2
ruby-progressbar-0.8 2 267 7455 0.6
ruby-romkan-0.4 2 364 10322 0.8
ruby-termios-0.9.4 8 1216 36628 2.9
rubymail-0.17 24 7828 258806 20.8
rubypants-0.2.0 2 652 19037 1.5
rubywebdialogs 15 7557 249407 20.1
rubyzip-0.5.5 18 7029 231142 18.6
runt-0.2.0 12 1571 47932 3.9
ruvi-0.4.12 34 10680 358626 28.9
ruwiki-0.9.0 44 8290 274868 22.1
sds-0.3 15 3572 113553 9.1
session-2.1.9 9 1654 50594 4.1
simplemail-0.3 4 610 17751 1.4
snmp-0.3.0 18 2806 88132 7.1
sqlite-ruby-2.2.2 27 6298 205970 16.6
statistics-020920 2 292 8190 0.7
stream-0.5 7 967 28796 2.3
sympop-0.9.1 1 78 2048 0.2
sys-host-0.5.0 9 804 23722 1.9
sys-proctable-0.6.4 27 3293 104259 8.4
sys-uptime-0.4.0 7 482 13862 1.1
test-report-0.3.0 8 1111 33315 2.7
tex-hyphen-0.2 3 4974 160762 12.9
text-format-0.64 6 3080 97189 7.8
tldlib 5 553 16014 1.3
tmail-0.10.8 43 10307 345486 27.8
types 2 1988 61373 4.9
webfetcher-0.5.5 2 1375 41673 3.4
webgen-0.2.0 29 3037 95765 7.7
webunit 61 6566 215183 17.3
xhtmldiff-1.2.1 2 208 5735 0.5
xmlresume2x-0.2.1 5 495 14255 1.1

Last modified: 2006/03/24 06:02:33