Cheapest rsync replacement (yep, with Ruby)

I often use rsync to keep a local copy of some HTTPD logs (around 200MB at the moment). Since they are append-only, having rsync compute and compare the checksums for the parts I already have seems wasteful: both my box and the one I'm copying from would be happier if they didn't have to process a couple hundred MBs for nothing.

I couldn't find any relevant option in rsync's manpage, but a Google search turned up the following:

- Added the --append option that makes rsync append data onto files that
  are longer on the source than the destination (this includes new files).

That looks good, but...

NEWS for rsync 2.6.7 (UNRELEASED)

At that point, I realized that writing what I wanted in Ruby would be faster than looking for it. Or I just wanted to believe that: I didn't feel like scanning the manpage again.

"Get cheap"

It doesn't get much cheaper than the following script, but my HD works less and the remote host should be happier too. I could have used dd instead of Ruby, but in Ruby it's easier to add flow control (without depending on TCP's), in case somebody feels like it.

#!/usr/bin/env ruby

REMOTE_RUBY = "ruby"
# TODO: allow REMOTE_RUBY to be specified via a cmdline opt

if ARGV.size != 2 || ARGV[0][/:/].nil? || !File.exist?(ARGV[1])
  # dst must already exist: its current size is the offset to resume from
  abort "Usage: ruby logfetcher.rb host:path/to/src dst"
end

FILE = ARGV[1]
# Split on the first colon only, so remote paths containing ':' survive
REMOTE_HOST, REMOTE_FILE = ARGV[0].split(":", 2)
BLOCK_SIZE = 8192

osize = File.size(FILE)
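# Remote one-liner: seek past the bytes we already have and dump the rest.
# FIXME: cheap escaping; only double quotes are escaped below, so remote paths
# containing $, backticks, backslashes or single quotes will break the command.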
command = "File.open(#{REMOTE_FILE.inspect}){|f| " + 
          "f.pos = #{osize}; print f.read(#{BLOCK_SIZE}) until f.eof? }"

command.gsub!(/"/){'\\"'}
fetched = 0
t = nil
$stdout.sync = true
print "Establishing connection\r"
File.open(FILE, "a") do |os|
  IO.popen(%{ssh #{REMOTE_HOST} #{REMOTE_RUBY} -e '"#{command}"'}) do |is|
    until is.eof?
      data = is.read(BLOCK_SIZE)
      t ||= Time.new # ignore the time it takes to establish the SSH connection
      fetched += data.size
      print "Read #{fetched}                          \r"
      os.write(data)
    end
  end
end
print(" " * 50  + "\r")

# t is still nil when there was nothing new to fetch
dt = t ? Time.new - t : 0.0
puts "Fetched #{fetched} bytes."
puts "Total size #{osize + fetched}."
puts "Needed %4.1f seconds." % dt
puts "Average speed %d bytes/sec." % (fetched / dt) if dt > 0
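
The FIXME about cheap escaping is the script's weakest spot: the remote command passes through two shells (the local one started by IO.popen and the remote one started by ssh), and only double quotes are handled. What follows is a minimal sketch of sturdier quoting, assuming Ruby's standard shellwords library is available on the local side. The helper names tail_script and ssh_argv are hypothetical, made up for this illustration; they are not part of the script above.

```ruby
require "shellwords"

# Ruby one-liner for the remote side: print remote_file from offset to EOF.
# String#inspect yields a valid Ruby string literal for the path.
def tail_script(remote_file, offset, block_size = 8192)
  "File.open(#{remote_file.inspect}, 'rb') { |f| f.pos = #{Integer(offset)}; " \
    "print f.read(#{Integer(block_size)}) until f.eof? }"
end

# argv for IO.popen. Passing an array skips the local shell entirely;
# shelljoin quotes the command for the remote shell, which receives
# everything after the host name from ssh as a single string.
def ssh_argv(host, script, remote_ruby = "ruby")
  ["ssh", host, [remote_ruby, "-e", script].shelljoin]
end
```

With these in place, the popen line could become IO.popen(ssh_argv(REMOTE_HOST, tail_script(REMOTE_FILE, osize), REMOTE_RUBY)), and the gsub! on command would no longer be needed. The remote Ruby still has to agree with the local one on the path's encoding, which this sketch does not address.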