Cheapest rsync replacement (yep, with Ruby)
I often use rsync to keep a local copy of some HTTPD logs (around 200MB atm). Since they are append-only, having rsync compute and compare the checksums for the parts I already have seems wasteful: both my box and the one I'm copying from would be happier if they didn't have to process a couple hundred MB for nothing.
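The append-only assumption is what makes the shortcut safe: if the first N bytes never change, the size of the local copy is all the state needed, and only the bytes past that offset have to move. A minimal local sketch of that idea follows, with both files on the same machine; `append_tail` and its parameters are hypothetical names made up for illustration, not part of rsync or of any Ruby library.

```ruby
# Sketch: copy only the bytes the destination is missing from an
# append-only source. Assumes the first File.size(dst_path) bytes of
# the source are identical to the destination (nothing is verified).
# append_tail is a hypothetical helper name.
def append_tail(src_path, dst_path, block_size = 8192)
  offset = File.exist?(dst_path) ? File.size(dst_path) : 0
  copied = 0
  File.open(src_path, "rb") do |src|
    src.seek(offset)                      # skip what we already have
    File.open(dst_path, "ab") do |dst|
      while (chunk = src.read(block_size)) # nil at EOF ends the loop
        dst.write(chunk)
        copied += chunk.bytesize
      end
    end
  end
  copied
end
```

Calling `append_tail("access.log", "access.log.copy")` twice in a row copies the tail the first time and nothing the second time. If the source gets rotated or truncated, the assumption breaks and the copy silently goes wrong, which is exactly the check rsync's checksums pay for.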
I couldn't find any relevant option in rsync's manpage, but Google returned the following:
- Added the --append option that makes rsync append data onto files that are longer on the source than the destination (this includes new files).
That looks good, but...
NEWS for rsync 2.6.7 (UNRELEASED)
At that point, I realized that writing what I wanted in Ruby would be faster than looking for it. Or I just wanted to believe that: I didn't feel like scanning the manpage again.
"Get cheap"
It doesn't get much cheaper than the following script, but my HD works less and the remote host should be happier too. I could have used dd instead of Ruby, but it's easier to add flow control (without depending on TCP's) with the latter, in case somebody feels like it.
#!/usr/bin/env ruby

REMOTE_RUBY = "ruby"
# TODO: allow REMOTE_RUBY to be specified via a cmdline opt

if ARGV.size != 2 || ARGV[0][/:/].nil? || !File.exist?(ARGV[1])
  puts <<EOF
ruby logfetcher.rb host:path/to/src dst
EOF
  exit
end

FILE = ARGV[1]
REMOTE_HOST, REMOTE_FILE = ARGV[0].split(/:/)
BLOCK_SIZE = 8192

# Everything up to this offset is assumed to be identical on both sides.
osize = File.size(FILE)
#FIXME: cheap escaping
command = "File.open(#{REMOTE_FILE.inspect}){|f| " +
          "f.pos = #{osize}; print f.read(#{BLOCK_SIZE}) until f.eof? }"
command.gsub!(/"/){'\\"'}

fetched = 0
t = nil
$stdout.sync = true
print "Establishing connection\r"
File.open(FILE, "a") do |os|
  IO.popen(%{ssh #{REMOTE_HOST} #{REMOTE_RUBY} -e '"#{command}"'}) do |is|
    until is.eof?
      data = is.read(BLOCK_SIZE)
      t ||= Time.new # ignore the time it takes to establish the SSH connection
      fetched += data.size
      print "Read #{fetched} \r"
      os.write(data)
    end
  end
end
print(" " * 50 + "\r")
dt = t ? Time.new - t : 0.0 # t is still nil if there was nothing new to fetch
puts "Fetched #{fetched} bytes."
puts "Total size #{osize + fetched}."
puts "Needed %4.1f seconds." % dt
puts "Average speed %d bytes/sec." % (fetched / dt) if dt > 0
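As for the flow control mentioned above, one simple form it could take is a rate limit on the copy loop. The following is only a sketch under that assumption: `throttled_copy` and `max_bytes_per_sec` are hypothetical names, and nothing like this is in the script as written.

```ruby
# Sketch: copy from one IO to another, sleeping as needed so the
# average rate stays at or below max_bytes_per_sec.
# throttled_copy and its parameters are hypothetical, for illustration.
def throttled_copy(input, output, max_bytes_per_sec, block_size = 8192)
  copied = 0
  start = Time.now
  while (chunk = input.read(block_size))   # nil at EOF ends the loop
    output.write(chunk)
    copied += chunk.bytesize
    # Minimum time that must have passed for `copied` bytes at this rate.
    min_elapsed = copied.to_f / max_bytes_per_sec
    elapsed = Time.now - start
    sleep(min_elapsed - elapsed) if elapsed < min_elapsed
  end
  copied
end
```

Run on the receiving side, e.g. `throttled_copy(is, os, 100_000)` in place of the read loop, this only slows the sender indirectly, once the pipe and the SSH connection fill up. To pace the transfer at the source, the same sleep-per-block logic would have to go into the one-liner executed on the remote host.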