A small FS in DATA and a pure Ruby compiler (in the classical sense)
DATA is one of those features one rarely sees in use, but it can be quite handy at times. I used it in rcov to include the xx markup generation library while ensuring the rcov executable remained self-contained (the extension is optional).
I've written a simple FS meant to be used with DATA, in order to structure it into individually accessible files. I then used it to implement a very simplistic pure Ruby compiler (in the sense of composing one .rb file out of many, i.e. a compiler like the very first ones, before the term started to be misused).
A small FS for the DATA section
I could have used minitar to create POSIX tar files in DATA, but I'd have had to implement random access on top of it, so I just defined a feeble YAML-based format:
    #!/usr/bin/env ruby
    # ...
    # this is the .rb file
    1 + 1
    __END__
    <length of the toc>
    <YAML-serialized toc (obvious from the code)>
    data for all the files in the DataFS just one after the other
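To make the layout concrete, here is a minimal sketch that builds such a section in memory and reads one file back by hand. It assumes a simplified TOC (a plain hash of name => [size, offset]) instead of the FStat structs used in the real code, and a StringIO standing in for DATA; the file names and contents are made up.

```ruby
require 'yaml'
require 'stringio'

# Assumption: a simplified TOC, name => [size, offset]. The real code stores
# FStat structs instead.
toc  = { "a.txt" => [5, 0], "b.txt" => [3, 5] }
body = "helloabc" # contents of a.txt and b.txt, one after the other

# Build the section: TOC length on its own line, then the TOC, then the data.
serialized_toc = YAML.dump(toc)
section = "#{serialized_toc.bytesize}\n#{serialized_toc}#{body}"

# Read it back the same way a reader would.
io = StringIO.new(section)
toc_length = io.gets.to_i
loaded_toc = YAML.load(io.read(toc_length))
data_start = io.pos # offsets in the TOC are relative to this position

size, offset = loaded_toc["b.txt"]
io.pos = data_start + offset
b_contents = io.read(size)
```

Since the TOC length comes first, the reader knows where the data area starts without scanning it, which is what gives random access.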
Creating the FS
The utterly simplistic API for the Writer class is
    datafs = DataFS::Writer.new
    datafs.add("filename", "file contents")
    datafs.add("whatever.rb", "puts 1")
    datafs.dump(someIO)   # dump to someIO
    puts datafs.dump      # just return the serialized representation

The implementation is trivial: a hash mapping each file name to an FStat object (holding name, content length and position in the DATA stream) is serialized with YAML and used as the TOC:

    FStat = Struct.new(:name, :size, :offset)

    class Writer
      def initialize
        @files = {}
      end

      def add(filename, contents)
        @files[filename] = contents
      end

      def dump(anIO = nil)
        unless anIO
          ret_content = true
          anIO = StringIO.new("")
        end
        offset = 0
        index = {}
        @files.keys.sort.each do |name|
          contents = @files[name]
          index[name] = FStat.new(name, contents.size, offset)
          offset += contents.size
        end
        serialized_index = YAML.dump(index)
        anIO.puts(serialized_index.size)
        anIO.write(serialized_index)
        @files.keys.sort.each{|name| anIO.write(@files[name]) }
        if ret_content
          anIO.string
        else
          anIO
        end
      end
    end
Reading
Reading is a tiny bit harder; the basic API looks like
    datafs = DataFS::Reader.new(DATA)
    datafs.open("blergh.dat") do |f|
      #...
      f.read(10) # also defined: f.eof? and f.rewind, but no other IO goodies
      #...
    end

When a DataFS file is open()ed, a FileStream object representing a bounded section of the DATA area is returned/yielded. FileStreams respond to #eof?, #read and #rewind, and are implemented with some care so that you only get the data from the corresponding DataFS file (and not from the following ones, after you get to EOF).

    class Reader
      def initialize(io)
        @io = io
        idx_size = @io.gets
        @index = YAML.load(@io.read(idx_size.to_i))
        @initial_pos = io.pos
      end

      def fstat(filename)
        @index[filename]
      end

      def open(filename)
        raise Errno::ENOENT unless fstat = @index[filename]
        file_entry = FileStream.new(@io, @initial_pos + fstat.offset, fstat.size)
        if block_given?
          yield file_entry
        else
          return file_entry
        end
      end

      class FileStream
        def initialize(io, offset, size)
          @io = io.dup
          @offset = offset
          @size = size
          @pos = 0
          rewind
        end

        def eof?; @pos == @size end

        def rewind
          @io.pos = @offset
          @pos = 0
        end

        def read(size = @size)
          ret = @io.read([@size - @pos, size].min)
          @pos += ret.size if ret
          ret
        end
      end
    end
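The bounded-read idea can be tried in isolation. The sketch below is a variation, not the code above: BoundedStream is a hypothetical stand-in for FileStream that seeks before every read instead of relying on io.dup, so that two streams can safely share one underlying IO. A StringIO stands in for DATA.

```ruby
require 'stringio'

# Hypothetical stand-in for FileStream: a read-only window of `size` bytes
# starting at `offset` inside a shared IO.
class BoundedStream
  def initialize(io, offset, size)
    @io, @offset, @size = io, offset, size
    @pos = 0
  end

  def eof?
    @pos >= @size
  end

  def rewind
    @pos = 0
  end

  # Reads at most `length` bytes and never crosses the end of the window.
  def read(length = @size)
    @io.pos = @offset + @pos # reposition on every read: the IO may be shared
    chunk = @io.read([@size - @pos, length].min)
    @pos += chunk.bytesize if chunk
    chunk
  end
end

io     = StringIO.new("helloabc")
first  = BoundedStream.new(io, 0, 5) # window over "hello"
second = BoundedStream.new(io, 5, 3) # window over "abc"

part_a = first.read(3)   # first three bytes of the first window
part_b = second.read(10) # asks for more than the window holds
part_c = first.read(10)  # the rest of the first window only
```

Interleaving reads on the two streams still yields the right bytes because each read repositions the shared IO first.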
How to require() from the DataFS
Loading code contained in the DataFS just involves a call to Kernel#eval, but reproducing Kernel#require's semantics takes a few more lines:
    module Kernel
      DATAFS = DataFS::Reader.new(DATA)
      alias_method :__pre_datafs_require, :require

      def require(name, *args, &b)
        if ["", ".rb"].include? File.extname(name) # very naïf, 1.9 issues, etc.
          return false if $".include?(name) || $".include?(name + ".rb")
          try_and_load = lambda do |n|
            DATAFS.fstat(n) and
              (eval(DATAFS.open(n).read, TOPLEVEL_BINDING, n) || true) and
              $" << n
          end
          return true if try_and_load[name] || try_and_load[name + ".rb"]
        end
        __pre_datafs_require(name, *args, &b)
      end
    end
This is just a quick hack so it could be improved a fair bit.
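One possible improvement, raised in the comments, concerns the binding passed to eval: code evaluated in a binding that has local variables in scope can read and change them, which is not how Kernel#require behaves. The sketch below illustrates the difference with ordinary method bindings rather than TOPLEVEL_BINDING itself; binding_with_locals and the "embedded.rb" file name are made up for the illustration.

```ruby
# Returns a binding with no local variables in scope.
def empty_binding
  binding
end

# Hypothetical helper: returns a binding that has a local variable in scope,
# plus a lambda to read that variable back afterwards.
def binding_with_locals
  counter = 1
  [binding, lambda { counter }]
end

# Code evaluated in a binding with locals can see and modify them.
leaky, read_counter = binding_with_locals
eval("counter += 41", leaky, "embedded.rb")
after_leak = read_counter.call

# In the empty binding the same name is simply undefined.
isolated = begin
  eval("counter", empty_binding, "embedded.rb")
  false
rescue NameError
  true
end
```

Evaluating embedded files in something like empty_binding would keep them from touching the locals of the script that requires them.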
Compiling pure-Ruby scripts
It's somewhat unfortunate that the word compile has been taken to mean something different, so you could s/compile/compose/g or s/compile/assemble/g in the above header...
The basic idea is:
- creating a DataFS serialization with the desired .rb files
- appending the datafs_require magic (inside a BEGIN block, so it gets executed first)
- dumping the DataFS representation to the destination DATA area
Dependency auto-discovery would be fairly easy to implement, but I wanted to control which files get embedded (no need to redistribute the stdlib normally), so I just structured the command line as in
ruby compose.rb main.rb lib.rb anotherlib.rb=foo.rb > main2.rb
which would put lib.rb and anotherlib.rb (renamed to foo.rb) in the DataFS contained in main2.rb.
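A minimal sketch of how such a command line could be split up. parse_compose_args is a hypothetical helper written for this illustration and is not taken from compose.rb.

```ruby
# Hypothetical helper: splits the arguments into the main script and a map of
# embedded name => source path. "source=name" stores source under name.
def parse_compose_args(argv)
  main, *libs = argv
  embedded = {}
  libs.each do |arg|
    source, stored_name = arg.split("=", 2)
    embedded[stored_name || source] = source
  end
  [main, embedded]
end

main, embedded = parse_compose_args(%w[main.rb lib.rb anotherlib.rb=foo.rb])
# main     => "main.rb"
# embedded => {"lib.rb"=>"lib.rb", "foo.rb"=>"anotherlib.rb"}
```

Each entry of embedded would then be read from its source path and added to the DataFS under its stored name.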
Example
This is our main file:
    require 'thelib'
    foo

And this is the library to be embedded:

    require 'pp'
    def foo
      puts "YES THIS IS THE foo! -> #{__FILE__}"
      pp caller
    end
I just run
ruby compose.rb bla.rb thelib.rb > bla2.rb
so that bla2.rb will include thelib.rb, and require 'thelib' will load the file from the DataFS.
Code
A simple script containing all the above code: compose.rb
kinda like darb... - vjoel (2006-04-29 (Sat) 14:23:07)
Your Data FS is much nicer, but there is a similar idea in http://raa.ruby-lang.org/project/darb/.
One question: is it correct to use TOPLEVEL_BINDING? That means that required files can see (and can affect) local variables in the file they were required from. IIUC, that is not the way Kernel#require works, and darb follows that semantics.
I wish there were an easy way to implement autorequire, without using const_missing.... Any ideas?
vjoel
mfp 2006-04-29 (Sat) 15:17:36
Hey, /me didn't know about darb :) Yes, your def empty_binding; binding; end looks much better. As for autoload, the one way to implement it w/o const_missing (using proxies that load the lib when they are sent a message) I can think of right now would be much more fragile.
vjoel 2006-04-29 (Sat) 16:05:26
Also, I'm wondering why you went with alias_method instead of http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/151855. It was the latter technique that I used in darb (I even put "thanks Batsman" in the code!). I can't recall for sure what the problem with using an alias would be. By the way, is there some reason my comments in this blog interface are limited to 1 line? When I hit <enter>, the comment is posted :(
mfp 2006-04-30 (Sun) 01:44:59
The problem with the
old = instance_method(:foo)
define_method(:foo){|*args| ... old.bind(self).call(*args) }
idiom is that you cannot propagate blocks under 1.8 (it is possible under 1.9, where blocks accept a block arg). In the above code, I used
__pre_datafs_require(name, *args, &b)
so that new Kernel#require definitions that use blocks (there's at least one possible use for this) still work.
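A minimal sketch of the alias-based wrapping with block propagation, using a hypothetical Greeter class instead of Kernel#require so that it can be run in isolation:

```ruby
# Hypothetical class, used only to illustrate the idiom.
class Greeter
  def greet(name)
    block_given? ? yield(name) : "hello #{name}"
  end
end

# Wrap greet while keeping the original reachable under an alias. The &blk
# parameter lets a block given to the wrapper reach the original method.
class Greeter
  alias_method :__original_greet, :greet

  def greet(name, &blk)
    __original_greet(name.upcase, &blk)
  end
end

plain      = Greeter.new.greet("bob")                  # no block given
with_block = Greeter.new.greet("bob") { |n| "yo #{n}" } # block is forwarded
```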
Regarding the comment interface: whereas top-level comments ("new threads") use a textarea, I limited replies to a much smaller inputbox. I thought this would make discussions more lively. And I ... was wrong, so I'm changing that right now :)
mfp 2006-04-30 (Sun) 02:18:23
Alright, a 60x2 textarea doesn't look too bad. I could also make that 60x1 to enable multi-line replies while encouraging short, to-the-point comments.