
Symbols, meta-programming and leaks. On harakiri and DoSing Rails?

You've probably read somewhere that Symbols are never GC'ed in Ruby. If so, you'll also know that a Symbol takes around 60 bytes at the very least.

But did you keep that in mind the last time you were doing some meta-programming? Very often, you need to come up with a unique name, and write things like

  name = "__prefix_#{rand(1000000)}"

or even, in order to achieve thread-safety,

  name = "__prefix_#{rand(1000000)}_#{Thread.current.object_id.abs}"

Then, you would often define a method with that name, as in

class Object
  def unsuspect(*args, &block)
    obj = self
    class << obj; self end.module_eval do
      name = "__unsuspect_#{rand(10000000)}_#{Thread.current.object_id.abs}"
      begin
        define_method(name, &block)
        obj.send(name, *args)
      ensure
        remove_method name
      end
    end
  end
end

Do you recognize that snippet? It's a simple Object#instance_exec implementation (I've made a better one, but I'll write some more on this later):

"foo".unsuspect(" bar"){|x| self + x}              # => "foo bar"

Now, did you know that the innocent-looking #unsuspect method will bring your long-running processes to their knees?

When you do

 define_method(name)

Ruby converts name into a Symbol, which will never be released. So #unsuspect leaks memory at a rate of at least ~60 bytes per call. The amount of unreclaimed memory will be around 60 bytes times the number of unique names generated. In the above example, that can be as many as ten million (the range given to rand) times the total number of threads you will run over the life of the program (not even simultaneously!)*1.
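A quick way to watch this happen is to count the entries in Symbol.all_symbols before and after defining and removing uniquely named methods. This is only a sketch: the helper name leaked_symbols and the __leak_demo_ prefix are made up for illustration, and the exact count depends on the Ruby version (newer Rubies can collect some dynamically created Symbols, so the figure may not match the number of calls exactly).

```ruby
# Returns how many entries Symbol.all_symbols gained after `calls`
# define_method/remove_method round trips with unique names.
def leaked_symbols(calls)
  holder = Class.new                      # throwaway class to hold the methods
  batch  = rand(1_000_000_000)            # keeps names unique across invocations
  before = Symbol.all_symbols.size
  calls.times do |i|
    name = "__leak_demo_#{batch}_#{i}"
    holder.send(:define_method, name) { } # name gets interned here
    holder.send(:remove_method, name)     # the method is gone, its Symbol is not
  end
  Symbol.all_symbols.size - before
end

puts leaked_symbols(1000)                 # number of Symbols left behind
```

Removing the method does not bring the count back down, which is the whole problem: the name outlives the method it was created for.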

Some figures

Let's see a small example: if each thread makes 1000 calls to #unsuspect, and you create new threads at the rate of 200 an hour, then at 60 bytes per Symbol over 24 hours you'd be leaking about

 1000 * 200 * 60 * 24                             # => 288000000

That is 274MB a day*2.
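To play with other loads, the same estimate can be wrapped in a small helper. This is just a sketch under the assumptions stated above (every call creates one new Symbol, each costing about 60 bytes); leak_per_day is a made-up name, not part of any library.

```ruby
# Bytes of Symbol memory leaked per day, assuming one new Symbol per call.
def leak_per_day(calls_per_thread, threads_per_hour, bytes_per_symbol = 60)
  calls_per_thread * threads_per_hour * bytes_per_symbol * 24
end

bytes = leak_per_day(1000, 200)
mib   = bytes / (1024.0 * 1024)           # convert bytes to MB (2**20 bytes)
puts bytes
puts mib.floor
```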

If you're using Rails, you might have noticed that it ships with a simple implementation of #instance_exec (I already commented on its limitations)... Actually, things could potentially be much worse. It used to be possible to DoS Rails applications quite easily, since the framework was #intern'ing strings carelessly. Given the right requests, you could make the process grow to unbearable sizes. I haven't read the sources for a while, so this might have been fixed.


Last modified:2006/07/09 06:18:59
References:[The dangers of #undef_method, #instance_exec recalled for memleaking!]

*1 actually, some object_ids might be reused, but this doesn't really weaken the argument

*2 the actual figure is slightly lower, since rand might return the same result twice, as might Thread.current.object_id. As a consequence of the birthday paradox, the chance that rand(1000000) yields the same value twice over the 1000 calls is about 39% (about 5% with the rand(10000000) used in #unsuspect), but at any rate the number of different values stays very close to 1000; even if it were, say, 950, it wouldn't really change anything
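The birthday-paradox estimate in footnote 2 can be checked with a few lines, assuming rand is uniform and successive calls are independent. collision_probability is a made-up helper name, not something from the original code.

```ruby
# Probability that at least two of `calls` draws from rand(range) coincide,
# assuming the draws are uniform and independent.
def collision_probability(calls, range)
  all_distinct = 1.0
  calls.times { |i| all_distinct *= (range - i) / range.to_f }
  1.0 - all_distinct
end

puts collision_probability(1000, 1_000_000)   # the rand(1000000) examples
puts collision_probability(1000, 10_000_000)  # the rand(10000000) in #unsuspect
```

Either way, only a handful of names can collide, so the number of distinct Symbols per thread stays very close to 1000, which is all the estimate needs.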