Symbols, meta-programming and leaks. On harakiri and DoSing Rails?
You've probably read somewhere that Symbols are never GC'ed in Ruby. If so, you probably also know that each Symbol takes at least 60 bytes or so.
But did you keep that in mind the last time you were doing some meta-programming? Very often, you need to come up with a unique name, and write things like
name = "__prefix_#{rand(1000000)}"
or even, in order to achieve thread-safety,
name = "__prefix_#{rand(1000000)}_#{Thread.current.object_id.abs}"
Then, you would often define a method with that name, as in
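class Object
  def unsuspect(*args, &block)
    obj = self
    # work on obj's singleton class so the temporary method is only visible on obj
    class << obj; self end.module_eval do
      name = "__unsuspect_#{rand(10000000)}_#{Thread.current.object_id.abs}"
      begin
        define_method(name, &block)
        obj.send(name, *args)
      ensure
        # the method goes away, but the name it was given stays interned
        remove_method name
      end
    end
  end
end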
Do you recognize that snippet? It's a simple Object#instance_exec implementation (I've made a better one, but I'll write some more on this later):
"foo".unsuspect(" bar"){|x| self + x} # => "foo bar"
Now, did you know that the innocent-looking #unsuspect method will bring your long-running processes to their knees?
When you do
define_method(name)
Ruby converts name into a Symbol, which will never be released. So #unsuspect leaks memory at the rate of at least ~60 bytes per call, as long as each call generates a new name. The total amount of unreclaimed memory will be around 60 bytes times the number of unique names generated. In the worst case, that number is the size of rand's range (one million in the first examples, ten million in #unsuspect) times the total number of threads you will run over the life of the program (not even simultaneously!)*1.
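To see the effect for yourself, you can count the interned Symbols before and after defining and removing a batch of uniquely named methods. This is only a rough sketch: the "__leak_demo_" prefix, the anonymous class and the helper interned_with_prefix are made up for the demonstration, and the exact behaviour may depend on your Ruby version.

```ruby
# Count how many currently interned Symbols start with the given prefix.
def interned_with_prefix(prefix)
  Symbol.all_symbols.count { |sym| sym.to_s.start_with?(prefix) }
end

prefix = "__leak_demo_"   # hypothetical prefix, chosen for this sketch
klass  = Class.new        # throwaway class so we don't pollute Object

before = interned_with_prefix(prefix)
100.times do |i|
  name = "#{prefix}#{i}"
  klass.send(:define_method, name) { }   # interns `name` as a Symbol
  klass.send(:remove_method, name)       # removes the method, not the Symbol
end
after = interned_with_prefix(prefix)

puts "symbols left behind: #{after - before}"
```

Every method is gone by the end of the loop, yet the names used to define them are still in the symbol table.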
Some figures
Let's take a small example: if each thread performs 1000 calls to #unsuspect, and new threads are created at the rate of 200 an hour, then at 60 bytes per Symbol and 24 hours a day you'd be leaking about
1000 * 200 * 60 * 24 # => 288000000
That is 288,000,000 bytes, or about 275MB, a day*2.
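Since that figure is driven entirely by the number of distinct names, one way to cap it is to reuse a small pool of names instead of drawing random ones. Here is a minimal sketch of that idea. The names bounded_exec, BoundedExecHelper and BOUNDED_EXEC_LOCK are made up for illustration, and this is not necessarily how the better implementation mentioned earlier works.

```ruby
# Hypothetical helper module that holds the temporary methods.
module BoundedExecHelper; end

class Object
  include BoundedExecHelper

  # Guards name selection and method (un)definition, never the user's block.
  BOUNDED_EXEC_LOCK = Mutex.new

  def bounded_exec(*args, &block)
    # Pick the first free slot: names are reused across calls, so the number
    # of Symbols ever created is bounded by the maximum number of calls that
    # are in flight at the same time (nested or concurrent).
    name = BOUNDED_EXEC_LOCK.synchronize do
      n = 0
      n += 1 while BoundedExecHelper.method_defined?("__bounded_exec_#{n}")
      candidate = "__bounded_exec_#{n}"
      BoundedExecHelper.module_eval { define_method(candidate, &block) }
      candidate
    end
    begin
      send(name, *args)
    ensure
      BOUNDED_EXEC_LOCK.synchronize do
        BoundedExecHelper.module_eval { remove_method(name) }
      end
    end
  end
end

1000.times { "foo".bounded_exec(" bar") { |x| self + x } }
leftover = Symbol.all_symbols.count { |s| s.to_s.start_with?("__bounded_exec_") }
puts "names ever interned: #{leftover}"
```

The trade-off is a global lock around the definition step and temporary methods that are briefly visible on every object, which may or may not be acceptable in your code.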
If you're using Rails, you might have noticed that it ships with a simple implementation of #instance_exec (I already commented on its limitations)... Things could potentially be much worse, though. It used to be possible to DoS Rails applications quite easily, since the framework was #intern'ing strings carelessly. Given the right requests, you could make the process grow to unbearable sizes. I haven't read the sources for a while, so this might have been fixed.
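The general defence against that kind of attack is to never call #intern (or #to_sym) on strings an outsider controls. The sketch below illustrates the idea with a lookup table; KNOWN_KEYS, symbolize_known_keys and the sample parameters are invented for the example and have nothing to do with the actual Rails code.

```ruby
# Hypothetical whitelist of parameter names the application understands.
KNOWN_KEYS = { "page" => :page, "order" => :order, "limit" => :limit }.freeze

# Map untrusted string keys to Symbols through the table above, so no new
# Symbol is ever created from request data. Unknown keys are dropped.
def symbolize_known_keys(params)
  params.each_with_object({}) do |(key, value), result|
    sym = KNOWN_KEYS[key.to_s]
    result[sym] = value if sym
  end
end

# Simulated request with one attacker-chosen key.
request_params = { "page" => "2", "order" => "name", "x#{rand(1_000_000)}" => "junk" }
safe = symbolize_known_keys(request_params)
p safe
```

However many distinct keys the requests contain, the set of Symbols stays limited to the ones written in the source.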
*1 actually, some object_ids might be reused, but this doesn't really weaken the argument
*2 the actual expression is more complex, since rand might return the same result more than once, and so might Thread.current.object_id. As a consequence of the birthday paradox, chances are roughly 40% that rand(1000000) will yield the same value twice over the 1000 calls, but at any rate the number of different values stays quite close to 1000; even if it were, say, 950, it wouldn't really change anything
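If you want to check the birthday-paradox estimate in *2, the following sketch computes it directly, assuming rand is uniform. The method names collision_probability and expected_distinct are mine.

```ruby
# Probability that `calls` draws from rand(range) contain at least one repeat.
def collision_probability(calls, range)
  p_all_distinct = (0...calls).inject(1.0) { |p, i| p * (range - i) / range }
  1.0 - p_all_distinct
end

# Expected number of distinct values among `calls` draws from rand(range).
def expected_distinct(calls, range)
  range * (1.0 - (1.0 - 1.0 / range)**calls)
end

puts collision_probability(1000, 1_000_000)   # roughly 0.39
puts expected_distinct(1000, 1_000_000)       # roughly 999.5
```

With the ten-million range used in #unsuspect a repeat is rarer still, so counting every call as a new Symbol is a fair approximation.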