Ruby internals: a self-study guide to the sources
"I want to read Ruby's sources. Which order is best?"
I've been answering that question a few times a year, sometimes on ruby-talk and lately in private emails. The last time, I took the extra effort to draft a self-study guide to Ruby's internals. Here's a reformatted version.
- Representing Ruby objects, object model basics
- Internal representation of Ruby objects
- Keeping track of Ruby objects
- Object instantiation
- (optional diversion) Internal hash tables used by the functions we'll see later on
- Method dispatching, method cache
- Singleton classes
- Mixins
- Adding methods
- Instance variables
- Evaluating Ruby code
- Core classes
- Harder stuff
Representing Ruby objects, object model basics
Internal representation of Ruby objects
- ruby.h
- RBasic, RObject, RClass, RFloat, RString, RArray, RRegexp, RHash, RFile, RData and RBignum structs
Keeping track of Ruby objects
- gc.c
- struct RVALUE and rb_newobj(), struct heaps_slot
Object instantiation
- ruby.h
- OBJSETUP
(optional diversion) Internal hash tables used by the functions we'll see later on
- st.h (st.c if you want to see the implementation)
Method dispatching, method cache
- eval.c
- rb_call(), struct cache_entry, search_method(), rb_get_method_body()
Of special interest is search_method(), which performs the method lookup: you can see how it moves up the class hierarchy (the klass chain).
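To make the idea of walking up the klass chain concrete before opening eval.c, here is a minimal Ruby-level sketch. It is a toy model, not the interpreter's C code: find_method_owner and the two example classes are hypothetical names invented for illustration, and Module#ancestors stands in for the chain of super pointers. It assumes Ruby 1.9 or later, where method names are reported as symbols.

```ruby
# Toy model of method lookup (NOT the C implementation in eval.c).
# find_method_owner is a hypothetical helper: it visits each class or
# module in lookup order and stops at the first one whose own method
# table contains the name.
def find_method_owner(klass, name)
  klass.ancestors.each do |mod|
    own = mod.instance_methods(false) + mod.private_instance_methods(false)
    return mod if own.include?(name)
  end
  nil # fell off the top of the hierarchy: no such method
end

# Hypothetical example classes.
class GuideAnimal
  def speak; "..."; end
end

class GuideDog < GuideAnimal
  def fetch; "stick"; end
end

p find_method_owner(GuideDog, :fetch)          # GuideDog
p find_method_owner(GuideDog, :speak)          # GuideAnimal
p find_method_owner(GuideDog, :no_such_method) # nil
```

The result for a method that exists should agree with what GuideDog.instance_method(:speak).owner reports.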
Singleton classes
- class.c
- rb_singleton_class()
How singleton classes are inserted in the klass chain.
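The effect of that insertion can be observed from Ruby itself. The following is a small sketch, assuming Ruby 1.9 or later for the symbol output; obj and meta are just illustrative variable names. It shows that a per-object method lands in a singleton class whose superclass is the object's original class, while Object#class keeps reporting the original class.

```ruby
obj = Object.new

# A method defined on one object only.
def obj.greet; "hi"; end

# The classic idiom to get hold of the singleton class.
meta = class << obj; self; end

p meta.instance_methods(false)    # [:greet]
p meta.superclass                 # Object
p obj.class                       # Object (the singleton class is skipped)
p Object.new.respond_to?(:greet)  # false: other instances are unaffected
```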
Mixins
- class.c
- rb_include_module(), include_class_new()
The meaning of ICLASSes (proxy classes).
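The proxy classes are hidden from Ruby code, but their effect on lookup order is visible. A brief sketch with hypothetical module and class names, assuming any reasonably modern Ruby: the included module appears between the class and its superclass in the lookup order, even though superclass still reports the real parent.

```ruby
# Hypothetical names, for illustration only.
module GuideWalkable
  def move; "walk"; end
end

class GuideCreature
  def move; "crawl"; end
end

class GuideCat < GuideCreature
  include GuideWalkable
end

p GuideCat.ancestors.first(3) # [GuideCat, GuideWalkable, GuideCreature]
p GuideCat.superclass         # GuideCreature (the proxy is not exposed)
p GuideCat.new.move           # "walk": the module is found before the superclass
```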
Adding methods
- eval.c
- rb_add_method()
How methods are added to the m_tbl table of a klass.
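One observable consequence of methods living in the class's table rather than in each object can be sketched in plain Ruby. GuidePoint is a hypothetical class, and the symbol output assumes Ruby 1.9 or later: a method added after an object was created is immediately available to that object.

```ruby
class GuidePoint; end

pt = GuidePoint.new
p GuidePoint.instance_methods(false) # []

# Reopen the class and add a method after the instance already exists.
class GuidePoint
  def origin?; true; end
end

p GuidePoint.instance_methods(false) # [:origin?]
p pt.origin?                         # true: lookup goes through the class
```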
Instance variables
- variable.c
- rb_ivar_set(), rb_ivar_get()
How instance variables are stored.
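The Ruby-level counterparts of these two functions make the per-object nature of the storage easy to see. A minimal sketch with a hypothetical GuideAccount class, assuming Ruby 1.9 or later: each object carries its own set of instance variables, and reading one that was never set yields nil.

```ruby
class GuideAccount
  def initialize(owner)
    @owner = owner
  end
end

a = GuideAccount.new("ann")
b = GuideAccount.new("bob")

# Reflection methods that read and write an object's instance variables.
p a.instance_variable_get(:@owner)   # "ann"
a.instance_variable_set(:@balance, 10)

p a.instance_variables               # [:@owner, :@balance]
p b.instance_variables               # [:@owner] (b has its own table)
p b.instance_variable_get(:@balance) # nil
```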
At this point, you can read object.c and variable.c to understand most of the object model, peeking into eval.c as needed for the functions you'll see referenced there.
Evaluating Ruby code
Basic nodes
- node.h
- struct RNode/NODE, enum node_type
- eval.c
- quick look at rb_eval()
This is the core of the interpreter. To get the gist, read a few branches of the big switch statement; these were chosen because they rely on concepts covered earlier in this guide:
- NODE_TRUE/NODE_FALSE
- NODE_IVAR
- NODE_ISET
- NODE_IF
- NODE_SCLASS
- NODE_DEFN
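If the idea of a tree-walking evaluator built around one big switch is new, the following toy may help before reading rb_eval(). It is only a sketch under loud assumptions: toy_eval, the array-based node layout, and the :lit node are all invented here, and the ivars hash stands in for an object's instance variable table. The node names merely echo NODE_TRUE, NODE_FALSE, NODE_IVAR, NODE_ISET, and NODE_IF.

```ruby
# Toy tree-walking evaluator (NOT rb_eval). A node is an array whose
# first element is the node type and whose rest are its children.
def toy_eval(node, ivars)
  type, *args = node
  case type
  when :true  then true
  when :false then false
  when :lit   then args[0]                 # literal value (invented node)
  when :ivar  then ivars[args[0]]          # read an instance variable
  when :iset                               # assign, returning the value
    ivars[args[0]] = toy_eval(args[1], ivars)
  when :if
    cond, body, else_body = args
    if toy_eval(cond, ivars)
      toy_eval(body, ivars)
    elsif else_body
      toy_eval(else_body, ivars)
    end                                    # no else branch: nil
  else
    raise ArgumentError, "unknown node type: #{type.inspect}"
  end
end

ivars = {}
p toy_eval([:iset, :@x, [:lit, 42]], ivars)                    # 42
p toy_eval([:if, [:true], [:ivar, :@x], [:lit, 0]], ivars)     # 42
p toy_eval([:if, [:false], [:lit, 1]], ivars)                  # nil
```

Each branch evaluates its children recursively, which is the overall shape to look for in the real switch.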
More complex nodes
- eval.c
- rb_eval()
Further study of the interpreter, taking rb_eval() as the starting point. Some easy NODEs to begin with:
- NODE_CLASS: defining new classes
- NODE_LASGN, NODE_GASGN, NODE_DASGN, NODE_CVAR, NODE_CONST: locals, globals, dynamic variables, class variables, constants...
The hardest ones are those that handle exceptions and blocks.
Core classes
You can read for instance array.c, hash.c and string.c to see how the core classes are implemented.
Take the class you like, scroll down to the Init_xxx() function and locate the C function that implements the method you want to study. No particular order required.
Harder stuff
The more complex parts come last.
The GC
- gc.c
- gc_mark(), gc_sweep(), obj_free()
It's a fairly straightforward mark-and-sweep GC, so you'll have no problem understanding it if you know about GCs.
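For readers who have not met mark-and-sweep before, here is a minimal sketch of the technique in Ruby. Everything in it is hypothetical (ToyObj, toy_mark, toy_gc, and the explicit heap and roots arrays); it models the general algorithm, not the data structures in gc.c. The mark phase flags everything reachable from the roots, and the sweep phase separates the unflagged objects, including an unreachable cycle.

```ruby
# Toy object: a name, outgoing references, and a mark bit.
ToyObj = Struct.new(:name, :refs, :marked)

# Mark phase: flag obj and everything reachable from it.
def toy_mark(obj)
  return if obj.marked
  obj.marked = true
  obj.refs.each { |child| toy_mark(child) }
end

# Full cycle: mark from the roots, then sweep the heap.
# Returns [live, dead]; mark bits on survivors are reset for next time.
def toy_gc(heap, roots)
  roots.each { |root| toy_mark(root) }
  live, dead = heap.partition(&:marked)
  live.each { |obj| obj.marked = false }
  [live, dead]
end

a = ToyObj.new("a", [], false)
b = ToyObj.new("b", [], false)
c = ToyObj.new("c", [], false)
d = ToyObj.new("d", [], false)
a.refs << b          # a -> b, reachable from the root
c.refs << d          # c <-> d form a cycle nobody points to
d.refs << c

live, dead = toy_gc([a, b, c, d], [a])
p live.map(&:name)   # ["a", "b"]
p dead.map(&:name)   # ["c", "d"]
```

The early return on an already marked object is what lets the mark phase terminate on cyclic structures.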
Parsing
Time to take a look at parse.y.
Concentrate on how the AST is built to begin with.
The YACC grammar is tricky, and when combined with yylex it makes for a fairly difficult read, so skip this unless you specifically want to mess with the grammar.
Threading
- eval.c
- rb_thread_schedule(), rb_thread_restore_context()
The implementation of green (userspace) threads (you need to know setjmp and friends).
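The core idea of userspace threads, a scheduler that saves one execution context and resumes another without help from the operating system, can be sketched in Ruby with fibers. This is a deliberate substitution, not the mechanism in eval.c: Fiber (Ruby 1.9 or later) stands in for the contexts saved with setjmp, run_round_robin is a hypothetical name, and the toy is purely cooperative, which is a simplification.

```ruby
# Toy round-robin scheduler. Each task is [name, number_of_steps].
# A fiber yields :pending after every step and returns :done at the end.
def run_round_robin(tasks)
  log = []
  ready = tasks.map do |name, steps|
    Fiber.new do
      steps.times do |i|
        log << "#{name}#{i}"
        Fiber.yield :pending   # give control back to the scheduler
      end
      :done
    end
  end

  until ready.empty?
    fiber = ready.shift        # pick the next runnable context
    status = fiber.resume      # restore it and run until it yields
    ready.push(fiber) unless status == :done
  end
  log
end

p run_round_robin([["a", 2], ["b", 3]])
# ["a0", "b0", "a1", "b1", "b2"]
```

The interleaved log shows each task picking up exactly where it left off, which is the job rb_thread_restore_context() performs for real threads.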