How rich a subset of Ruby are you using? To what extent are you exploring the language space?
I've been performing some lexical analysis on Ruby software to provide a
first-order answer to those questions.
I'm only considering the lexical richness of a program for the time
being. Using concepts from information theory (briefly explained below),
we can obtain the "lexical complexity" of the source code.
Some results
The simplest files in stdlib
Here are the three (not totally trivial) lexically simplest files in the stdlib and
their "lexical complexities":
shell/version.rb: 0.126 bits (17 tokens of 9 different types)
irb/version.rb: 0.216 bits (17 tokens of 9 different types)
irb/lc/error.rb: 0.305 bits (93 tokens of 10 different types)
Both irb/version.rb and shell/version.rb look like this (a sketch; the actual version strings differ):
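module IRB
  @RELEASE_VERSION = "0.9"   # illustrative value
  @LAST_UPDATE_DATE = "..."  # illustrative value
end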
It's no wonder: they were both written by ISHITSUKA Keiju.
Other lexically simple files
English.rb: 0.616 bits (100 tokens of 4 different types)
tkclass.rb: 0.619 bits (150 tokens of 10 different types)
i686-linux/rbconfig.rb: 0.631 bits (1616 tokens of 41 different types)
English.rb is but a sequence of
alias $READABLE_NAME $SOME_UGLY_PERLISH_VAR
tkclass.rb consists mostly of assignments like
TopLevel = TkToplevel
and i686-linux/rbconfig.rb has got a lot of lines resembling
CONFIG["ruby_install_name"] = "ruby"
The "lexical complexity" above represent the amount of information carried by
a token (type) if we know the type of the token immediately preceding it. In all three
cases, there are on average less than 2 choices. This can be expected in English.rb and
tkclass.rb: after all, they only have 4 and 10 different tokens, respectively. The case of
rbconfig.rb is more interesting: there are 41 distinct token types, but the
lexical variability is very low; in other words, the code is very repetitive.
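To make the measure concrete, here is a minimal sketch of that first-order
conditional entropy in Ruby. It assumes the source has already been tokenized
into an array of token type symbols (the symbol names below are made up); the
actual procedure is described below.

def conditional_entropy(types)
  pair_counts = Hash.new(0)  # occurrences of each (previous, current) pair
  prev_counts = Hash.new(0)  # occurrences of each type as a predecessor
  types.each_cons(2) do |prev, cur|
    pair_counts[[prev, cur]] += 1
    prev_counts[prev] += 1
  end
  total = (types.size - 1).to_f
  # H(X_n | X_{n-1}) = - sum over (a,b) of p(a,b) * log2 p(b|a)
  pair_counts.inject(0.0) do |h, (pair, n)|
    p_pair = n / total
    p_cond = n.to_f / prev_counts[pair.first]
    h - p_pair * Math.log(p_cond) / Math.log(2)
  end
end

conditional_entropy([:kw, :ident, :kw, :ident, :kw])  # => 0.0, fully predictable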
The most complex files in stdlib
tk/timer.rb: 2.411 bits (2656 tokens of 70 different types)
resolv.rb: 2.415 bits (8556 tokens of 85 different types)
optparse.rb: 2.505 bits (5903 tokens of 84 different types)
So the "lexically richest/most complex" file in the stdlib is optparse.rb.
In general, bigger files tend to use more token types, but beyond some point
length detracts from the lexical variability, as the code becomes repetitive.
Some techniques
If you don't feel like reading the explanation and code below, here are the
things that stand out the most:
Using the block form of Hash.new to assign unique IDs to objects:
OIDS = Hash.new { |h, k| h[k] = OIDS.size }
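For instance (token type names made up), each new key gets the next free ID, and lookups are stable afterwards:

OIDS[:kw]     # => 0
OIDS[:ident]  # => 1
OIDS[:kw]     # => 0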
Another way to cache the results of a method call, with a hash captured in a closure (this could easily be metaprogrammed):
class Foo
  foo_cache = {}
  define_method(:foo) do |*a|
    if foo_cache.include?(a)
      foo_cache[a]
    else
      ret = nil # compute ret here
      foo_cache[a] = ret
    end
  end
end
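Since define_method takes a block, foo_cache is captured in its closure: the cache outlives each call, but it is not an instance variable and cannot be reached from outside the class body.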
Beware of Matrix: it's slow, and the interface is a bit rough at times (messy failures when requesting out-of-bounds rows/columns, Vector objects are not Enumerable...). Consider NArray instead.
How to compute the lexical complexity of Ruby code