Ruby Buzz Forum - How complex is your Ruby? Time for some lexical variability analysis


Eigen Class

Posts: 358
Nickname: eigenclass
Registered: Oct, 2005

Eigenclass is a hardcore Ruby blog.

How complex is your Ruby? Time for some lexical variability analysis

Posted: Dec 19, 2005 6:45 AM

This post originated from an RSS feed registered with Ruby Buzz by Eigen Class.
Original Post: How complex is your Ruby? Time for some lexical variability analysis
Feed Title: Eigenclass
Feed URL: http://feeds.feedburner.com/eigenclass
Feed Description: Ruby stuff --- trying to stay away from triviality.

How rich a subset of Ruby are you using? To what extent are you exploring the language space? I've been performing some lexical analysis on Ruby software to provide a first-order answer to those questions.

I'm only considering the lexical richness of a program for the time being. Using concepts from information theory (briefly explained below), we can obtain the "lexical complexity" of the source code.
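Later in the post the measure is described as the information carried by a token's type once the type of the token immediately preceding it is known. Written as a formula, that reads as a conditional entropy. This is an interpretation, and the symbols are introduced here only for illustration. T_i is the type of token i, a and b range over token types, and p(a, b) is the relative frequency of the adjacent pair (a, b) in the file. 2^H converts the result into an equivalent number of equally likely choices.

```latex
H(T_i \mid T_{i-1}) = -\sum_{a,\,b} p(a, b)\,\log_2 p(b \mid a),
\qquad p(b \mid a) = \frac{p(a, b)}{p(a)},
\qquad N_{\text{eff}} = 2^{H(T_i \mid T_{i-1})}
```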

Some results

The simplest files in stdlib

Here are the three (not totally trivial) lexically simplest files in the stdlib and their "lexical complexities":

irb/version.rb: 0.216 bits (17 tokens of 9 different types)
shell/version.rb: 0.126 bits (17 tokens of 9 different types)
irb/lc/error.rb: 0.305 bits (93 tokens of 10 different types)
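Token and type counts like these depend on the tokenizer. The sketch below assumes Ripper, which has been in the standard library since Ruby 1.9 and so is newer than this post. It treats each Ripper scanner event (such as `:on_ivar` or `:on_op`) as a token type and skips whitespace, newlines and comments. This need not match the author's tokenizer, so the counts will not necessarily reproduce the figures above. `token_types` and `SKIPPED` are hypothetical names.

```ruby
require 'ripper'

# Scanner events treated as layout rather than tokens (an assumption).
SKIPPED = [:on_sp, :on_nl, :on_ignored_nl, :on_comment].freeze

# Returns the sequence of token types (Ripper event symbols) found in src.
def token_types(src)
  # Each Ripper.lex entry looks like [[line, column], event, token, ...].
  Ripper.lex(src).map { |entry| entry[1] }.reject { |type| SKIPPED.include?(type) }
end

# Same shape as irb/version.rb and shell/version.rb.
sample = <<~RUBY
  module SomeModule
    @RELEASE_VERSION = "version"
    @LAST_UPDATE_DATE = "release date"
  end
RUBY

types = token_types(sample)
puts "#{types.size} tokens of #{types.uniq.size} different types"
# prints "13 tokens of 7 different types"
```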

Both irb/version.rb and shell/version.rb look like

module SomeModule
  @RELEASE_VERSION = "version"
  @LAST_UPDATE_DATE = "release date"
end

It's no wonder: they were both written by ISHITSUKA Keiju.

Other lexically simple files

English.rb: 0.616 bits (100 tokens of 4 different types)
tkclass.rb: 0.619 bits (150 tokens of 10 different types)
i686-linux/rbconfig.rb: 0.631 bits (1616 tokens of 41 different types)

English.rb is but a sequence of

 alias $READABLE_NAME $SOME_UGLY_PERLISH_VAR

tkclass.rb consists mostly of assignments like

 TopLevel = TkToplevel

and i686-linux/rbconfig.rb has got a lot of lines resembling

 CONFIG["ruby_install_name"] = "ruby"

The "lexical complexity" above represents the amount of information carried by a token's type if we know the type of the token immediately preceding it. In all three cases there are, on average, fewer than two choices: about 0.62 bits corresponds to 2^0.62 ≈ 1.5 equally likely next token types. This is to be expected in English.rb and tkclass.rb: after all, they only have 4 and 10 different token types, respectively. The case of rbconfig.rb is more interesting: there are 41 distinct token types, but the lexical variability is very low; in other words, the code is very repetitive.
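Here is a minimal sketch of that measure in Ruby. It assumes that "information carried by a token type given the preceding type" means the conditional entropy over adjacent pairs of token types, with probabilities taken as relative frequencies within a single file. `lexical_complexity` is a hypothetical name, and the token types are plain symbols. The post's own implementation is not shown here. `Enumerable#sum` needs Ruby 2.4 or later.

```ruby
# Lexical complexity read as the conditional entropy H(T_i | T_{i-1}), in
# bits: the information carried by a token type once the type of the
# preceding token is known. Probabilities are estimated from the counts of
# adjacent (previous, current) token-type pairs in the input sequence.
def lexical_complexity(types)
  pairs = types.each_cons(2).to_a
  return 0.0 if pairs.empty?

  pair_counts = Hash.new(0) # [previous, current] => occurrences
  prev_counts = Hash.new(0) # previous => occurrences as a predecessor
  pairs.each do |prev, cur|
    pair_counts[[prev, cur]] += 1
    prev_counts[prev] += 1
  end

  n = pairs.size.to_f
  # Sum over pairs of p(prev, cur) * log2(1 / p(cur | prev)).
  pair_counts.sum do |(prev, _cur), count|
    (count / n) * Math.log2(prev_counts[prev].to_f / count)
  end
end

lexical_complexity(%i[a b a b a]) # 0.0: every type has a single successor
lexical_complexity(%i[a b a c])   # about 0.667: after :a, :b and :c are equally likely
```

`2 ** lexical_complexity(types)` turns the result into an equivalent number of equally likely next token types. That is about 1.5 for the 0.6-bit files above and about 5.7 for the 2.505 bits reported for optparse.rb.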

The most complex files in stdlib

tk/timer.rb: 2.411 bits (2656 tokens of 70 different types)
resolv.rb: 2.415 bits (8556 tokens of 85 different types)
optparse.rb: 2.505 bits (5903 tokens of 84 different types)

So the "lexically richest" (most complex) file in the stdlib is optparse.rb. In general, bigger files tend to use more token types. Beyond some point, however, length detracts from lexical variability as the code becomes repetitive. For example, resolv.rb has more tokens and more token types than optparse.rb, yet it scores slightly lower.

Some techniques

If you don't feel like reading the explanation and code below, here are the four things that stand out the most:

Using the block form of Hash.new to assign unique IDs to objects:

 OIDS = Hash.new{|h,k| h[k] = OIDS.size}
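Here is a short usage sketch of the `OIDS` idiom. The block passed to `Hash.new` runs only when a key is missing. It stores the current size of the hash under that key and returns it, so each new key receives the next integer. This version uses the block's own `h` instead of the constant, which behaves the same way. The keys are arbitrary examples.

```ruby
# Block form of Hash.new: called only for a missing key; h.size is read
# before the new entry is stored, so IDs start at 0.
OIDS = Hash.new { |h, k| h[k] = h.size }

first  = OIDS[:foo]  # new key   -> 0
second = OIDS["bar"] # new key   -> 1
again  = OIDS[:foo]  # known key -> 0 (the block is not called again)
```

Keys are compared with `eql?`/`hash`, so two equal strings share one ID. Calling `compare_by_identity` on the hash would give distinct objects distinct IDs instead.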

Caching the results of a method call in an anonymous hash captured by the method's closure (this could easily be metaprogrammed):

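class Foo
  foo_cache = {}  # local to the class body; only the closure below sees it
  define_method(:foo) do |*a|
    if foo_cache.include?(a)  # include? so nil/false results are cached too
      foo_cache[a]
    else
      ret = a.inject(0) { |sum, x| sum + x }  # placeholder: compute ret here
      foo_cache[a] = ret
    end
  end
end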
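The post remarks that this pattern could easily be metaprogrammed. Below is a minimal sketch of one way to do that. It assumes a hypothetical `Memoize` module whose `memoize` class macro wraps an already-defined instance method in a `define_method` closure that holds the cache hash. `Fib` is a made-up example class.

```ruby
# Hypothetical helper module: extend a class with it, then call
# `memoize :name` after defining the method to be cached.
module Memoize
  def memoize(name)
    original = instance_method(name) # UnboundMethod: the uncached version
    cache = {}                       # captured by the closure below
    define_method(name) do |*args|
      if cache.include?(args)
        cache[args]
      else
        cache[args] = original.bind(self).call(*args)
      end
    end
    name
  end
end

class Fib
  extend Memoize

  def fib(n)
    n < 2 ? n : fib(n - 1) + fib(n - 2) # recursive calls hit the cached wrapper
  end
  memoize :fib
end

Fib.new.fib(50) # fast despite the naive recursion
```

As with `Foo`, the cache is shared by all instances and keyed only on the arguments. This is therefore only safe for methods whose result does not depend on instance state.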

Computing a weighted average:

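def func(idx); idx * 10; end  # hypothetical: the value to average at position idx
weights = [1, 2, 3, 2, 4, 2, 1, 3]
weighted_sum = weights.inject([0, 0]) do |(sum, idx), weight|
  [sum + weight * func(idx), idx + 1]
end.first
# divide by the total weight to turn the weighted sum into an average
weighted_avg = weighted_sum.to_f / weights.inject(0) { |s, w| s + w }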
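For comparison, here is a sketch that pairs each value with its weight using `Array#zip` instead of threading an index through `inject`, then divides by the total weight. `weighted_average` is a hypothetical helper that takes the values as an array rather than calling a function per index.

```ruby
# Hypothetical helper: weighted average of values[i] using weights[i].
def weighted_average(values, weights)
  raise ArgumentError, "values and weights differ in size" unless values.size == weights.size

  total_weight = weights.inject(0) { |sum, w| sum + w }
  weighted_sum = values.zip(weights).inject(0) { |sum, (v, w)| sum + v * w }
  weighted_sum.to_f / total_weight
end

weighted_average([10, 20, 30], [1, 2, 1]) # => 20.0
```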

Beware of Matrix: it's slow, and its interface is a bit rough at times (messy failures when requesting out-of-bounds rows/columns, Vector objects are not enumerable...). (... narray)

How to compute the lexical complexity of Ruby code
