The Artima Developer Community

Ruby Buzz Forum
How complex is your Ruby? Time for some lexical variability analysis

Eigen Class

Posts: 358
Nickname: eigenclass
Registered: Oct, 2005

Eigenclass is a hardcore Ruby blog.
Posted: Dec 19, 2005 6:45 AM

This post originated from an RSS feed registered with Ruby Buzz by Eigen Class.
Original Post: How complex is your Ruby? Time for some lexical variability analysis
Feed Title: Eigenclass
Feed URL: http://feeds.feedburner.com/eigenclass
Feed Description: Ruby stuff --- trying to stay away from triviality.

How rich a subset of Ruby are you using? To what extent are you exploring the language space? I've been performing some lexical analysis on Ruby software to provide a first-order answer to those questions.

I'm only considering the lexical richness of a program for the time being. Using concepts from information theory (briefly explained below), we can obtain the "lexical complexity" of the source code.
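The post does not say which lexer produced its token counts; the method itself is in the full article behind "Read more...". As a rough sketch, assuming Ruby's standard Ripper library (added in Ruby 1.9, so newer than this 2005 post) and treating each Ripper event symbol as a token type, one might extract the token-type sequence of a source file like this. The helper names `token_types` and `token_summary` are hypothetical, and dropping whitespace and comments is an assumption, so the counts need not match the figures reported below.

```ruby
require "ripper"

# Layout tokens ignored by this sketch (an assumption: the post does not
# say how whitespace or comments were treated).
SKIPPED_TYPES = [:on_sp, :on_ignored_nl, :on_comment].freeze

# Returns the sequence of token types (Ripper event symbols such as
# :on_kw, :on_const or :on_op) for a string of Ruby source code.
def token_types(source)
  Ripper.lex(source)
        .map { |_pos, type, _text| type }
        .reject { |type| SKIPPED_TYPES.include?(type) }
end

# Summarises a source string in the same form as the results below.
def token_summary(source)
  types = token_types(source)
  "#{types.size} tokens of #{types.uniq.size} different types"
end

snippet = <<~RUBY
  module SomeModule
    @RELEASE_VERSION = "version"
    @LAST_UPDATE_DATE = "release date"
  end
RUBY
puts token_summary(snippet)
```

Counting `types.uniq` rather than the raw token texts is what makes this a measure of lexical variety: two different constants both count as `:on_const`.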

Some results

The simplest files in stdlib

Here are the three (not totally trivial) lexically simplest files in the stdlib and their "lexical complexities":

  • irb/version.rb: 0.216 bits (17 tokens of 9 different types)
  • shell/version.rb: 0.126 bits (17 tokens of 9 different types)
  • irb/lc/error.rb: 0.305 bits (93 tokens of 10 different types)

Both irb/version.rb and shell/version.rb look like

module SomeModule
  @RELEASE_VERSION = "version"
  @LAST_UPDATE_DATE = "release date"
end

No wonder they look alike: both were written by ISHITSUKA Keiju.

Other lexically simple files

  • English.rb: 0.616 bits (100 tokens of 4 different types)
  • tkclass.rb: 0.619 bits (150 tokens of 10 different types)
  • i686-linux/rbconfig.rb: 0.631 bits (1616 tokens of 41 different types)

English.rb is but a sequence of

 alias $READABLE_NAME $SOME_UGLY_PERLISH_VAR

tkclass.rb consists mostly of assignments like

 TopLevel = TkToplevel

and i686-linux/rbconfig.rb has got a lot of lines resembling

 CONFIG["ruby_install_name"] = "ruby"

The "lexical complexity" figures above represent the amount of information carried by a token (type) when we know the type of the token immediately preceding it. All three are below one bit, so in each case there are on average fewer than 2 choices for the next token type. This is to be expected in English.rb and tkclass.rb: after all, they only use 4 and 10 different token types, respectively. The case of rbconfig.rb is more interesting: there are 41 distinct token types, yet the lexical variability is very low; in other words, the code is very repetitive.
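The paragraph above describes the measure as the information carried by a token type once the type of its predecessor is known, which is a conditional entropy. Since the full explanation sits behind the "Read more..." link, the following is only a sketch under that reading; estimating the probabilities from bigram frequencies within a single file is an assumption, and `conditional_entropy` and `effective_choices` are hypothetical names.

```ruby
# Conditional entropy H(X_n | X_{n-1}) in bits: the average information
# carried by a token type once the preceding token type is known,
#   H = -sum over pairs (a, b) of p(a, b) * log2 p(b | a),
# with probabilities estimated from bigram counts in the sequence itself.
def conditional_entropy(types)
  pairs = types.each_cons(2).to_a
  return 0.0 if pairs.empty?

  pair_counts = Hash.new(0) # occurrences of (previous, current)
  prev_counts = Hash.new(0) # occurrences of previous as a pair's first element
  pairs.each do |prev, cur|
    pair_counts[[prev, cur]] += 1
    prev_counts[prev] += 1
  end

  total = pairs.size.to_f
  pair_counts.inject(0.0) do |h, ((prev, _cur), n)|
    p_pair = n / total                             # p(a, b)
    p_next_given_prev = n.to_f / prev_counts[prev] # p(b | a)
    h - p_pair * Math.log2(p_next_given_prev)
  end
end

# 2**H is the "effective number of choices" for the next token type.
def effective_choices(types)
  2**conditional_entropy(types)
end

# After :a the next type is :b or :c with equal odds; :b always leads to :a.
puts conditional_entropy([:a, :b, :a, :c]) # about 0.667 bits
```

Feeding it the token-type sequence of a file from any Ruby lexer yields a figure of the kind quoted above; a file scoring 0.6 bits corresponds to 2**0.6, about 1.5 effective choices per token.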

The most complex files in stdlib

  • tk/timer.rb: 2.411 bits (2656 tokens of 70 different types)
  • resolv.rb: 2.415 bits (8556 tokens of 85 different types)
  • optparse.rb: 2.505 bits (5903 tokens of 84 different types)

So the "lexically richest/most complex" file in the stdlib is optparse.rb. In general, bigger files tend to use more token types, but beyond some point length detracts from lexical variability, as the code becomes somewhat repetitive.

Some techniques

If you don't feel like reading the explanation and code below, here are the four things that stand out the most:

  • Using the block form of Hash.new to assign unique IDs to objects:
 OIDS = Hash.new{|h,k| h[k] = OIDS.size}

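  • Another way to cache the results of a method call, using a hash that only the method's closure can see (this could easily be metaprogrammed):
class Foo
  foo_cache = {}
  define_method(:foo) do |*a|
    if foo_cache.include?(a)
      foo_cache[a]
    else
      ret = compute_foo(*a) # compute_foo is a hypothetical stand-in for the real work
      foo_cache[a] = ret
    end
  end

  # Hypothetical placeholder computation so the example runs.
  def compute_foo(*a)
    a.inject(0) { |sum, x| sum + x }
  end
end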

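  • Computing a weighted average (func stands for whatever per-position value is being averaged):
def func(idx); idx.to_f; end # placeholder for the real per-position value
weights = [1, 2, 3, 2, 4, 2, 1, 3]
weighted_sum = weights.inject([0,0]) do |(sum,idx), weight|
  [sum + weight * func(idx), idx + 1]
end.first
weighted_avg = weighted_sum.to_f / weights.inject(0) { |s, w| s + w } # normalise by total weight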

  • Beware of Matrix; it's slow and the interface is a bit rough at times (messy failures when requesting out-of-bounds rows/columns, Vector objects are not enumerable...). (... narray)
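The second technique above notes that the closure-held cache could easily be metaprogrammed. One possible sketch uses a hypothetical `Memoize` module (not part of the stdlib) whose `memoize` class method wraps an already-defined instance method with `define_method`. Like the `Foo` example, the cache is shared by all instances of the class and never evicts entries; `Fib` is only an illustration.

```ruby
# Hypothetical helper: `extend Memoize` in a class, then call
# `memoize :name` after defining the method to be cached.
module Memoize
  def memoize(name)
    original = instance_method(name) # the un-memoized implementation
    cache = {}                       # visible only to the closure below
    define_method(name) do |*args|
      if cache.include?(args)
        cache[args]
      else
        cache[args] = original.bind(self).call(*args)
      end
    end
    name
  end
end

class Fib
  extend Memoize

  # Recursive calls go through the memoized wrapper once it is installed.
  def fib(n)
    n < 2 ? n : fib(n - 1) + fib(n - 2)
  end
  memoize :fib
end

puts Fib.new.fib(30)
```

Capturing `cache` in the closure, rather than storing it in an instance or class variable, keeps it out of reach of other code, which is the point of the original `foo_cache` trick.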

How to compute the lexical complexity of Ruby code


Read more...


Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use