The Artima Developer Community
Ruby Buzz Forum
A Survey of Gem Naming

Matt Bauer


Matt Bauer owns Mosquito Mole Multiworks - a Rails Hosting and Consulting Company
A Survey of Gem Naming Posted: May 14, 2007 5:53 PM
This post originated from an RSS feed registered with Ruby Buzz by Matt Bauer.
Original Post: A Survey of Gem Naming
Feed Title: blogmmmultiworks
Feed URL: http://blog.mmmultiworks.com/feed/rss.xml
Feed Description: Thoughts on Ruby, hosting and more

I've been working to build a gems data warehouse based on the RubyForge mirror downloads. A beta version should be done by RailsConf. One of the first steps in building a data warehouse is the Extract, Transform and Load (ETL) of data. The source for the gems data warehouse is the Apache logs. They typically look like:

216.243.185.119 - - [04/May/2007:16:57:02 -0500]
"GET /gems/hpricot-0.5.gem
HTTP/1.1" 200 232448 "-" "RubyGems/0.9.2"
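
As a rough sketch of the extraction step, a line like the one above can be split into fields with a regular expression for the Apache combined log format. The constant `LOG_LINE`, the method `parse_log_line` and the field names are made up for illustration, and the sketch assumes each entry sits on a single line in the real log (the sample above is wrapped for display).

```ruby
# Sketch only: LOG_LINE and the field names are illustrative assumptions,
# not part of the original warehouse code.
LOG_LINE = /\A(?<ip>\S+) \S+ \S+ \[(?<time>[^\]]+)\] "(?<method>[A-Z]+) (?<path>\S+) (?<protocol>[^"]+)" (?<status>\d{3}) (?<bytes>\d+|-) "(?<referer>[^"]*)" "(?<agent>[^"]*)"\z/

# Returns a Hash of field name => String, or nil if the line does not match.
def parse_log_line(line)
  match = LOG_LINE.match(line.strip)
  match&.named_captures
end

line = '216.243.185.119 - - [04/May/2007:16:57:02 -0500] ' \
       '"GET /gems/hpricot-0.5.gem HTTP/1.1" 200 232448 "-" "RubyGems/0.9.2"'

entry = parse_log_line(line)
puts entry["path"]   # prints /gems/hpricot-0.5.gem
puts entry["agent"]  # prints RubyGems/0.9.2
```

Only the `path` field matters for gem naming; the other fields are kept because a warehouse would presumably want the timestamp and client version as well.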

From this it's very easy to tell we have a download of why's Hpricot version 0.5 gem. It's not always that easy though. Some gems have an OS designation like:

216.243.185.119 - - [04/May/2007:16:56:26 -0500]
"GET /gems/mysql-2.7.3-mswin32.gem
HTTP/1.1" 200 56832 "-" "RubyGems/0.9.1"

In this case we have a download of the MySQL version 2.7.3 gem for Windows. The problem is that not everyone designates Windows as mswin32. For example:

216.243.185.119 - - [04/May/2007:16:44:24 -0500]
"GET /gems/sqlite3-ruby-1.2.1-mswin.gem
HTTP/1.1" 200 108032 "-" "RubyGems/0.9.0"

Starting to see the problem? It gets better though. Sometimes there is an architecture designation like:

216.243.185.119 - - [04/May/2007:16:56:26 -0500]
"GET /gems/rubysspi-1.0.5-i386-mswin32.gem
HTTP/1.1" 200 56832 "-" "RubyGems/0.9.1"

It's not just the Windows gems that do this though. There are Linux gems as well.

216.243.185.119 - - [04/May/2007:16:56:26 -0500]
"GET /gems/fireruby-0.3.2-i586-linux.gem
HTTP/1.1" 200 56832 "-" "RubyGems/0.9.1"

So far we've seen gems named with an OS and architecture designation. There are also gems named with an OS version like:

216.243.185.119 - - [04/May/2007:16:56:26 -0500]
"GET /gems/wxruby2-preview-0.0.38-powerpc-darwin7.9.0.gem
HTTP/1.1" 200 56832 "-" "RubyGems/0.9.1"

The above all follow a nice pattern, and writing a regex for it isn't too bad. They all use a hyphen to separate the important parts, in the form name-version-arch-os[version]. Then there's the following, without a consistent use of hyphens.

216.243.185.119 - - [04/May/2007:16:56:26 -0500]
"GET /gems/crypt-isaac_0.9.1.gem
HTTP/1.1" 200 56832 "-" "RubyGems/0.9.1"

Still it's quite easy to write a regex to parse it. Then there's the following which doesn't follow the pattern.

216.243.185.119 - - [04/May/2007:16:56:26 -0500]
"GET /gems/POpen4-0.1.1-win32-1.8.4-VC6.gem
HTTP/1.1" 200 56832 "-" "RubyGems/0.9.1"

Notice first that it uses win32 and not mswin32 as the OS designation. Second, it includes the Ruby version with a release tag, which really plays havoc with the regex. So how do you write a nice regex to parse all this? You do it like this:

/(?:GET|POST|PUT|DELETE) \/gems\/(POpen4|.*)[-_]([0-9\.]+)-{0,1}
(i386|i486|i586|i686|powerpc){0,1}-{0,1}(win32|mswin32|mswin|linux|darwin){0,1}-{0,1}
(.*)\.(gem|tgz|zip) HTTP\/1\.[01]/

Now I could make this more concise, and feel free to leave your solution in the comments. I just chose not to, as this one is at least readable. There are some interesting things to note about this regex. There is no amd64 or sparc designation, since I don't see any gems specifically designated for those architectures. In the same breath, there are no gems with the i486 or i686 designation. I just left them in, in case it happens one day. There are also no gems designated for Solaris or any *BSD (other than darwin).
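
Here is a hedged sketch of how the pattern might be applied in Ruby. It is a single-line adaptation of the regex above, with the method list written as an alternation group and the HTTP minor version as `[01]`. The names `GEM_REQUEST`, `OS_ALIASES` and `parse_gem_request` are invented for this example, and folding win32, mswin32 and mswin into one "windows" label is an assumption about what the transform step would want, not something taken from the warehouse itself.

```ruby
# Sketch only: constant and method names are illustrative assumptions.
GEM_REQUEST = %r{(?:GET|POST|PUT|DELETE) /gems/(POpen4|.*)[-_]([0-9.]+)-?(i386|i486|i586|i686|powerpc)?-?(win32|mswin32|mswin|linux|darwin)?-?(.*)\.(gem|tgz|zip) HTTP/1\.[01]}

# Assumed normalization: every Windows spelling seen in the logs maps to one label.
OS_ALIASES = {
  "win32"   => "windows",
  "mswin32" => "windows",
  "mswin"   => "windows"
}.freeze

# Takes the request part of a log line, e.g. "GET /gems/foo-1.0.gem HTTP/1.1".
# Returns a Hash of parsed fields, or nil if the request is not a gem download.
def parse_gem_request(request)
  match = GEM_REQUEST.match(request)
  return nil unless match

  name, version, arch, os, extra, ext = match.captures
  {
    name:    name,
    version: version,
    arch:    arch,                               # nil when not designated
    os:      OS_ALIASES.fetch(os, os),           # nil when not designated
    extra:   extra.empty? ? nil : extra,         # OS version or Ruby version tag
    ext:     ext
  }
end

[
  "GET /gems/hpricot-0.5.gem HTTP/1.1",
  "GET /gems/sqlite3-ruby-1.2.1-mswin.gem HTTP/1.1",
  "GET /gems/wxruby2-preview-0.0.38-powerpc-darwin7.9.0.gem HTTP/1.1",
  "GET /gems/crypt-isaac_0.9.1.gem HTTP/1.1",
  "GET /gems/POpen4-0.1.1-win32-1.8.4-VC6.gem HTTP/1.1"
].each do |request|
  gem = parse_gem_request(request)
  puts gem.values_at(:name, :version, :arch, :os, :extra).map { |v| v || "-" }.join("  ")
end
```

The special-cased `POpen4` alternative has to come before `.*`: otherwise the greedy match would take the trailing Ruby version (1.8.4) as the gem version.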

Now some of you smart kids in the audience might say, "Use the gem spec to figure this all out." To which my response would be, "Yeah, right!" I looked at the gem specs initially to do all this work, but all I found was despair. Nearly all the released gems have incomplete gem specs, which makes parsing the filename the most effective and correct way to extract information.

One final word: I'm not for enforcing a naming scheme via the API, since there are always reasons to go outside the scheme. I instead prefer that the community regulates itself. I don't blame people for not filling out their gem specs; I don't. There's no reason to, since no process uses most of the information in them. Once someone builds something that makes use of the gem spec, people will update their gem specs on their own. It just so happens I'm building such a beast. Stay tuned or find me at RailsConf for a demo.
