The Artima Developer Community
Sponsored Link

Ruby Buzz Forum
The Rightyear parsing library

0 replies on 1 page.

Welcome Guest
  Sign In

Go back to the topic listing  Back to Topic List Click to reply to this topic  Reply to this Topic Click to search messages in this forum  Search Forum Click for a threaded view of the topic  Threaded View   
Previous Topic   Next Topic
Flat View: This topic has 0 replies on 1 page
Christian Neukirchen

Posts: 188
Nickname: chris2
Registered: Mar, 2005

Christian Neukirchen is a student from Biberach, Germany playing and hacking with Ruby.
The Rightyear parsing library Posted: Oct 29, 2005 8:49 AM
Reply to this message Reply

This post originated from an RSS feed registered with Ruby Buzz by Christian Neukirchen.
Original Post: The Rightyear parsing library
Feed Title: chris blogs: Ruby stuff
Feed URL: http://chneukirchen.org/blog/category/ruby.atom
Feed Description: a weblog by christian neukirchen - Ruby stuff
Latest Ruby Buzz Posts
Latest Ruby Buzz Posts by Christian Neukirchen
Latest Posts From chris blogs: Ruby stuff

Advertisement

Last week, I ported Monads in Perl to Ruby just for fun and learning. It’s amazing how one can port code without really understanding it, for programming languages the Chinese Room Argument certainly works.

When I converted all the code from the page, I reread some of my Haskell papers and got hooked by Yet Another Haskell Tutorial which has a good chapter on monads and their use. It also uses monads to build a parser combinator, which is an popular approach to parse in Haskell and other functional languages.

Rather quickly I had a direct port of something akin to Parsec. I needed a name and Martin DeMello proposed “rightyear”, the japanese pronunciation of lightyear, which is about a third parsec, of course. For testing, I wrote a parser to read something like a very primitive kind of XML. This is an easy example of a context-sensitive free grammar in Parsec:

xml = do{ name <- openTag
        ; content <- many xml
        ; endTag name
        ; return (Node name content) }
      <|> xmlText

And that’s the complete Rightyear parser:

class ParseXML < Parser
  def parse_xml
    parsing { xml.passthru eof }.run
  end
  def xml
    choice open_tag.bind { |name|
      many(xml).bind { |content|
        end_tag(name).bind { ret [name, content] }
      }
      }, text
  end
  def open_tag
    char(?<) >> one_or_more(char_matching { |c| c.chr =~ /\w/ }).bind { |t|
      char(?>).bindret t.join
    }
  end
  def end_tag(t)
    char(?<) >> char(?/) >> string(t) >> char(?>)
  end
  def text
    one_or_more(char_matching { |c| c.chr =~ /\w/ }).bind { |s| ret s.join }
  end
end

As you can see, Haskell’s do-notation doesn’t map very well to Ruby; it’s rather cumbersome to write. After a bit of twiddling, I found out that I didn’t actually need monads at all; instead I can use exceptions or throw/catch to trackback. Now, the parser looks like this:

class ParseXML < Rightyear::Parser
  def xml
    choice seq {
      name = open_tag
      content = many { xml }
      end_tag name
      [name, content]
    },
    seq { text }
  end
  def parse_xml
    v = xml; eof; v
  end
  def open_tag
    char(?<)
    t = text
    char(?>)
    t
  end
  def end_tag(t)
    char(?<); char(?/); string(t); char(?>)
  end
  def text
    one_or_more { char_matching { |c| c.chr =~ /\w/ }.chr }.join
  end
end

Which is a lot better for the eyes, no? It’s actually straight-forward to write parsers in this, but you need to ensure you don’t have left-recursion in your grammar, because that borks parser combinators in general. The big advantage over Bison and similar tools is that you easily can mix in your own code and can parse full LL(∞).

Unfortunately, the trackback via exceptions still is rather slow, it takes about 6 seconds to parse "1+#{'1+' * 6000}1" on my iBook. So consider this as an academical exercise for now; I learned a lot about monads and Haskell and how it sucks when you rewrite it in non-Haskell languages. :-)

People have been doing this in Tcl/Tk and Python, and even C too.

Further reading: Fast, Error Correcting Parser Combinators: A Short Tutorial (PDF) by S. Doaitse Swierstra and Pablo R. Azero Alcocer.

NP: Bob Dylan—Tonight I’ll Be Staying Here with You

Read: The Rightyear parsing library

Topic: Features done in meetings Previous Topic   Next Topic Topic: Hoodwink.d RSS Feeds

Sponsored Links



Google
  Web Artima.com   

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use