This post originated from an RSS feed registered with Ruby Buzz
by Christian Neukirchen.
Original Post: The Rightyear parsing library
Feed Title: chris blogs: Ruby stuff
Feed URL: http://chneukirchen.org/blog/category/ruby.atom
Feed Description: a weblog by christian neukirchen - Ruby stuff
Last week, I ported Monads in
Perl to Ruby just
for fun and learning. It’s amazing how one can port code without
really understanding it, for programming languages the Chinese Room
Argument certainly
works.
When I converted all the code from the page, I reread some of my
Haskell papers and got hooked by Yet Another Haskell
Tutorial which has a good chapter
on monads and their use. It also uses monads to build a
parser combinator, which
is an popular approach to parse in Haskell and other functional languages.
Rather quickly I had a direct port of something akin to
Parsec. I needed a name and
Martin DeMello proposed “rightyear”, the
japanese pronunciation of lightyear, which is about a third parsec, of
course. For testing, I wrote a parser to read something like a very
primitive kind of XML. This is an easy example of a context-sensitive free
grammar in Parsec:
xml = do{ name <- openTag
; content <- many xml
; endTag name
; return (Node name content) }
<|> xmlText
And that’s the complete Rightyear parser:
class ParseXML < Parser
def parse_xml
parsing { xml.passthru eof }.run
end
def xml
choice open_tag.bind { |name|
many(xml).bind { |content|
end_tag(name).bind { ret [name, content] }
}
}, text
end
def open_tag
char(?<) >> one_or_more(char_matching { |c| c.chr =~ /\w/ }).bind { |t|
char(?>).bindret t.join
}
end
def end_tag(t)
char(?<) >> char(?/) >> string(t) >> char(?>)
end
def text
one_or_more(char_matching { |c| c.chr =~ /\w/ }).bind { |s| ret s.join }
end
end
As you can see, Haskell’s do-notation doesn’t map very well to Ruby;
it’s rather cumbersome to write. After a bit of twiddling, I found
out that I didn’t actually need monads at all; instead I can use
exceptions or throw/catch to trackback. Now, the parser looks like
this:
class ParseXML < Rightyear::Parser
def xml
choice seq {
name = open_tag
content = many { xml }
end_tag name
[name, content]
},
seq { text }
end
def parse_xml
v = xml; eof; v
end
def open_tag
char(?<)
t = text
char(?>)
t
end
def end_tag(t)
char(?<); char(?/); string(t); char(?>)
end
def text
one_or_more { char_matching { |c| c.chr =~ /\w/ }.chr }.join
end
end
Which is a lot better for the eyes, no? It’s actually
straight-forward to write parsers in this, but you need to ensure you
don’t have left-recursion in your grammar, because that borks parser
combinators in general. The big advantage over
Bison and similar
tools is that you easily can mix in your own code and can parse full
LL(â).
Unfortunately, the trackback via exceptions still is rather slow, it
takes about 6 seconds to parse "1+#{'1+' * 6000}1" on my iBook. So
consider this as an academical exercise for now; I learned a lot about
monads and Haskell and how it sucks when you rewrite it in non-Haskell
languages. :-)
People have been doing this in Tcl/Tk and
Python, and even
C
too.