The Artima Developer Community

Ruby Buzz Forum
On the sad state of markdown processors, and getting thousandfold speed-ups.

Eigen Class

Posts: 358
Nickname: eigenclass
Registered: Oct, 2005

Eigenclass is a hardcore Ruby blog.
On the sad state of markdown processors, and getting thousandfold speed-ups. Posted: Apr 7, 2009 5:36 AM

This post originated from an RSS feed registered with Ruby Buzz by Eigen Class.
Original Post: On the sad state of markdown processors, and getting thousandfold speed-ups.
Feed Title: Eigenclass
Feed URL: http://feeds.feedburner.com/eigenclass
Feed Description: Ruby stuff --- trying to stay away from triviality.

When I started writing the code for the latest incarnation of eigenclass, I planned to use an existing Markdown processor to generate the HTML for posts and comments dynamically. That would take at most a couple of lines to pipe the markup to a process and read back the HTML. I took the first Markdown implementation that came to mind, Bluecloth (written in Ruby), and ran it against a few documents. I was most underwhelmed by its speed: it was so slow that it needed over a second, sometimes two, to process some of the entries I've written since. I benchmarked other common implementations, the original markdown (in Perl) and python-markdown, and found that they were only marginally better. At the risk of being perceived as performance-obsessed, here's the observed performance when processing markdown's README (README.n is the README concatenated n times) on a 3GHz AMD64 box (much faster than the old server running this site):

                  language   LoCs (approx.)   README.1 time   README.8 time   README.32 time   README.32 mem
  Bluecloth       Ruby       1100             0.130s          2.16s           30s              31MB
  markdown        Perl       1400             0.068s          0.66s           segfault         segfault
  python-markdown Python     1900             0.090s          0.35s           2.06s            23MB
  Pandoc          Haskell    900 + 450        0.068s          0.55s           2.7s             25MB
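The methodology behind the table above (build README.n by concatenating the README n times, then time each processor on it) can be sketched in Ruby as follows. The `render` lambda here is a trivial stand-in, not any of the real processors; plugging in an actual engine (e.g. Bluecloth) is left as an assumption about your installed gems:

```ruby
require 'benchmark'

# Build README.n by concatenating the input n times, as in the table above.
def readme_n(text, n)
  text * n
end

# Stand-in "processor": wraps blank-line-separated blocks in <p> tags.
# It is NOT a real markdown engine -- substitute the processor under test.
render = ->(src) { src.split(/\n{2,}/).map { |p| "<p>#{p}</p>" }.join("\n") }

readme = "A *sample* README paragraph.\n\nAnother paragraph.\n"

[1, 8, 32].each do |n|
  input = readme_n(readme, n)
  t = Benchmark.realtime { render.call(input) }
  puts format("README.%-2d  %8.5fs", n, t)
end
```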

Compare to the rather acceptable performance of my own Simple_markup module in OCaml, and of discount, a C implementation I found when I had already written mine:

                  language   LoCs (approx.)   README.8 time   README.32 time   README.32 mem
  Simple_markup   OCaml      313 + 55         12ms            43ms             3.5MB
  discount        C          ~4500            16ms            63ms             2.8MB

(The LoC counts for Simple_markup and Pandoc are split into parsing and HTML generation.)

I made no attempt to optimize Simple_markup beyond replacing a single O(n^2) call to String.nsplit with an O(n) Str.split one in order to split the input string into lines. I'm not compiling with -unsafe or -nodynlink either.
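The complexity trap mentioned above can be illustrated in Ruby (the original fix was in OCaml; this is an analogous sketch, not the actual code): repeatedly slicing off the head of the string copies the whole remainder on every iteration, giving quadratic total work, whereas a single split pass is linear.

```ruby
# Quadratic line-splitting: each iteration copies the O(n) tail of the
# string, so splitting n lines costs O(n^2) overall -- the same trap as
# the String.nsplit call mentioned above (illustration only).
def split_lines_quadratic(s)
  lines = []
  until s.empty?
    i = s.index("\n") || s.length
    lines << s[0...i]
    s = s[(i + 1)..-1] || ""   # copies the whole remainder each time
  end
  lines
end

# Linear version: one pass over the string, no repeated tail copies.
def split_lines_linear(s)
  s.split("\n")
end
```

Both produce the same line list; only the asymptotic cost differs.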

To add insult to injury, Bluecloth, markdown and python-markdown are ugly hacks that boil down to iterated regexp-based gsubs. I can see why they have a long history of bugs: it is easy for a gsub pass to interfere accidentally with another, and such regexp-based transformations are full of corner cases.
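The interference between gsub passes is easy to reproduce. The converter below is a deliberately naive illustration in the style the post criticizes, not the code of any of the processors named above: the emphasis pass has no idea the code-span pass already ran, so it happily pairs up asterisks across an already-emitted `<code>` element and produces mis-nested HTML.

```ruby
# A naive, gsub-pass-based converter (illustration only, not any real
# processor's code).
def naive_markdown(src)
  html = src.dup
  html.gsub!(/`([^`]+)`/, '<code>\1</code>')   # pass 1: code spans
  html.gsub!(/\*([^*]+)\*/, '<em>\1</em>')     # pass 2: emphasis
  html
end

# Pass 2 pairs the '*' inside the code span with the '*' before "really",
# producing <em> tags that cross the <code> element's boundaries.
puts naive_markdown("use `glob(*.rb)` and *really* mean it")
```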

A much cleaner approach is to parse the markup into a parse tree, and then generate the (X)HTML in a separate pass. This is what Pandoc, discount and Simple_markup do.
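The two-pass structure can be sketched in a few lines of Ruby. This is a minimal toy under assumed structure, not Simple_markup's, discount's, or Pandoc's actual design: pass 1 parses the source into a tree of typed block nodes, pass 2 walks the tree and emits HTML, so the two concerns can never step on each other.

```ruby
# Pass 1 builds a tree of block nodes; pass 2 renders it.
# Toy sketch only: real processors handle many more block and inline types.
Node = Struct.new(:type, :text)

def parse(src)
  src.split(/\n{2,}/).map do |block|
    if block =~ /\A#\s+(.*)/
      Node.new(:heading, $1)
    else
      Node.new(:para, block.strip)
    end
  end
end

def to_html(tree)
  tree.map do |node|
    case node.type
    when :heading then "<h1>#{node.text}</h1>"
    when :para    then "<p>#{node.text}</p>"
    end
  end.join("\n")
end

puts to_html(parse("# Title\n\nFirst paragraph."))
```

Because rendering only ever sees the tree, adding a new output format (or fixing a parsing corner case) touches exactly one pass.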

Read more...




Copyright © 1996-2019 Artima, Inc. All Rights Reserved.