Ruby Buzz Forum - Markdown redux, and walking the AST to generate (statically) valid XHTML

Articles |
News |
Weblogs |
Books |
Forums

Artima Forums | Articles | Weblogs | Java Answers | News

Sponsored Link •

Ruby Buzz Forum
Markdown redux, and walking the AST to generate (statically) valid XHTML

0 replies on 1 page.

Welcome Guest
Sign In

Back to Topic List

Reply to this Topic

Search Forum

Threaded View


Previous Topic		Next Topic

Flat View: This topic has 0 replies on 1 page

Eigen Class

Posts: 358
Nickname: eigenclass
Registered: Oct, 2005

Eigenclass is a hardcore Ruby blog.

Markdown redux, and walking the AST to generate (statically) valid XHTML

Posted: Apr 13, 2009 6:08 AM

This post originated from an RSS feed registered with Ruby Buzz by Eigen Class.
Original Post: Markdown redux, and walking the AST to generate (statically) valid XHTML Feed Title: Eigenclass Feed URL: http://feeds.feedburner.com/eigenclass Feed Description: Ruby stuff --- trying to stay away from triviality.	Latest Ruby Buzz Posts Latest Ruby Buzz Posts by Eigen Class Latest Posts From Eigenclass

My earlier post (about Markdown processors and how simple top-down parsers can be up to two or three orders of magnitude faster than regexp-based gsub kludges) got unexpectedly popular. Some critics accused me of deliberately bastardizing Markdown's syntax for the sake of implementation convenience and speed. In order to prove that this is not the case, I reverted the changes relative to Markdown I introduced with ease of writing in mind and measured again. The exact changes are listed below. According to git, the resulting code is in fact shorter:

$ git diff --stat -b 298c05 simple_markup.ml
 simple_markup.ml |  167 +++++++++++++++++++++++++++---------------------------
 1 files changed, 83 insertions(+), 84 deletions(-)

I think this, if anything, shows that the motivation behind my changes was not to simplify the implementation and make it shorter. As for the performance, the 330KB document I was using is processed in 50ms (vs. 63ms for discount, 2.7s for Pandoc, 2.1s for python-markdown or 30s for Bluecloth).

Transforming the AST into (statically) valid XHTML

As I said in my earlier post, I quickly recognized the advantages of going beyond mere "markup -> HTML" conversion and being able to manipulate the AST. New targets can be supported in a very compact way; for instance, (X)HTML generation takes only 40 lines of code. The code is clear and concise thanks to the use of pattern matching (it's also fast :). Moreover, I can use higher-order functions to allow for specific processing of "interesting" elements. For instance, I'm rendering pre-formatted blocks, links and images differently depending on whether the markup comes from a comment or was written by me (e.g. some extensions are allowed only in the latter case, and only the alt attribute is shown in the former).

The code below simply generates a typed HTML tree from the markup tree. I'm using Ocsigen's XHTML.M module, which employs OCaml's type system to ensure that the resulting HTML is valid. Invalid markup just doesn't compile, as you'd get a type error; e.g., if I try to express something invalid like <b><li>foo</li></b>, the type system will catch it and tell me the acceptable elements:

# b [li [pcdata "foo"]];;
     -----------------
Error: This expression has type ([> `Li ] as 'a) elt = 'a XHTML.M.elt
       but is here used with type
        [< `A | `Abbr | `Acronym | `B | `Bdo | `Big | `Br | `Button
        | `Cite | `Code | `Del | `Dfn | `Em | `I | `Img | `Input | `Ins
        | `Kbd | `Label | `Map | `Noscript | `Object | `PCDATA | `Q
        | `Samp | `Script | `Select | `Small | `Span | `Strong | `Sub
        | `Sup | `Textarea | `Tt | `Var ] elt = 'b XHTML.M.elt
       The second variant type does not allow tag(s) `Li

The compiler (or toplevel) not only tells me the HTML would not be valid (li not allowed inside b), but also lists the elements I can use there.

Here are the ~40 lines of code that generate the HTML on this site (the resulting tree is included in a larger, typed tree --- ensuring the end result is valid when the page is composed). There are no type annotations whatsoever, as OCaml is able to infer everything here (OTOH, I can get the type of any single expression with a couple keypresses in vim; you can also generate the interface with the type of all the functions with ocamlc -i):


	Web Artima.com