This post originated from an RSS feed registered with Java Buzz
by Alan Williamson.
Original Post: Working with WIKI markup
Feed Title: Technical @ alan.blog-city.com
Feed URL: http://www.ibm.com/us/en/
Feed Description: (Technical) the rants of a java developer trapped in the body of a java developer!
I am involved in a project at the moment that requires me to store and manage data in XHTML (or DocBook). My 'users' will be writing articles online via the web browser. I need to make sure the data they enter is valid XHTML, as this allows me to apply a variety of transformations to the data to produce a wide range of outputs. However, my users aren't too familiar with HTML, or even bothered about closing off tags and so on. The server therefore has to do a lot of the work.
I have tried online editors (fckeditor/htmlArea), but they do not enforce valid XHTML, and the results they produce can sometimes be a right old mess of tags (try editing the source of text saved from fckeditor). Ironically, one of the common suggestions that has come from my users is to use a WIKI type of input. They are comfortable with this and it does solve a lot of UI problems.
However, I am finding it difficult to find tools that will actually take WIKI markup and transform it to XHTML. I have looked at Radeox, which gave me hope, but it fails to warn me when the markup is invalid. It will happily report to the commons-logging library that something has gone wrong, and then just as happily push on and render the text. I have also noted that the output isn't true XHTML, so I am really no further forward in that respect. If it could at least give me back those warnings, I could report them to the user to fix. At the moment all I see Radeox doing is being a fancy wrapper around regular expressions!
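To make the behaviour I am after concrete, here is a minimal sketch of a converter that hands its warnings back to the caller instead of only logging them. It is not Radeox code and does not use the Radeox API. The markup subset (`__bold__`, `~~italic~~`, blank-line paragraphs) and the names `WikiSketch` and `WikiResult` are assumptions made purely for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical result type: the rendered fragment plus warnings for the user. */
final class WikiResult {
    final String xhtml;
    final List<String> warnings;

    WikiResult(String xhtml, List<String> warnings) {
        this.xhtml = xhtml;
        this.warnings = warnings;
    }
}

/** Hypothetical converter for an assumed subset: __bold__, ~~italic~~, paragraphs. */
final class WikiSketch {

    static WikiResult convert(String wiki) {
        StringBuilder out = new StringBuilder();
        List<String> warnings = new ArrayList<>();
        // Paragraphs are separated by one or more blank lines.
        String[] paragraphs = wiki.trim().split("\\n\\s*\\n");
        for (int p = 0; p < paragraphs.length; p++) {
            if (paragraphs[p].isEmpty()) {
                continue;
            }
            out.append("<p>");
            renderInline(paragraphs[p], p + 1, out, warnings);
            out.append("</p>");
        }
        return new WikiResult(out.toString(), warnings);
    }

    private static void renderInline(String text, int paraNo,
                                     StringBuilder out, List<String> warnings) {
        Deque<String> open = new ArrayDeque<>(); // stack of currently open tags
        int i = 0;
        while (i < text.length()) {
            if (text.startsWith("__", i)) {
                toggle("strong", paraNo, open, out, warnings);
                i += 2;
            } else if (text.startsWith("~~", i)) {
                toggle("em", paraNo, open, out, warnings);
                i += 2;
            } else {
                out.append(escape(text.charAt(i)));
                i++;
            }
        }
        // Anything still open is a user error: warn, but close it so the
        // output stays well-formed.
        while (!open.isEmpty()) {
            String tag = open.pop();
            warnings.add("paragraph " + paraNo + ": <" + tag + "> was never closed");
            out.append("</").append(tag).append('>');
        }
    }

    private static void toggle(String tag, int paraNo, Deque<String> open,
                               StringBuilder out, List<String> warnings) {
        if (tag.equals(open.peek())) {
            // Properly nested close.
            open.pop();
            out.append("</").append(tag).append('>');
        } else if (open.contains(tag)) {
            // Mis-nested close: warn, then unwind down to the matching tag.
            warnings.add("paragraph " + paraNo + ": <" + tag + "> closed out of order");
            String closed;
            do {
                closed = open.pop();
                out.append("</").append(closed).append('>');
            } while (!closed.equals(tag));
        } else {
            open.push(tag);
            out.append('<').append(tag).append('>');
        }
    }

    private static String escape(char c) {
        switch (c) {
            case '&': return "&amp;";
            case '<': return "&lt;";
            case '>': return "&gt;";
            default:  return String.valueOf(c);
        }
    }
}
```

With this sketch, `WikiSketch.convert("Hello __world__")` yields `<p>Hello <strong>world</strong></p>` and an empty warning list, while `WikiSketch.convert("a __b")` still renders well-formed output but also returns one warning that can be shown to the author.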
I have also looked at JTidy, which is a great piece of software. In fact it does produce valid XHTML, but it doesn't solve my problem of restricting input to just the most basic of HTML tags. I really like the idea of WIKI type of input; it's clean and easy and makes writing very fast.
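The tag restriction could be handled as a separate step, whatever produces the markup. The sketch below uses only the XML parser that ships with the JDK (not JTidy): it checks that a fragment is well-formed and that every element is on a whitelist. The class name `XhtmlWhitelist`, the `<fragment>` wrapper element and the allowed tag set are all assumptions for illustration, and well-formedness is a weaker guarantee than full XHTML validity against a DTD.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical checker: well-formedness plus an element whitelist. */
final class XhtmlWhitelist {

    // Assumed set of "most basic" tags; adjust to taste.
    private static final Set<String> ALLOWED = new HashSet<>(
            Arrays.asList("p", "strong", "em", "ul", "ol", "li", "a", "br"));

    /** Returns a list of problems; an empty list means the fragment passed. */
    static List<String> check(String fragment) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // User input should not be able to pull in DTDs or external entities.
            factory.setFeature(
                    "http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            // Wrap the fragment so that several top-level elements still parse.
            Document doc = builder.parse(new InputSource(
                    new StringReader("<fragment>" + fragment + "</fragment>")));
            collect(doc.getDocumentElement(), problems);
        } catch (SAXException e) {
            problems.add("not well-formed: " + e.getMessage());
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return problems;
    }

    /** Recursively records every descendant element that is not whitelisted. */
    private static void collect(Node parent, List<String> problems) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                String name = child.getNodeName();
                if (!ALLOWED.contains(name)) {
                    problems.add("element <" + name + "> is not allowed");
                }
                collect(child, problems);
            }
        }
    }
}
```

The match is case-sensitive on purpose, since XHTML element names are lowercase, so `<P>` would be reported just as `<script>` would. The returned strings could be shown straight back to the author.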
Has anyone got any suggestions as to how one could proceed? I am sorely tempted to modify the Radeox code base to filter the exceptions up to me, and then take its output and run it through JTidy to produce the final XHTML output.