John Coggeshall has just released a beta of a PHP5 wrapper for Tidy.
In case you've been living under a rock, Tidy is a program (and now a C library) that turns tag soup into well-formed, pretty-printed XML.
Some possible uses:
Cleaning up weblog comments (and forum pages, when standards-compliant forums start showing up) so ...