This post originated from an RSS feed registered with Python Buzz
by Ian Bicking.
Original Post: New feature for the day
Feed Title: Ian Bicking
Feed URL: http://www.ianbicking.org/feeds/atom.xml
Feed Description: Thoughts on Python and Programming.
This uses htmldiff.py to do the comparisons. Unlike some other
comparisons, htmldiff calculates the differences between HTML
documents, instead of relying on line-by-line comparisons of the
original text source. Since HTML isn't (very) whitespace-sensitive,
comparisons based on line endings or other whitespace aren't really
accurate. Instead, htmldiff parses the HTML into a list of tokens
-- one token for each start and end tag, and one token for each
whitespace-delimited word in the text. It essentially ignores the
nested structure of HTML and treats it as a simple stream of tokens.
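
To make the token idea concrete, here is a minimal sketch. It is not
htmldiff.py's actual code: the regular expression and the names
tokenize and token_diff are assumptions made up for illustration, and
the standard library's difflib does the sequence matching. It also
assumes simple markup where tags contain no ">" inside attributes.

```python
import re
from difflib import SequenceMatcher

# Assumed token pattern (not taken from htmldiff.py): one token per
# start or end tag, one token per whitespace-delimited word of text.
TOKEN_RE = re.compile(r"<[^>]+>|[^<\s]+")


def tokenize(html):
    """Flatten HTML into a flat stream of tag tokens and word tokens."""
    return TOKEN_RE.findall(html)


def token_diff(old_html, new_html):
    """Return (operation, old_tokens, new_tokens) for each changed run."""
    old, new = tokenize(old_html), tokenize(new_html)
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    return [
        (op, old[i1:i2], new[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


if __name__ == "__main__":
    print(tokenize("<p>Hello <b>big</b> world</p>"))
    print(token_diff("<p>Hello world</p>", "<p>Hello brave world</p>"))
```

Because whitespace only separates tokens, re-wrapping the source
across different lines gives exactly the same token list, so it
produces no differences at all.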
This seems like a good compromise to me. Character-level comparisons
ignore the structure of HTML completely, and tend to create weird
differences. Line-level comparisons aren't appropriate to HTML or
narrative text. Structured comparisons like XmlDiff are too
complicated to present in a visually simple way.
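
A small illustration of the character-level problem, again only a
sketch: changed_runs and the token pattern are hypothetical names
chosen for this example, with difflib standing in for whatever
matching a real tool uses. Changing a bold tag to an italic tag shows
up at the character level as edits inside the tags, while the token
level reports whole tags being replaced.

```python
import re
from difflib import SequenceMatcher


def changed_runs(old, new):
    """Return (old_part, new_part) for every non-equal run."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    return [
        (old[i1:i2], new[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


def tokens(html):
    # Assumed token pattern: tags and whitespace-delimited words.
    return re.findall(r"<[^>]+>|[^<\s]+", html)


old_html = "<b>cat</b>"
new_html = "<i>cat</i>"

# Character level: the change lands in the middle of each tag.
char_changes = changed_runs(old_html, new_html)

# Token level: each tag is replaced as a unit.
token_changes = changed_runs(tokens(old_html), tokens(new_html))

if __name__ == "__main__":
    print(char_changes)   # [('b', 'i'), ('b', 'i')]
    print(token_changes)  # [(['<b>'], ['<i>']), (['</b>'], ['</i>'])]
```

Marking up the character-level result would mean inserting highlight
tags inside other tags, which is one way the differences get weird.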