This post originated from an RSS feed registered with Python Buzz
by Ben Last.
Original Post: u"In The Beginning"
Feed Title: The Law Of Unintended Consequences
Feed URL: http://benlast.livejournal.com/data/rss
Feed Description: The Law Of Unintended Consequences
I have to blog this: the story of where UTF-8 came from. It's linked from Joel Spolsky's excellent article on Unicode, which has been in Favorites\Unicode like, forever (well, since late 2003).
On the subject of Python and Unicode: I find that none of the IDEs that don't cost real money can handle Unicode paste on Windows XP. Boa Constrictor, IDLE, Pythonwin, PyCrust etc. - all fail when subjected to the Москва test (and if that word appears as a set of question marks or boxes, then some feed that doesn't grok Unicode has mangled this posting). The test is simple: copy the word Москва from Notepad (which handles UTF-8 files very nicely, thank you) and paste it into the Python environment of your choice, in a command such as:
a = 'Москва'
or
a = u'Москва'
Several fail at this point, replacing the pasted Unicode string with question marks. Those that pass then get subjected to Part 2, in which I grill them mercilessly with:
print a
None have so far succeeded in printing the string as it should be shown. In the case of Pythonwin I've tracked through the source looking for how pasting is handled and become mired in a swamp of win32 integration, locale and pywin32 interactions.
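To tell a paste failure apart from a print failure, it may help to build the word from Unicode escapes, which never pass through the clipboard, and compare the pasted value against it. This is only a rough sketch: the names EXPECTED and paste_survived are made up for illustration, and the two encodings tried for byte strings are a guess at what a Windows tool might hand over.

```python
# Hypothetical helper, not part of any IDE or of the original test.
# The six code points of the test word, written as escapes so that
# no paste is involved in creating the reference value.
EXPECTED = u'\u041c\u043e\u0441\u043a\u0432\u0430'


def paste_survived(pasted):
    """Return True if the pasted value still holds the expected word."""
    if isinstance(pasted, bytes):
        # A byte string: try a couple of encodings it might be in.
        # (Assumption: UTF-8 or the Windows Cyrillic code page.)
        for enc in ('utf-8', 'cp1251'):
            try:
                if pasted.decode(enc) == EXPECTED:
                    return True
            except UnicodeDecodeError:
                pass
        return False
    # A Unicode string: compare code point by code point.
    return pasted == EXPECTED


# A paste that was flattened to question marks fails the check.
print(paste_survived(u'??????'))
# repr() shows the escapes, so it sidesteps the display problem.
print(repr(EXPECTED))
```

If paste_survived(a) comes back False straight after the assignment, the damage happened at paste time and print is not to blame. In Python 2 the plain 'Москва' form takes the byte-string branch, while the u'Москва' form goes straight to the comparison.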
Feel free to try different settings of the default encoding in site.py and if you get it to work, please, post it somewhere!
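Before changing site.py, it might be worth looking at which encodings are actually in play. The sketch below is not a fix. It only shows what a given encoding can represent, on the assumption (unverified for any particular IDE) that the question marks come from an output encoding with no Cyrillic letters in it. The function name printable is hypothetical.

```python
import sys

# The same test word, built from escapes so no paste is needed.
WORD = u'\u041c\u043e\u0441\u043a\u0432\u0430'


def printable(text, encoding=None):
    """Round-trip text through an encoding, replacing what it cannot hold.

    With no encoding given, fall back to whatever sys.stdout reports,
    and to ASCII if it reports nothing (for example when output is piped).
    """
    enc = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    return text.encode(enc, 'replace').decode(enc)


# The two settings that matter for the test.
print(sys.getdefaultencoding())
print(getattr(sys.stdout, 'encoding', None))

# An encoding without Cyrillic turns every letter into a question mark.
print(printable(WORD, 'ascii') == u'??????')
# An encoding with Cyrillic keeps the word intact.
print(printable(WORD, 'cp1251') == WORD)
```

If printable(WORD) with no encoding argument already differs from WORD, the environment's output encoding cannot represent the word at all, whatever the default encoding is set to.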
Let me not be misunderstood here; Python's Unicode support is excellent. The mismatch appears to be where the Python rubber meets the win32 road.