The Artima Developer Community
The Illusive setdefaultencoding

Ian Bicking


Ian Bicking is a freelance programmer
Posted: Aug 8, 2005 7:23 PM

This post originated from an RSS feed registered with Python Buzz by Ian Bicking.
Original Post: The Illusive setdefaultencoding
Feed Title: Ian Bicking
Feed URL: http://www.ianbicking.org/feeds/atom.xml
Feed Description: Thoughts on Python and Programming.

So... thinking some more about my Unicode woes, I think UTF-8 is the Right Default Encoding For Me. I think it will solve a large number of my problems.

If you set the default encoding to UTF-8, things like str(u'\u0100') actually work (and give you the encoded version). If you concatenate the result ('\xc4\x80') to a Unicode string, the byte string is automatically decoded and it works perfectly. This is what I want! UTF-8, being a superset of ASCII, happens to be the encoding I'm already using in my source code. I'm perfectly happy moving as many of my external data sources to UTF-8 as possible. I'll set AddDefaultCharset in Apache, I'll fiddle with my database, whatever. In those cases where I can't, I'll just have to carefully decode the data, but I have to do that anyway. To the degree I can make my systems and communications consistently UTF-8, things will just get better. I really don't see a downside.
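For illustration, here is a minimal sketch of the explicit conversions that a UTF-8 default encoding would perform implicitly. The variable names are made up for this example, and it is written so that it runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Explicit version of what a UTF-8 default encoding would do behind the scenes.
# All names here are illustrative only.

text = u'\u0100'  # LATIN CAPITAL LETTER A WITH MACRON

# What str(u'\u0100') would produce with a UTF-8 default: the bytes 0xC4 0x80.
encoded = text.encode('utf-8')

# What concatenating that byte string to a Unicode string would do implicitly:
# decode the bytes first, then join the two Unicode strings.
decoded = encoded.decode('utf-8')
combined = u'name: ' + decoded

print(repr(encoded))
print(combined == u'name: \u0100')
```

With the stock ASCII default on Python 2, the implicit versions of both steps raise UnicodeEncodeError or UnicodeDecodeError instead.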

But why does Python make it SO DAMN HARD to change my encoding? I don't understand this at all. There is a function sys.setdefaultencoding, but site.py (which is loaded on Python startup) deletes the function. I feel like someone decided they were smarter than me, but I'm not sure I believe them.
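A quick way to see this from a running interpreter is sketched below. It only reads state and assumes nothing beyond the standard sys module; the variable names are illustrative.

```python
import sys

# The encoding currently used for implicit str/unicode conversions.
current = sys.getdefaultencoding()

# After normal startup, site.py has already removed the setter,
# so this is expected to be False in an ordinary script.
has_setter = hasattr(sys, 'setdefaultencoding')

print(current)
print(has_setter)
```

Starting Python 2 with the -S flag skips site.py, in which case the setter should still be present.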

From what I can tell, there are three ways to change the default encoding:

  • Edit site.py (in the standard library) directly. Seems like a bad idea. Though maybe I'll just delete the del sys.setdefaultencoding line... anyway, site.py might appear in other places on your computer as well (e.g., /etc/pythonX.Y/site.py).
  • Create sitecustomize.py in the standard path (lib/pythonX.Y). This will apply to all processes. But I'm not sure I feel safe affecting all Python processes. You could also save sys.setdefaultencoding here (under a different name) for later access.
  • Put sitecustomize.py in the directory you run Python from. But . is not on sys.path by default (I think site.py adds it after it tries to import sitecustomize), so you have to add that directory to PYTHONPATH.
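The sitecustomize.py options above could look roughly like the sketch below. The helper name apply_default_encoding and the attribute name _saved_setdefaultencoding are hypothetical, chosen only for this example. The hasattr-style guard makes the file harmless wherever the setter does not exist, such as Python 3.

```python
# sitecustomize.py (a sketch; helper and attribute names are hypothetical)
import sys


def apply_default_encoding(encoding='utf-8'):
    """Set the default encoding if the setter is still available.

    Returns True if the encoding was changed, False otherwise.
    """
    setter = getattr(sys, 'setdefaultencoding', None)
    if setter is None:
        # Already deleted by site.py, or never existed (Python 3).
        return False
    setter(encoding)
    # Keep a reference under another name so later code can still reach
    # the function after site.py deletes sys.setdefaultencoding.
    sys._saved_setdefaultencoding = setter
    return True


applied = apply_default_encoding()
```

Run as an ordinary script, this does nothing, because site.py has already removed the setter by then. It only takes effect when Python 2 imports it as sitecustomize during startup.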

There's some discussion in the comments here. This post suggests running reload(sys) to restore setdefaultencoding, which is very clean to enable (none of this site crap) but reloading sys scares me a bit.
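The reload(sys) approach can be sketched as follows. The function name force_utf8_default is hypothetical, and the version check is an addition for this example so the snippet does nothing on Python 3, where setdefaultencoding does not exist.

```python
import sys


def force_utf8_default():
    """Restore sys.setdefaultencoding via reload(sys) and call it.

    Python 2 only. Returns True if the default encoding was changed.
    """
    if sys.version_info[0] != 2:
        # Python 3 has no setdefaultencoding; its default is already UTF-8.
        return False
    reload(sys)  # Python 2 builtin; re-creates the attributes site.py deleted
    sys.setdefaultencoding('utf-8')
    return True


changed = force_utf8_default()
print(sys.getdefaultencoding())
```

Because reload(sys) re-initializes the module, it may reset other attributes of sys as well, which is one reason to be wary of it.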

And searching about I didn't see one justification for why doing any of this is bad, just references to it being a hack, which is not very convincing. Are people claiming that there should be no default encoding? As long as we have non-Unicode strings, I find the argument less than convincing, and I think it reflects the perspective of people who take Unicode very seriously, as compared to programmers who aren't quite so concerned but just want their applications to not be broken; and the current status quo is very deeply broken.
