This is odd...
>>> d = {'test': None}
>>> d[u'test'] = 1
>>> d
{'test': 1}
>>> d = {u'test': None}
>>> d['test'] = 1
>>> d
{u'test': 1}
I guess it makes sense, but it's tricky. Since 'test' == u'test', the
assignment finds the existing key and updates only its value, so the
original key object survives. If you think of Unicode strings as distinct
from byte strings (str), that's no help. But here's a problem with
setdefaultencoding:
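This key-retention rule is easy to see without Unicode at all: assigning through an equal key updates the value but keeps the original key object. A minimal sketch using int/float keys, which compare equal and hash equal:

```python
# 1 == 1.0 and hash(1) == hash(1.0), so both reach the same dict slot.
d = {1.0: None}
d[1] = 'x'                   # updates the value in place...
print(d)                     # -> {1.0: 'x'}  ...but the key is still the float
print(type(next(iter(d))))   # -> <class 'float'>
```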
>>> import sys
>>> reload(sys)
>>> sys.setdefaultencoding('utf-8')
>>> s = u'\u0100'
>>> str(s)
'\xc4\x80'
>>> str(s) == s
True
>>> hash(str(s)), hash(s)
(1207774670, -1591639807)
>>> d = {s: None}
>>> d[str(s)] = 1
>>> d
{u'\u0100': None, '\xc4\x80': 1}
The strings are equal, but they don't hash equally, so the dictionary
(being a hash table) puts both in and doesn't notice their equality.
Not surprising; equality is default-encoding aware (the byte string is
decoded before it's compared with the unicode string). In fact, you get a
UnicodeDecodeError if you compare a byte string that can't be decoded in
the default encoding to a unicode string. (I know exactly why there's an
exception there, and maybe I even see how it's a good idea, but how can
you not find it disturbing that these two objects can't be safely
compared, when almost all other objects, no matter how different in type,
can be?)
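To see how the dictionary ends up holding both keys, here's a toy class (EncodedStr is a hypothetical name, not anything from the standard library) that deliberately reproduces the equal-but-differently-hashed situation in modern Python:

```python
class EncodedStr:
    """Toy stand-in for a byte string that compares equal to a str
    but hashes differently -- mimicking the str/unicode mismatch."""
    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        if isinstance(other, EncodedStr):
            return self.text == other.text
        return self.text == other

    def __hash__(self):
        # Deliberately inconsistent with __eq__ across types:
        return hash(self.text) ^ 0xBEEF

d = {'abc': None}
d[EncodedStr('abc')] = 1
# The keys compare equal, but their hashes differ, so the dict never
# even tries the equality check and stores two entries.
print(EncodedStr('abc') == 'abc')  # -> True
print(len(d))                      # -> 2
```

The dict compares stored hashes before it ever calls `__eq__`, which is exactly why equal-but-differently-hashed keys slip past each other.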
Oh, but I was talking about hashes. Well, the hash algorithm for
strings apparently isn't aware of default encodings. (Just in case
this was specific to the reload(sys) hack, I also tested it with a
change to site.py.) Note that hash does work for
ASCII-encodable Unicode strings (i.e., hash('foo') ==
hash(u'foo')).
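For contrast, hashing can be made encoding-aware by hand. A hypothetical wrapper (BytesKey is my own name, not a real API) that hashes a byte string through its decoded text, so it lands in the same slot as the equal unicode string:

```python
class BytesKey:
    """Hypothetical wrapper: hash a byte string via its decoded text,
    so it collides (correctly) with the equal str key."""
    def __init__(self, raw, encoding='utf-8'):
        self.raw = raw
        self.text = raw.decode(encoding)

    def __eq__(self, other):
        if isinstance(other, BytesKey):
            return self.text == other.text
        if isinstance(other, str):
            return self.text == other
        return NotImplemented

    def __hash__(self):
        # Encoding-aware: hash the decoded text, not the raw bytes.
        return hash(self.text)

d = {'\u0100': None}
d[BytesKey(b'\xc4\x80')] = 1   # b'\xc4\x80' is UTF-8 for U+0100
print(len(d))                  # -> 1  (equal key found, value replaced)
print(d['\u0100'])             # -> 1
```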