This post originated from an RSS feed registered with Python Buzz
by Ian Bicking.
Original Post: Persistent Persistence
Feed Title: Ian Bicking
Feed URL: http://www.ianbicking.org/feeds/atom.xml
Feed Description: Thoughts on Python and Programming.
When I first put user accounts into the wiki, I did it the simplest
way I could -- I pickled each User object to disk. A little later I
refactored the layout, so all the modules changed locations -- and
the pickles broke, because a pickle refers to each class by its
module path. I could have tried to fix them -- maybe editing them by
hand (there weren't many), or putting the files back in place,
unpickling, then reconstructing them under the new layout. Either way
it's a total pain in the ass.
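To make the failure concrete, here is a minimal sketch of what probably
happened. The module name old_layout_user and the User fields are
hypothetical stand-ins, not the wiki's real code; the module is built in
memory so the example is self-contained.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the module that held User before the refactoring.
SOURCE = '''
class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email
'''

old_module = types.ModuleType("old_layout_user")
exec(SOURCE, old_module.__dict__)
sys.modules["old_layout_user"] = old_module

data = pickle.dumps(old_module.User("ian", "ian@example.com"))

# The pickle stores the import path of the class, not its definition.
path_is_in_pickle = b"old_layout_user" in data

# Simulate the refactoring: nothing is importable under the old name anymore.
del sys.modules["old_layout_user"]
try:
    pickle.loads(data)
    error = None
except ImportError as exc:
    error = exc

# "Putting files back in place": restore the old name and the pickle loads again.
sys.modules["old_layout_user"] = old_module
user = pickle.loads(data)
```

The data itself was never damaged; it just became unreadable until code
with the old import path existed again.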
There weren't many accounts, so I just blew them away and stored user
information in flat files (RFC822 style). Now I don't have to worry
about this in the future, as the data is stored very transparently,
and forward and backward compatibility can be managed on an as-needed
basis.
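A flat-file user record along these lines could look like the sketch
below. It uses the standard library's email package to write and read
mail-header style text; the field names (Username, Email) and the helper
names are assumptions for illustration, not the format the wiki actually
used.

```python
from email import message_from_string
from email.message import Message


def dump_user(fields):
    """Serialize a dict of short ASCII strings as mail-style headers."""
    msg = Message()
    for key, value in fields.items():
        msg[key] = value
    return msg.as_string()


def load_user(text):
    """Parse mail-style headers back into a plain dict."""
    msg = message_from_string(text)
    return dict(msg.items())


record = {"Username": "ian", "Email": "ian@example.com"}
text = dump_user(record)
loaded = load_user(text)

# A record written by newer code with an extra field is still readable:
# older code simply ignores the keys it does not know about.
newer_text = "Username: ian\nEmail: ian@example.com\nTimezone: CST\n\n"
newer = load_user(newer_text)
```

Nothing in the file refers to a Python class or module, so moving code
around cannot break it, and a missing or extra field can be handled
wherever it is read.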
My conclusion? Never keep anything you care about in a pickle; at any
point you should be able to blow it away. If you can't, then pickle
isn't right for you.
A corollary: do keep things in explicitly formatted text files, maybe
XML (or even YAML), maybe simpler than that (like RFC822, i.e.,
mail-header style).
I like an RDBMS because it's almost as good as a flat text file, but
with a bunch of extra features. The contents of any good RDBMS can be
dumped to a sequence of SQL statements, again increasing
transparency. Mapping between Python and the RDBMS is explicit, which
is good -- because the data will probably live longer than the code,
so the code should adapt to the data, not the other way around. The
formality of an RDBMS -- type restrictions and all -- again safeguards
the data.
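As a rough illustration of these points, the sketch below uses SQLite
through Python's sqlite3 module. The users table, its columns, and the
helper functions are hypothetical; SQLite is chosen only because it ships
with Python, not because the post mentions it.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    username TEXT PRIMARY KEY NOT NULL,
    email    TEXT NOT NULL
)
"""


def insert_user(conn, username, email):
    # Explicit mapping from Python values to columns.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO users (username, email) VALUES (?, ?)",
            (username, email),
        )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
insert_user(conn, "ian", "ian@example.com")

# The schema, not the application code, rejects a record with no email.
try:
    insert_user(conn, "broken", None)
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The whole database can be written out as plain SQL statements...
dump = "\n".join(conn.iterdump())

# ...and rebuilt from that text without any of the original code.
restored = sqlite3.connect(":memory:")
restored.executescript(dump)
restored_rows = restored.execute(
    "SELECT username, email FROM users"
).fetchall()
```

The dump is ordinary text, so it can be read, diffed, or loaded into a
fresh database long after the code that wrote it has changed.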
And perhaps a corollary: I'm not a big fan of using lots of stored
procedures and other related features (many uses of triggers and
views, for instance). I'm trying to protect the data from the code,
and putting code in the database compromises that. That said, I'm
still trying to figure out what the right balance is: where data
integrity should (or can) be ensured, and what logic is intrinsic to
the database.