Summary
Database schema evolution can be made simpler with the help of tools, but what do you do if a problem is discovered after deployment and you need to roll back to an earlier schema?
Advertisement
Jim Patterson has published an article on OnJava.com, Agile Object to Database Replication with db4o. In this article he describes a mechanism to replicate a db4o object-oriented database to a relational database. He claims db4o is a good fit for agile development, because of its support for schema evolution:
In an agile environment, object models may be refactored frequently. db4o automatically adapts to a changed object model, and those changes can be replicated into the RDBMS.
db4o makes certain changes quite easy, as demonstrated in the last section of the article, Agile Development: Schema Evolution. In essence, merely changing the Java class and then using it will trigger db4o's automatic schema migration. Other tools also try to help with schema migration. Hibernate, for instance, includes a SchemaUpdate tool that migrates changes expressed in Hibernate mappings to the database. When new fields need to be added to a table, for example, or new tables need to be created, SchemaUpdate can easily do that for you.
What can be a bit trickier is when database changes are more complicated, such as initializing new fields and tables with data derived from old fields and tables. For this, hand-written update scripts are usually necessary. Nevertheless, such update scripts are usually quite straightforward to write.
The biggest risk in database schema evolution, in my experience, that makes it harder than software refactoring involves deploying the change. If you do a major refactor of your software, test it, and deploy it, but note after deployment that you've got problems, it is generally quite easy to go back to using the previous, working, version of the code. If you've changed your database, however, you often will need to perform a similar process of schema evolution, but in the reverse direction.
Because developing a reverse schema evolution path for each forward schema evolution is costly, we don't usually do it here at Artima. We try and avoid the problem primarily by testing thoroughly before deploying and hoping for the best. In addition, where possible we don't delete unused fields and tables from the database until we're confident we'll never need to roll back to a version that used them.
Do you always have a reverse script in hand before deploying a forward schema change? What other techniques do you use to manage risk in making database schema changes?
I usually have a forward script and backups of table ddl plus view/trigger/pl/sql source code (I work mostly with Oracle these days). For a small change I may skip exporting backup table ddl. I also have a complicated batch script with ctl files used by sqlldr to maintain. In other words, making changes in Oracle is hard and annoying.
There are no tools that let me rename a column name consistently across the schema, for example. There is no tool to move column aliases from view definition into the select clause. Basically relational databases (I've also used IBM's UDB, MySQL and PostgreSQL) don't have strong (or any?) refactoring support. Nor do databases provide solid native tools for versioning changes.
At some point I was thinking of making such a tool myself. The biggest problem is code formatting. Toad has a decent (but not great, imo) formatter, but Toad is neither open source nor free. The trick, I think, is to write a code formatter that outputs consistent code without all the insane nonsense that traditional DBA's often do to format their code (like aligning columns and so on... this is bad for version control if column width ever changes, etc.). Then the procedure would be to use DB to make changes, then hit a button and get an export of all the relevant stuff (minus, let's say data, or maybe including lookup data) and then due to it being consistently formatted, it would be fit for versioning.
In my opinion it's best for database itself to support versioning by providing an interface that let's one query for changes to ddl. Then I could write something like select diff from ddl_changes where schema_name = 'blah' and current_rev = HEAD and previous_rev = HEAD-1. This diff could then be used to patch the offline source and versioned.
The only other option that I can see is to not develop anything in database at all, but to use it as a backend for something else, and develop in that something else.
Personally I think it's crazy not to take full advantage of relational features provided by the database.
Ruby on rails has a very nifty feature: migrations. You can define youd database schema in pure ruby and all the subsequent changes you make are defined in a file so you can switch from one database schema version to an other. Check the screencast on migrations http://rubyonrails.com/screencasts .