Friday, June 27, 2008

Versioning Data in a database

Sean posed this question. [Link]

He's taken over the maintenance of a sizable website and needed some help in setting up a development server. No big deal (well, except for the control panels but I was able to at least point him in the right direction), and I even set up a revision control system when he threw me a curve ball—a significant portion of the site lives in a database, so is there any way to do revision control of stuff in the database?

And that's … something I've never thought of before. How do you do version control of data in a database?

That's a good question. Part of it depends on how volatile the data is and whether most of the data is pre-generated or very dynamic. I see three options:

  1. Dump the entire database, structure and data to SQL scripts and store them in the version control system. This can work if the data is generally static or intial run state values and entered by the developer.
  2. Build history tables that store all changed and previous data values with triggers. This can be run backwards to put the data back in a known state. This works fine if the metadata rarely changes and you have few tables that require this. In practice, it tends to be an enormous pain, particularly if you change the table structure regularly. Depending on the application, this may be required as a selling feature or for legal reasons.
  3. Backup the database before any big change or on a schedule. This lets you always put the database back to a previous known state. This is the easiest way, but the most limited. It's harder to do diffs. Additionally, if the database is large, it may start taking up a lot of disk space. This is less of an issue than it once was, but can still creep up on you with multi gigabyte databases.
Three choices, none attractive, and all with drawbacks. Anyone else have any ideas?

No comments:

Post a Comment