Managing Migrations

Posted by Luke
on Monday, January 14

This is part of a series in which I am exploring the best practices for working together on a Rails team in advance of my acts_as_conference presentation. Earlier, I wrote about working together effectively and argued that bigger projects require better tools.

Migrations are great, but they are not without problems. Here are some I’ve run into, and how to fix or avoid them.

Migration Decay

You’ve been assigned to fix some bugs in a project your company built a while ago. You update the code and then run rake db:migrate.

rake aborted! (See full trace by running task with --trace)

D’oh.

I call this seemingly inevitable breakdown “migration decay.”

This goes against the general principles we’ve been talking about. It’s not automatic. It adds barriers when you add new members to a project, stressing the communications channels.

“Note that this schema.rb definition is the authoritative source for your database schema. If you need to create the application database on another system, you should be using db:schema:load, not running all the migrations from scratch. The latter is a flawed and unsustainable approach (the more migrations you’ll amass, the slower it’ll run and the greater likelihood for issues).”

(from the header comment of db/schema.rb)

Ideally, I’m for using migrations and keeping them running from version 0 to version n. Migrations are cool because they can do it all: change your schema, load your seed data, and modify data on the server.

But as your models change, the old migrations don’t. And so it’s very likely that you’ll end up with migrations that stop working.

With enough work, you can probably keep them running, most of the time. But I’ve just been in too many situations where the migrations have failed. Worse is when one fails halfway through: the migration number isn’t incremented, but half of the migration has already been applied. So the next run will fail, too, and then you have to fix the database manually. Bleh.
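Here’s a toy sketch of why a half-applied migration is so painful. It is plain Ruby, not Rails code: ToyDatabase, run_migration, and broken_migration are made-up names, and the sketch assumes a database where schema changes can’t be rolled back and a runner that records the new version only after the migration body finishes.

```ruby
# Toy in-memory stand-in for a database whose schema changes cannot be rolled back.
# None of this is Rails API; every name here is hypothetical.
class ToyDatabase
  attr_reader :tables
  attr_accessor :version

  def initialize
    @tables = []
    @version = 0
  end

  def create_table(name)
    raise "table #{name} already exists" if @tables.include?(name)
    @tables << name
  end
end

# Run the migration body; record the new version only if the body finishes.
def run_migration(db, number)
  yield db
  db.version = number
end

# A migration whose first step succeeds and whose second step blows up.
broken_migration = lambda do |db|
  db.create_table(:users)
  raise "second step failed"
end

db = ToyDatabase.new

begin
  run_migration(db, 1, &broken_migration)
rescue RuntimeError => e
  puts "first run:  #{e.message}"
end

# The version was never bumped, but the table is already there.
puts "version=#{db.version} tables=#{db.tables.inspect}"

begin
  run_migration(db, 1, &broken_migration)
rescue RuntimeError => e
  puts "second run: #{e.message}"
end
```

The second run never even reaches the step that originally failed. It dies on the leftover table from the first attempt, which is exactly the state you end up cleaning up by hand.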

This has led me to re-evaluate the use of schema.rb. Since schema.rb can’t load data, that means you need to separate migrations and data loading.

Re-organizing migrations

I’ve worked on a couple of projects where the existing migrations were consolidated into a single migration at the end of the development cycle.

This seems like too much trouble to me, especially if we are giving up on migrations as the way to create the database. Who cares how many there are in that case? Plus you have to screw around with the server to reset the schema_info table.

rake db:reset

On my latest project, I have been editing my migrations as needed and then re-creating the database with rake db:reset db:migrate.

I can only do this because it’s a small project and we haven’t deployed yet. But it does keep the migrations much cleaner and easier to understand.

Conflicting migrations

Anyone ever have this happen?

Chances are if you’ve worked on a project with more than a couple developers, you’re going to get conflicting migrations checked into your source control. And the more developers you have, the more time your whole team will waste cleaning up afterwards. Branches present another problem for migrations.

The root problem is that migrations layer another level of version control on top of your SCM. For a really insightful look at this thorny problem, I recommend reading the Django project’s Schema Evolution wiki page, particularly this part. (No, they haven’t solved it either.)

Some potential solutions:

  1. Use an SCM hook to prevent checking in conflicting migrations. There is a pre-commit hook that does this for Subversion. This wouldn’t work for branches, though.
  2. Use a plugin that extends migrations so they aren’t based (exclusively) on numbers. ELC Technologies has a plugin called Duplicate Migrations that allows this. There have also been a few other attempts, most notably Enhanced Migrations from Revolution Health and Independent Migrations by Courtenay.
  3. Write all the changes directly in schema.rb and let Auto-migrations take care of the database while your SCM keeps track of the merging.

I find the last option attractive, but I’m not sure I’d trust it for production use.

I really like the Duplicate Migrations plugin. Not only can developers concurrently change the database without problems, but development could continue on a maintenance branch without causing problems for the mainline. I plan to use it on my next big project.

I’ve also used the SCM hook approach with great success (within a single branch of development). There, the person who checks in their code “last” has to fix the problem, so it costs them some time. But it doesn’t kill everyone else’s time, too.
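The core of such a hook is just a check for two migration files sharing the same number. Here’s a minimal sketch of that check, assuming the numbered file naming that migrations use (001_create_users.rb and so on). The conflicting_migrations helper and the example file names are made up for illustration, and this is not the Subversion hook mentioned above.

```ruby
# Hypothetical helper for a pre-commit check: group migration file names by
# their numeric prefix and return only the prefixes used by more than one file.
def conflicting_migrations(filenames)
  by_version = Hash.new { |hash, key| hash[key] = [] }

  filenames.each do |path|
    name = File.basename(path)
    match = /\A(\d+)_/.match(name)
    next unless match # skip anything that isn't a numbered migration
    by_version[match[1].to_i] << name
  end

  by_version.select { |_version, names| names.size > 1 }
end

# Example file names, invented for this sketch.
files = [
  "db/migrate/001_create_users.rb",
  "db/migrate/002_add_email_to_users.rb",
  "db/migrate/002_create_posts.rb"
]

conflicting_migrations(files).sort.each do |version, names|
  puts "Migration #{version} is defined more than once: #{names.sort.join(', ')}"
end
```

A real hook would feed in the file list from the commit plus what’s already in the repository, and reject the commit whenever the result isn’t empty.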

Your $0.02 here…

So, how do you deal with migrations? Let me know in the comments. Thanks!

Comments


  1. Robert Fischer, January 14, 2008 @ 05:30 PM

    It’s kinda sick, but I’ve set up a continuous integration task that monitors the db/schema folder, and then runs up and down the migrations: for each migration (call it X), it goes from 000 to X, then back down to 001, then back up to X, then back down to 002, then back up to X, etc., etc., etc.

    Overkill, but effective overkill.

    Just make sure it runs independent of other tasks (its own DB, with a CI tool capable of doing parallel builds) because it takes a while to run.

  2. Luke Francl, January 14, 2008 @ 06:21 PM

    Now that’s intense. I have a CI task that drops the database and rebuilds from migration 1 but nothing like that.

  3. Tony Collen, January 15, 2008 @ 05:40 AM

    We’re up to migration #141, and still going strong. We rarely have conflicts with numbers, because development has slowed to a point, and we also communicate within the team when we have a pending migration. We treat it as a code review.

    I think conflicting migrations are more of a symptom of developers not doing an “svn up” (or whatever) before checking in, but it happens.

    In the end, I think communication between the other team members is what prevents us from having these conflicts, and it’s even harder when everyone is remote to each other.

    We also make sure to check in the new schema.rb with every migration.

  4. James Deville, January 15, 2008 @ 07:58 AM

    We do close to what Robert does, but we only do a drop, up, down, up. Mainly because we want to be able to go both ways.

  5. Robert Fischer, February 04, 2008 @ 09:43 PM

    Yeah, it’s intense, but computers are good at intense stuff that also happens to be brain-dead stupid. And, if your CI is running on the same server as your test DB, it’s actually pretty fast.

    The one thing I don’t like is that there’s no data test involved there: it’s just testing that things work with an empty schema. However, most of the ‘migration decay’ seems to involve data consistency issues (null values, foreign key violations, unique index violations, etc., etc.)—this shouldn’t really be a surprise considering Rails treats the databases’ data integrity tools as a second-class citizen.

    To solve this, it’d be nice if there were a set of “production-like data” that it would automatically load in when it gets up to the most recent schema, so that we can test that a developer’s workspace could basically make all those traversals. I’ve heard of people using fixture scenarios or CSV flat files for stuff like this, but I’m not sure how well either of those approaches works, considering that we’re assuming a database whose schema is in flux. Maybe a Ruby script that would generate that kind of stuff would be better…I dunno…