Managing Rails schema and data migrations without losing your mind

As time goes by, you’ll need to change the data model for your software. Rails makes this extremely easy to do, but leaves much of the day-to-day process for you to sort out (the principle of sharp knives).

Once you’re past the initial scaffolding stage of a project, you’ll have questions about how to manage your application schema, especially when working on a team.

Adding rules to your process codifies good practices; a boring checklist isn’t sexy, but it will get the job done and free you up for more valuable work.

Unless you have strong, context-specific reasons not to, follow these rules:

Use migrations only for schema changes
Use one-off scripts to seed/import data
Aggressively prune scripts and sync environments

Migrations for only schema changes

The ActiveRecord Migration DSL is one of the nicest toolkits in all of web programming: migrations are easy to write and to test locally, and the way versions are tracked is very robust.
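As a rough illustration of why the versioning holds up: each migration file’s timestamp prefix is its version, and Rails records the versions it has already run in the schema_migrations table, so any file whose version is not recorded counts as pending. The sketch below is a simplified plain-Ruby model of that idea, not Rails’ actual implementation; the method name pending_migrations and the filenames are hypothetical. It also shows why an older migration merged late from a teammate’s branch still gets picked up.

```ruby
require "set"

# Simplified model of the pending-migration check (not the real Rails code).
# A migration's version is the numeric prefix of its filename, e.g.
# "20200101000000_create_articles.rb" -> "20200101000000".
def pending_migrations(filenames, applied_versions)
  applied = applied_versions.map(&:to_s).to_set

  filenames
    .map { |name| [name[/\A(\d+)_/, 1], name] }                        # [version, filename]
    .reject { |version, _| version.nil? || applied.include?(version) }  # drop already-run files
    .sort_by { |version, _| version.to_i }                              # run oldest first
    .map { |_, name| name }
end

files = %w[
  20200103000000_add_index_to_articles.rb
  20200101000000_create_articles.rb
  20200102000000_add_article_categories.rb
]

# The first and last migrations have run; the middle one arrived late from
# another branch, but it is still pending because its version is not recorded.
p pending_migrations(files, %w[20200101000000 20200103000000])
# => ["20200102000000_add_article_categories.rb"]
```

Because "pending" is computed as a set difference rather than "everything newer than the latest version", the order in which branches are merged does not cause a migration to be skipped.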

Since migration files are Ruby classes, you can put any kind of arbitrary code in them. You could use the migrations DSL, execute raw SQL commands, or create records using your application code.

But you will find plenty of advice saying that you should NOT reference ActiveRecord models in migrations.

Over the life of your Ruby on Rails application, your app’s models will change dramatically, but according to the Rails guides, your migrations shouldn’t:

“In general, editing existing migrations is not a good idea. You will be creating extra work for yourself and your co-workers and cause major headaches if the existing version of the migration has already been run on production machines. Instead, you should write a new migration that performs the changes you require.”

That means that if your migrations reference the ActiveRecord model objects you’ve defined in app/models, your old migrations are likely to break. That’s not good.

This advice comes from the README for good-migrations – a gem that prevents loading your models in migrations so it becomes impossible to get yourself into a mess. If you can’t load your models, you can’t accidentally reference them in migrations.

As your codebase evolves, Model.create or Model.update calls sprinkled throughout old migrations make it hard to run the migrations against a brand-new database without errors.

You can work around this by redefining models inside the scope of your migration, or by being very diligent about refactoring old migrations whenever you make breaking changes, but the bottom line is: it’s not worth it.

Getting your migrations into a broken state is a huge mess: everyone on your team might end up with a slightly different local database, or your environments drift out of sync depending on the order in which migrations were run. If you don’t handle it correctly, your CI server won’t be able to rebuild a test database from scratch.

Fix the problem by avoiding it altogether: only do schema migrations. If you see code other than the migrations DSL in a migration, don’t merge it!
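To make the "don’t merge it" rule easier to apply in code review, one option is a small check that flags migration lines which look like data changes. This is a hypothetical regex heuristic (suspicious_lines, MODEL_CALL and RAW_SQL are made-up names, not an existing gem or Rails feature): it flags query or write calls on a capitalized constant, such as ArticleCategory.create!, plus raw execute statements. It is not a parser, so it will miss some cases and raise some false alarms; raw SQL in particular is flagged for a human to look at, since it is sometimes used for genuine schema work. It needs Ruby 2.7+ for filter_map.

```ruby
# Hypothetical review helper: returns the line numbers in a migration's source
# that look like data changes rather than schema DSL. A regex heuristic only.
MODEL_CALL = /\b[A-Z]\w*\.(?:create!?|update!?|update_all|find_each|find|where|all|each|delete_all|destroy_all)\b/
RAW_SQL    = /\A\s*execute\b/

def suspicious_lines(source)
  source.each_line.with_index(1).filter_map do |line, number|
    next if line.strip.start_with?("#") # skip comment lines
    number if line.match?(MODEL_CALL) || line.match?(RAW_SQL)
  end
end

migration = <<~RUBY
  class AddArticleCategories < ActiveRecord::Migration[6.0]
    def change
      create_table :article_categories do |t|
        t.string :name, null: false, index: { unique: true }
      end
      ArticleCategory.create!(name: "rails")
    end
  end
RUBY

p suspicious_lines(migration) # => [6]
```

Running something like this over new files in db/migrate on CI turns the review rule into a check that fails loudly instead of relying on someone noticing.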

One-off scripts for seeding/importing data

In a data-heavy application, it’s common to import or modify a bunch of data right after a schema change. For this situation, write one-off scripts and execute them with rails runner. I recommend putting these scripts in a db/script folder.

Write a script to create the data you need (seed, import from file, bulk update, etc.)
Test it against your local database
Deploy to your other environments and run the script
Remove the script from source control once it’s been run everywhere (more on this in the next section)
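The checklist above has a fixed order, so it can be written down as data. A minimal sketch, assuming the three environment names used in this article; ENVIRONMENTS and next_step are hypothetical names, not part of any tool. Given the environments a one-off script has already run in, it returns the next action and refuses an order that skips ahead.

```ruby
# Hypothetical: the order a one-off data script moves through, per the steps above.
ENVIRONMENTS = %w[development staging production].freeze

# Given the environments a script has already run in, return the next action.
def next_step(ran_in)
  unknown = ran_in - ENVIRONMENTS
  raise ArgumentError, "unknown environment(s): #{unknown.join(', ')}" if unknown.any?

  pending = ENVIRONMENTS - ran_in
  return "delete the script from source control" if pending.empty?

  # Refuse to skip ahead: a later environment must not run before an earlier one.
  skipped = ran_in.select { |env| ENVIRONMENTS.index(env) > ENVIRONMENTS.index(pending.first) }
  raise ArgumentError, "ran in #{skipped.join(', ')} before #{pending.first}" if skipped.any?

  "run it on #{pending.first}"
end

puts next_step([])              # prints: run it on development
puts next_step(%w[development]) # prints: run it on staging
```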

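Here’s an example: a migration that only creates the table, followed by a one-off script that seeds and back-fills the data, and the commands to run that script in each environment.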

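# Migration in db/migrate: schema changes only
class AddArticleCategories < ActiveRecord::Migration[6.0]
  def change
    create_table :article_categories do |t|
      # Uniqueness needs an index; unique: true is not a column option
      t.string :name, null: false, index: { unique: true }
    end
  end
end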

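# db/script/seed_categories.rb: one-off data script, run with rails runner

# Seed initial categories
rails = ArticleCategory.create!(name: "rails")
js = ArticleCategory.create!(name: "javascript")
git = ArticleCategory.create!(name: "git")

# Back-fill categories (assumes Article has a category association)
# find_each loads articles in batches instead of all at once
Article.find_each do |a|
  if a.title.downcase.include?("rails")
    a.update!(category: rails)
  end
end

puts "Done!"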

# Run locally
bin/rails runner db/script/seed_categories.rb

# Push script to staging and run it
heroku run --app my-app-test bin/rails runner db/script/seed_categories.rb

# Push script to production and run it
heroku run --app my-app bin/rails runner db/script/seed_categories.rb

Now that the script has been run on production, you can delete it from the project.

Aggressively prune and sync environments

With this approach, we treat schema migrations as durable and data scripts as disposable. Once the script has been run on production, it doesn’t need to exist anymore.

If you need to reference a script again, use your git history, not a huge folder full of potentially outdated and broken scripts. In some cases, keeping old scripts around is actively harmful: someone might accidentally re-run one a few weeks later.

But what if someone on your team was out sick, didn’t run the script locally, and it has already been removed? Start treating your database environments like a one-way street.

Production is the gold standard: whatever is on production is the absolute source of truth
You can pull down production to a staging/test environment
You can pull down staging to local development databases

Data can safely flow from Production -> Staging -> Development but never the other way.
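If you script your environment copies, the one-way rule is simple enough to check before anything runs. A minimal sketch, assuming the three environment names above; AUTHORITY and copy_allowed? are hypothetical names, not part of parity or Heroku. Note that it permits any downhill copy, including production straight to development, so tighten it if your data calls for that.

```ruby
# Hypothetical guard for the one-way rule: data may only be copied from a more
# authoritative environment to a less authoritative one.
AUTHORITY = { "production" => 3, "staging" => 2, "development" => 1 }.freeze

def copy_allowed?(from:, to:)
  source = AUTHORITY.fetch(from) { raise ArgumentError, "unknown environment: #{from}" }
  target = AUTHORITY.fetch(to) { raise ArgumentError, "unknown environment: #{to}" }
  source > target # only "downhill" copies are allowed
end

p copy_allowed?(from: "production", to: "staging")  # => true
p copy_allowed?(from: "staging", to: "development") # => true
p copy_allowed?(from: "development", to: "staging") # => false
```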

If someone needs to get up-to-date, have them clone the staging database to their local database. Periodically clone the production database to the staging environment as part of your release process.

If you’re on Heroku, use parity to do this. It’s a super convenient way to add one-line “copy this environment to that environment” functionality to your project.

Optional: clean up old migrations

Depending on how complex your application is, you may find yourself drowning in migration files. There is no harm in keeping them in your db/migrate folder, but if you want, you can delete them.

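Just make sure CI and test environments build their databases from db/schema.rb with db:schema:load instead of replaying every migration with db:migrate, since the deleted migrations can no longer be run.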

As with data scripts, if you need to refer back to old migrations, you have source control for that.

When to deviate

You’ll need to reach for a different workflow if:

You need zero downtime migrations
You have sensitive data that prevents syncing environments
You have a complicated multiple-database setup

…and probably several other circumstances. As they say: “but of course there are obvious exceptions…”

If this is the case, you should deviate from these rules!

But think really hard and challenge whether you actually have those requirements before adding extra complexity to your process. You can always add capabilities later as needed, but it’s much harder to take them away once you become dependent on them.

Summary

Especially when it comes to touching production data, being boring is a virtue. We don’t want fancy setups, because if (let’s be honest: when…) they break, we are in a high-stress situation.

Imagine you are trying to do a deploy and the migrations fail – or the schema changes work, but there is an exception seeding some data. Now you have a problem. Do you rollback? SSH in and re-run the script? Start flailing around in the Rails console? Break out into a cold sweat?

If you’ve ever spent an hour trying to help diagnose why a teammate’s local database doesn’t match your own, or why the schema.rb file seems to change every time you run migrations, you would benefit from this strategy.

If you follow these rules:

Use migrations only for schema changes
Use one-off scripts to seed/import data
Aggressively prune scripts and sync environments

You’ll be well positioned for reliable database migrations with clear guardrails around how you (and your team) should approach changing data.
