Migrating data in production (with zero downtime)

When running Reform, we often faced the issue of having to migrate data in our database. We were an early-stage startup, so we built features as needed. Down the road we ran into scenarios where we realised we hadn’t chosen the correct data model.

The nature of the product (a form builder) meant we couldn’t have any downtime, because that meant our customers lost leads.

Example

What do I mean by migrating data? I mean any change to the data model, like moving data to a new column or changing a column type.

When we launched Reform, we only supported single page forms. We stored the page as a JSON object in a page column. But we quickly realised we needed to support multipage forms. That meant we had to replace page with pages and store an array of pages instead.
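
As a rough sketch of the shape change (the field names here are made up for illustration):

// Dummy code – illustrative shapes only
$form->page;  // {"title": "Contact", "fields": [...]}   – a single page object
$form->pages; // [{"title": "Contact", "fields": [...]}] – an array of page objects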

3-step approach

We developed a 3-step approach to make these changes with no downtime.

Preparation: Prepare the database to handle the new data model and migrate existing data
Launch: Release the new feature
Clean-up: Remove old data and tighten constraints on the new columns

The main idea is to create the new column well before anything reads from it. We can migrate existing data into the new column, and since nothing reads from it yet, it doesn’t matter how long the migration takes. Once all data is migrated, we can update the app logic to read from the new column.

In practice this requires releasing a preparation PR, migrating the data, and then releasing the actual feature (followed by a clean-up).

Preparation

The first release is a preparation PR. This PR contains a migration that adds the new column. Because the column is added to existing rows that don’t have a value for it yet, it has to be nullable for now.
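
In Laravel terms, the migration could look something like this (the forms table name and json column type are assumptions based on our setup):

// Dummy code
Schema::table('forms', function (Blueprint $table) {
    // Nullable for now – existing rows don't have a value yet
    $table->json('pages')->nullable();
});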

Next we’ll update the app logic to save any changes to both the old column and the new column. In our example we would still save the page as an object in page, but we’d also generate a JSON array on the fly and save the page as the only item in pages. This means that whenever a user saves their form, the data is synced to both page and pages.

// Dummy code
$form->update([
    'page' => $page,     // old column: the single page object
    'pages' => [$page],  // new column: an array with that page as its only item
]);

But we still haven’t dealt with rows that users haven’t updated since the preparation release. For those, we have to create a custom migration script. Reform is built with Laravel, so we created the script using a custom command and a custom queue job.

The command fetches unmigrated data from the database. In our example, that means any rows where pages is null. Fetching only unmigrated data also means we can re-run the command as many times as needed. We often had to re-run it after an error, or when we batched it to process only N rows at a time.

// Dummy code
// Fetch only rows that haven't been migrated yet
$forms = Form::whereNull('pages')->get();

$forms->each(function ($form) {
    MigrateFormPageToPages::dispatch($form);
});
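
Wrapped in an artisan command, the script might look roughly like this (the command name, namespaces, and the --limit option for batching are illustrative):

// Dummy code
use App\Jobs\MigrateFormPageToPages;
use App\Models\Form;
use Illuminate\Console\Command;

class MigrateFormPages extends Command
{
    protected $signature = 'forms:migrate-pages {--limit=1000}';

    public function handle(): void
    {
        // whereNull('pages') keeps the command idempotent – safe to re-run
        Form::whereNull('pages')
            ->limit((int) $this->option('limit'))
            ->get()
            ->each(fn ($form) => MigrateFormPageToPages::dispatch($form));
    }
}

Re-running php artisan forms:migrate-pages then simply picks up wherever the previous run stopped.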

For each row, the command dispatches a queue job responsible for migrating that single row (see the sketch after this list). Having a separate queue job per row has a couple of benefits:

If the job fails, we can inspect and re-run it
Pushing the work through the queue naturally spaces out the writes, so we don’t overload the database with too many requests at once
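
A sketch of that job (the class boilerplate is an assumption; the essential work is copying page into pages for a single row):

// Dummy code
use App\Models\Form;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\SerializesModels;

class MigrateFormPageToPages implements ShouldQueue
{
    use Dispatchable, Queueable, SerializesModels;

    public function __construct(public Form $form) {}

    public function handle(): void
    {
        // Guard against rows that were migrated while the job sat in the queue
        if ($this->form->pages !== null) {
            return;
        }

        $this->form->update(['pages' => [$this->form->page]]);
    }
}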

We triggered the command through the server console on DigitalOcean.

Launch

At this point we have a new column in our database, all existing data has been migrated so it exists in both columns, and updates are synced to both.

It’d now be safe to release the new feature that prompted this migration. In our Reform example, that meant launching a new UI that could handle multiple pages. On the backend we also updated the code to write only to the pages column, since page is now obsolete.
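
In dummy-code terms, the save path now only writes the new column:

// Dummy code
$form->update([
    'pages' => $pages,
]);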

Cleanup

Finally, we can clean up the mess we created. We can safely drop the page column since it’s no longer needed. Initially we made pages nullable, but at this point every row has data, so we can add a not-null constraint.
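
The cleanup migration could look like this (note that on older Laravel versions, altering a column with ->change() requires the doctrine/dbal package):

// Dummy code
Schema::table('forms', function (Blueprint $table) {
    $table->dropColumn('page');
    $table->json('pages')->nullable(false)->change();
});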

I’m currently building a new database client for developers. It’s the tool I wish we had when building Reform. Check out a quick demo here:
https://www.youtube.com/watch?v=KAyeOBe7csc