Migrating Existing Data Around Destructive Rails Migrations

Ryan Stenberg, Former Developer

Article Category: #Code

The Background

Recently, we had to restructure a complicated piece of a pretty huge Ruby on Rails application. This resulted in significant changes to the model landscape, including the removal of a model that was rendered obsolete. While this model was going to go away, we didn’t want to lose all the data we had in production, but we also didn’t want to just leave around an unused table in our database.

This kind of thing happens all the time, but we ran into an interesting challenge with this particular data migration when one of the columns was used to store an Amazon S3 URL for a CarrierWave-uploaded file. CarrierWave is pretty magical, but we found it can be difficult to work with when trying to migrate existing data around a destructive Rails migration (where we make structural changes to our database that result in a loss of data). In this post, I’d like to share our experiences and two of the approaches we experimented with.

An Example

Here’s a simple example we’ll use to demonstrate each of the data migration approaches:

Say we have a Vehicle model with scopes car_sized and truck_sized and we need to break out those two scoped Vehicles into their own classes, Car and Truck. We’re getting rid of the Vehicle model, but we don’t want to lose all that data – rather, we want to just copy them over into the appropriate target model (either Car or Truck). Let’s also say our Vehicle model belongs_to a Make and a Model and has the following attributes:

  • :year (integer)
  • :color (string)
  • :owners_guide (string) – this will represent our CarrierWave uploader and will contain our Amazon S3 URL (a sketch of the full model follows this list)
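
For reference, here's a rough sketch of that starting Vehicle model. The scope definitions (and the gross_weight column they use) as well as the uploader class name are assumptions for illustration; the real sizing criteria don't matter here:

class Vehicle < ActiveRecord::Base
  belongs_to :make
  belongs_to :model

  # CarrierWave uploader backing :owners_guide (uploader class name assumed)
  mount_uploader :owners_guide, OwnersGuideUploader

  # Hypothetical sizing criteria -- any scopes splitting vehicles in two would do
  scope :car_sized,   -> { where("gross_weight <= 8500") }
  scope :truck_sized, -> { where("gross_weight > 8500") }
end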

Approach 1: Scripts Writing Scripts!

With this approach, we're writing a Ruby script that uses a lot of puts statements to write out a bunch of strings containing valid Ruby code, which we'll run on the other side of our Rails migrations. It's definitely not the prettiest, but it is pretty neat. Given our example, here's what a basic implementation might look like:

module VehicleData
  def self.migrate
    # Print a stub for the helper the generated script will need;
    # we'll fill in its body by hand before running the output.
    puts "def file_from_url(url)"
    puts "  # Create a file object from S3 URL"
    puts "end\n"
    ['car', 'truck'].each do |type|
      Vehicle.send("#{type}_sized").each do |vehicle|
        # Print one valid Car.create/Truck.create line per existing record
        puts "#{type.capitalize}.create(:make_id => #{vehicle.make_id}, :model_id => #{vehicle.model_id}, :year => #{vehicle.year}, :color => \"#{vehicle.color}\", :owners_guide => file_from_url(\"#{vehicle.owners_guide.url}\"))"
      end
    end
  end
end

VehicleData.migrate

One quick thing to note before we continue: you'll need to surround any String-type data with escaped quotes (\") as we did with vehicle.color, or the generated script won't be valid Ruby.

Say we named the above script test_script.rb and saved it in our project root; we'd then run rails runner test_script.rb > output_script.rb. This command runs test_script.rb with the current Rails environment loaded and redirects everything we puts into output_script.rb. If we were to do this in production, the command would probably look something like RAILS_ENV=production bundle exec rails runner test_script.rb > output_script.rb. Once our script completes, output_script.rb should contain something like the following:

def file_from_url(url)
  # Create a file object from S3 URL
end

Car.create(:make_id => 17, :model_id => 85, :year => 2012, :color => "red", :owners_guide => file_from_url("s3.com/example1.pdf"))
Car.create(:make_id => 3, :model_id => 27, :year => 2007, :color => "black", :owners_guide => file_from_url("s3.com/example2.pdf"))
Truck.create(:make_id => 10, :model_id => 44, :year => 2014, :color => "silver", :owners_guide => file_from_url("s3.com/example3.pdf"))
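
The file_from_url helper is left as a stub here. One possible implementation (just a sketch, assuming the URLs are still readable when the script runs) downloads each file with open-uri and hands CarrierWave a local Tempfile, which its mounted uploaders will accept on assignment:

require 'open-uri'
require 'tempfile'

# Sketch: download the remote S3 file and return a local file object
# that a CarrierWave uploader will accept.
def file_from_url(url)
  extension = File.extname(URI.parse(url).path)
  file = Tempfile.new(['owners_guide', extension])
  file.binmode
  file.write(URI.parse(url).read) # open-uri lets us read straight from the URL
  file.rewind
  file
end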

After deploying and running migrations, simply run output_script.rb just like the previous script (rails runner output_script.rb) and your data is migrated!

This approach works just fine for numeric and string data. Dates and times require a little extra work, since you need to print the date/time out to the output script in string form, wrapped in an appropriate .parse call.
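
For example, a hypothetical :purchased_at datetime column (not one of our Vehicle attributes) would have to be emitted like this:

# Hypothetical column, shown only to illustrate the .parse pattern:
# the generated line re-parses the printed timestamp string.
puts "Car.create(:purchased_at => Time.zone.parse(\"#{vehicle.purchased_at}\"))"

CarrierWave-uploaded S3 files proved pretty difficult to work with in this approach. Getting a path or URL is simple enough, but when you create an object on the other side of your Rails migration that also has a CarrierWave uploader, it expects a file. In our case, the S3 URLs included quickly-expiring tokens that would be invalid by the time our deployment completed, which is exactly where a helper like the file_from_url sketch above falls down. With destructive migrations, we decided this approach was a little too brittle.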

Approach 2: Two-Phase Migration - Copy + Cleanup

With this approach, we're also writing a Ruby script to copy our data over to our new objects, but the major difference is that we run it after a first set of migrations that introduces our new application code while keeping the existing models (including their relationships and key pieces like our CarrierWave uploader) intact. Here, we have both the old models and data alongside our new models, which makes copying the data pretty straightforward. Additionally, there's no risk of losing the existing data, since we're only reading it in order to copy. If problems arise while copying, we can fix our script and retry. The downside is the extra work involved in keeping the old models intact through the first deployment and then doing a separate cleanup effort in a second deployment.

Given our example from above, our script for this approach would look something like this:

module VehicleData
  def self.migrate
    ['car', 'truck'].each do |type|
      Vehicle.send("#{type}_sized").each do |vehicle|
        # Old and new models both exist at this point, so we can hand
        # CarrierWave the existing file object directly.
        type.capitalize.constantize.create(
          :make_id      => vehicle.make_id,
          :model_id     => vehicle.model_id,
          :year         => vehicle.year,
          :color        => vehicle.color,
          :owners_guide => vehicle.owners_guide.file)
      end
    end
  end
end

VehicleData.migrate

Pretty similar! If our example didn't involve the CarrierWave-uploaded S3 files, it would probably be much less work to go with Approach 1 and handle everything in a single batch of application code + migrations. However, if your data is especially sensitive and extra precaution is required, this is the way to go. Since we did have to consider CarrierWave-uploaded S3 files, this approach made things much simpler.

After all the existing data has been copied over to the new objects, simply deploy your cleanup code + migrations.
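
That second deployment's destructive migration can be tiny at this point; a minimal sketch (class and table names assumed from our example) might look like:

class DropVehicles < ActiveRecord::Migration
  def up
    drop_table :vehicles
  end

  def down
    # The data now lives in cars and trucks; there's no way back.
    raise ActiveRecord::IrreversibleMigration
  end
end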

In a Nutshell

There are many ways to handle data migrations besides the two covered here. The most important thing is to pick the right one for the job, so take the time to think it through and plan ahead. Regardless of the approach you take, always back up your data and test with copies of the real data to ensure your game-time data migrations go smoothly!
