8 tips and tricks for making migration easier
Migrating at scale isn't for the faint of heart. Here are some of the tricks I have learned along the way, and some tips that will ensure that your next migration is easier than you expect.
Migration. It’s a word that strikes fear into the heart of many a project manager and developer. But it doesn’t need to be intimidating. Even major migrations (check out the University of Limerick case study) can go without a hitch if you plan accordingly.
The Annertech team recently migrated 50 websites from Drupal 7 to Drupal 9 for the University of Limerick, with another 150+ websites to follow. It was no small task. It involved more than 60,000 nodes, 70 content types, 95 paragraph types, 60 vocabularies … and 37,000 individual migrations!
1. Ensure you can get back to a known good state in a single step
This tip is useful for all site building, not just migration.
I’m always experimenting, renaming things, adding and removing things. My database can very quickly get into a mess. It’s error-prone and time-consuming to try to manually revert things.
I use this single command to re-install the site from scratch, using the configuration that has been exported:
drush site:install --existing-config
As a bonus, this works really well if you switch git branches, and need different database schemas for each branch.
2. Disable the caching of migrations
Like any other plugin definitions, migrations are cached. You’re likely to be changing them a lot during development, but each time you edit a .yaml file you need to clear that cache.
You can save yourself having to do that by uncommenting this in settings.php:
/**
 * Disable caching for migrations.
 *
 * Uncomment the code below to only store migrations in memory and not in the
 * database. This makes it easier to develop custom migrations.
 */
# $settings['cache']['bins']['discovery_migration'] = 'cache.backend.memory';
3. Turn off search API indexing
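Drupal will update the search API index every time a node changes. When developing a migration, you're going to import a lot of nodes, over and over again. That's a lot of CPU cycles that you don't need, so on your development site switch off the index's "Index items immediately" option (or disable the index entirely) until the migration is done.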
4. See what's going on
Sometimes when I’m working on a migration I focus on one single item. The Migrate Tools and Migrate Devel modules have some handy command-line options for this.
This is a command I use a lot – to do a single thing over and over. The migrate-debug option prints out everything that’s coming in, and going out.
drush migrate:import node_article --idlist=123 --update --migrate-debug --skip-progress-bar
5. Plan your dependencies
It’s a good idea to order your migrations in such a way that when you encounter a reference field, the thing you’re referencing already exists.
Yes, Drupal has the concept of stubs (placeholder items created when the thing being referenced hasn’t been migrated yet), but they don’t deal well with broken references. If the real item never arrives, the stub stays behind as content that shouldn’t be there.
Try to use dependencies rather than stubs. Consider an article node with an author field that references a person node. We would say the article migration depends on the person migration, and therefore the people should be migrated first.
Sometimes you have circular references. In the example above, a person node may have a biography field that points to an article node. In this case, you would want to use a stub.
But even here, decide which one is the MAIN reference (it's probably the author field) and still base your dependencies around that.
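One way to sanity-check an ordering before writing it into each migration's dependencies is to treat the main references as a graph. The following is a standalone Python sketch, not Drupal code, and the migration names (person, tag, article) are hypothetical. It sorts migrations so that each runs after the ones it depends on, and it detects a circular reference, which is the point at which a stub is needed.

```python
from graphlib import CycleError, TopologicalSorter


def migration_order(required):
    """Return migration names ordered so each runs after the ones it requires.

    `required` maps a migration name to the set of migrations that must
    already have run (main references only, not the ones handled by stubs).
    """
    return list(TopologicalSorter(required).static_order())


# Hypothetical migrations: articles reference people (author) and tags.
main_refs = {
    "person": set(),
    "tag": set(),
    "article": {"person", "tag"},
}
order = migration_order(main_refs)

# Adding the biography reference (person -> article) makes the graph circular,
# so no pure dependency ordering exists and one side has to be stubbed.
circular = dict(main_refs, person={"article"})
try:
    migration_order(circular)
    has_cycle = False
except CycleError:
    has_cycle = True
```

Leaving the secondary reference (the biography field) out of the graph keeps the ordering solvable, which matches the advice to base dependencies on the main reference.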
6. Links are hard
Particularly links with numeric IDs.
Links can point anywhere, so everything we just said about dependencies goes out of the window! Unfortunately node IDs have a habit of appearing publicly even if we try not to. Then they become a leaky abstraction.
We tend to run all our HTML text fields through a process plugin and try to clean up as much as possible. Sometimes we can establish if there’s an alias in D7 and use that.
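The clean-up logic can be sketched roughly as follows. This is an illustrative Python version rather than the actual process plugin, and the alias lookup is a hypothetical dictionary standing in for a query against the Drupal 7 alias table. It swaps numeric node links for their alias when one is known and leaves everything else alone.

```python
import re

# Matches href="/node/123" or href="node/123", capturing the node ID.
NODE_LINK = re.compile(r'href="/?node/(\d+)"')


def rewrite_node_links(html, aliases):
    """Replace numeric node links with their alias where one is known.

    `aliases` maps a legacy node ID to its path alias (hypothetical data).
    Links without a known alias are left untouched.
    """
    def replace(match):
        nid = int(match.group(1))
        alias = aliases.get(nid)
        return f'href="{alias}"' if alias else match.group(0)

    return NODE_LINK.sub(replace, html)


aliases = {123: "/news/open-day"}
html = '<a href="/node/123">Open day</a> and <a href="/node/456">other</a>'
cleaned = rewrite_node_links(html, aliases)
```

A regular expression is enough for a sketch, but real content has single-quoted attributes, absolute URLs and query strings, so an HTML parser is likely a safer basis for production use.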
The worst scenario is if you are migrating content with numeric links, and you combine migrated and new content. We’ve started experimenting with reserving a block of IDs for legacy content, like this:
ALTER TABLE node AUTO_INCREMENT 100000;
We tried this on our most recent migration project, and it worked beautifully. It also helped during the UAT process, as there was never any question about whether we were looking at the same item of content on the legacy and migrated sites.
7. Understand single vs multiple values
This one really confused me for a while, especially because both the single_value and multiple_values plugins have a transform function that does nothing:
public function transform($value) { return $value; }
Drupal can do one of two things with a process plugin:
- Call it once and give it all the values. These plugins operate on multiple values. A plugin to concatenate text would be an example.
- Call it lots of times, once per value. These plugins operate on a single value. A plugin to format a date would be an example.
Drupal normally knows which of those to pick, but it can get it wrong.
So the single_value and multiple_values plugins don’t actually change the data. They act as a hint to the next plugin, informing it of what sort of data we’re dealing with.
Associative arrays can sometimes need this. PHP doesn’t differentiate between numerically indexed arrays such as [0 => ..., 1 => ...], which we consider multiple values, and associative arrays like ['url' => ..., 'title' => ...], which we’d consider a single value with several parts.
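As a rough mental model, the calling behaviour can be sketched in a few lines of Python. This is a deliberately simplified illustration and not Drupal's actual implementation: the handle_multiples key mirrors the plugin annotation of the same name, while returns_multiple is a made-up name for the hint a plugin leaves for whatever comes next.

```python
def run_pipeline(value, plugins):
    """Simplified model of how a process pipeline calls each plugin."""
    multiple = False
    for plugin in plugins:
        if multiple and not plugin["handle_multiples"]:
            # The value is flagged as multiple: call the plugin once per item.
            value = [plugin["transform"](item) for item in value]
        else:
            # Call the plugin once with everything, then record its hint.
            value = plugin["transform"](value)
            multiple = plugin["returns_multiple"]
    return value


def identity(value):
    return value


# The two hint plugins leave the data alone and only set the flag.
multiple_values = {"transform": identity, "handle_multiples": True, "returns_multiple": True}
single_value = {"transform": identity, "handle_multiples": True, "returns_multiple": False}

# An ordinary plugin that expects to be handed the whole list at once.
join = {"transform": " ".join, "handle_multiples": False, "returns_multiple": False}

tags = ["news", "events"]
joined = run_pipeline(tags, [multiple_values, single_value, join])
per_item = run_pipeline(tags, [multiple_values, join])
```

With the single_value hint in place, join receives the whole list and produces "news events". Without it, join is called once per item and splits each word into letters, which is the kind of surprising result that usually signals a missing hint.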
8. Consider a preprocess step
Migrations work best when they have clean, simple source data. If your data is hard to work with, consider having a preprocess step first to clean it up.
One site I worked with had data scattered across CSV files. I wanted to cross reference things and found myself thinking how much easier things would be if these were a SQL database rather than CSV files.
So I did just that and put everything into a database first.
SQLite is a really nice choice for this. It’s just one file, so you don’t need any extra infrastructure for it.
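A preprocess step of this kind can be very small. The sketch below is a minimal Python example using only the standard library, with made-up table and column names (people, articles, author_id) and in-memory data standing in for the real CSV exports. It loads each CSV into its own SQLite table and then cross-references them with an ordinary join.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_file):
    """Load a CSV file (with a header row) into a new SQLite table."""
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)


# In-memory stand-ins for two CSV exports. Real code would open() the files
# and connect to a file such as "source.sqlite" instead of ":memory:".
people_csv = io.StringIO("id,name\n1,Ada\n2,Grace\n")
articles_csv = io.StringIO("id,title,author_id\n10,Hello,2\n")

conn = sqlite3.connect(":memory:")
load_csv(conn, "people", people_csv)
load_csv(conn, "articles", articles_csv)

# Cross-referencing is now a plain SQL join.
rows = conn.execute(
    "SELECT a.title, p.name FROM articles a JOIN people p ON p.id = a.author_id"
).fetchall()
```

The table and column names come straight from the CSV headers here, which is fine for trusted exports but would need validating for anything user-supplied.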
Hopefully these tips and tricks will help you with your next migration. This blog here also deals with migration and contains some more information that will help.
Watch the presentation
This blog was written following a presentation by Stella Power and Erik Erskine at DrupalCon Prague, called “Migration at scale. How to not… fail”. The presentation is available to view below:
Erik Erskine Senior Backend Developer
Erik Erskine is a senior backend developer. He often takes the lead in our more challenging migrations, often for large third-level colleges and universities.