Drupal’s migration system was primarily designed for migrating data into a site from an older version of Drupal or from some other CMS. Lately I’ve been using Migrate API a lot, but not for that.
I’ve been helping one of our clients with a “website refresh”, which has entailed restructuring several content types and fields. Once I’ve made configuration changes, I’ve needed to move existing content over to the new structure. To automate that process, I’ve been using migrations.
In this article I’ll share some examples of migrations — demonstrating how useful Migrate API can be for moving content within a single site, not just between sites.
This article assumes you have a basic understanding of the Migrate API, so that you’ll know what I’m talking about when I refer to source, process, and destination plugins, and you’ll be familiar with the syntax of a migration YAML file. If not, the Migrate API Overview is a fine place to start.
Basic process
My responsibilities for this project include site-building and development, deploying changes, and inputting content. I make changes on my local dev site and commit them to the repository, then deploy to a staging site for further testing, and finally deploy to the production site.
Modules installed
Modules used in these examples:
- Migrate (core)
- Migrate Drupal (core): for its content_entity source plugin
- Migrate Plus: so that I could implement migrations as configuration entities without having to wrap them in a custom module, and to get some additional process plugins
- Migrate Conditions: for additional process plugins
- Migrate Source CSV: for a source plugin to import from a CSV file
- Preserve Changed Timestamp: for when I didn’t want a migration to affect existing nodes’ last-modified dates
Additionally, I use Migrate Devel on my local dev site to help with debugging.
I ran into a little gotcha with Migrate Drupal. That module depends on the Password Compatibility module for reasons unrelated to the functionality that I needed it for. Not finding an equivalent plugin elsewhere and not wanting to re-enable Password Compatibility, I ended up patching migrate_drupal.info.yml to remove the dependency.
Workflow
When first creating a migration, I’ll typically:
- Create a new YAML file in the site’s configuration sync directory (e.g. config/sync/migrate_plus.migration.person_associate.yml) and make an initial attempt at writing the migration.
- Run drush config:import to import the new YAML file.
- Run drush config:export to add auto-generated keys and values (e.g. uuid) to the YAML file.
- Run drush migrate:status to check that the migration is listed.
From there, I’ll iteratively make changes to the migration and test it:
- Edit the YAML file.
- Run drush config:import to bring the change into Drupal’s database.
- Run drush migrate:import <migration id> to execute the migration.
  - I like to combine this with --limit=… or --idlist=… and --migrate-debug to import one row at a time and print its source and destination property values.
  - I often get a spurious “Failed to connect to your database server” error, which I just ignore.
- Check if the content was imported as expected. If not, debug the migration.
- Run drush migrate:rollback <migration id> to delete the imported content.
Later, when it’s time to deploy changes, I’ll push the code to the staging or production site and then:
- Run drush config:import to import configuration changes, including the restructured content types and the migrations.
- Run drush cr to rebuild caches.
- Run drush migrate:import <migration id> --execute-dependencies to execute the migrations.
- Delete or unpublish old content using a Drush script or Views bulk operations.
Now on to the examples…
Moving content when restructuring
The examples in this article are based on the migrations I wrote for our client’s site, but simplified (so that we don’t get bogged down in details of this particular site) and anonymized.
Moving some nodes to a new content type
Originally, the site had a Person content type that encompassed people with a lot of different relationships to our client’s organization, including staff, trustees, conference speakers, and article authors. Since that content type had accumulated a bunch of rarely-used fields to try to cover all of those use cases, I decided to split it into multiple content types, one of which is called Associate.
Here’s how I wanted to map Person fields to Associate fields:
Person field | Associate field | Notes |
---|---|---|
uid | uid | Authored by |
created | created | Authored on |
changed | changed | Changed (last-modified date) |
title | title | Title |
status | status | Published |
field_person_portrait | field_associate_portrait | type: media entity reference |
field_person_website | field_associate_website | type: link |
last word in title | field_associate_surname | last word of the title as an approximation for surname |
field_person_is_trustee | none | type: boolean |
field_person_is_staff | none | type: boolean |
I only wanted to create an Associate node for a Person node if the node was published and if the person was a trustee or staff member, as indicated by field_person_is_trustee and field_person_is_staff.
Here’s the migration:
    id: person_associate
    label: 'Creates Associate nodes for a subset of Person nodes'
    source:
      plugin: 'content_entity:node'
      bundle: person
    process:
      is_published:
        plugin: skip_on_value
        method: row
        message: 'Skipping Person that is unpublished'
        source: status/0/value
        value: 0
      is_trustee_or_staff:
        plugin: skip_on_condition
        method: row
        message: 'Skipping Person that is not a member of Trustees or Staff'
        condition:
          plugin: and
          conditions:
            - plugin: equals
              source: field_person_is_trustee/0/value
              value: 0
            - plugin: equals
              source: field_person_is_staff/0/value
              value: 0
      uid: uid
      created: created
      changed: changed
      title: title
      status: status
      field_associate_portrait: field_person_portrait
      field_associate_website: field_person_website
      field_associate_surname:
        - plugin: explode
          source: title/0/value
          delimiter: ' '
        - plugin: array_pop
    destination:
      plugin: 'entity:node'
      default_bundle: associate
- The processes for is_published and is_trustee_or_staff check if we should create an Associate for the current Person or not.
  - is_published and is_trustee_or_staff are pseudofields, which are only used during the migration and don’t get saved to the Associate node.
- The process for field_associate_surname extracts the last word from the Person’s Title.
- The rest of the fields in process are just passed through from Person to Associate.
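To make the skip logic and the surname pipeline concrete, here is a rough trace of how three Person nodes might fare in this migration. The field values and the third title are invented for illustration, and the YAML is only a notation for the trace, not a file to import.

```yaml
# Hypothetical Person nodes (invented values) and what person_associate does with each.
- title: 'Particle Man'
  status: 1
  field_person_is_trustee: 1
  field_person_is_staff: 0
  outcome: migrated
  # explode on ' ' yields [Particle, Man]; array_pop returns the last element.
  field_associate_surname: 'Man'
- title: 'Universe Man'
  status: 0
  field_person_is_trustee: 1
  field_person_is_staff: 1
  outcome: skipped  # is_published: status equals 0
- title: 'Triangle Man'
  status: 1
  field_person_is_trustee: 0
  field_person_is_staff: 0
  outcome: skipped  # is_trustee_or_staff: both flags equal 0
```

A title that ends in a suffix, such as a hypothetical “Jane Doe Jr.”, would yield “Jr.”, which is why the last word is only an approximation for the surname.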
Changing the content type of a node reference field
Once I’d migrated Person nodes to Associate nodes, I needed to change an entity reference field that had previously held Person nodes to instead hold Associate nodes. This was a field called Presenter on a content type called Presentation.
Here’s the migration:
    id: presentation_presenter
    label: 'Change the Presenter field from Person to Associate on Presentation nodes'
    source:
      plugin: 'content_entity:node'
      bundle: presentation
    process:
      nid: nid
      field_presentation_presenter:
        plugin: sub_process
        source: field_presentation_presenter
        process:
          target_id:
            plugin: migration_lookup
            migration: person_associate
            source: target_id
            no_stub: true
      changed:
        plugin: sub_process
        source: changed
        process:
          value: value
          preserve:
            plugin: default_value
            default_value: true
    destination:
      plugin: 'entity:node'
      default_bundle: presentation
      overwrite_properties:
        - field_presentation_presenter
        - changed
    migration_dependencies:
      required:
        - person_associate
- This migration doesn’t create any new nodes, only modifies existing ones.
- overwrite_properties is an option provided by the entity:node destination plugin. Any fields that you want the migration to overwrite, you have to list here.
  - The process for nid tells the destination plugin which node to modify.
  - Unfortunately, with overwrite_properties, rolling back a migration doesn’t revert the overwritten fields to their original values.
- The process for field_presentation_presenter passes each Person node’s ID to the migration_lookup plugin and gets back the node ID of the Associate that the Person was migrated to.
  - The sub_process plugin is basically a foreach loop that iterates through the field_presentation_presenter field’s values.
- The process for changed, counterintuitively, prevents the Presentation node’s last-modified date from being changed.
  - The Preserve Changed Timestamp module adds the boolean preserve property to the changed field.
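As a rough picture of what the two sub_process pipelines do to a single Presentation node, consider the following before-and-after values. The node IDs and the timestamp are hypothetical, and the YAML is just a notation for the data, not configuration.

```yaml
# Invented values for one Presentation node.
before:
  field_presentation_presenter:
    - target_id: 12    # Person node
    - target_id: 15    # Person node
  changed:
    - value: 1609459200
after:
  field_presentation_presenter:
    # migration_lookup maps each Person nid to the Associate created from it.
    - target_id: 301   # Associate migrated from Person 12
    - target_id: 304   # Associate migrated from Person 15
  changed:
    # Same timestamp, plus the preserve flag set by the default_value plugin.
    - value: 1609459200
      preserve: true
```

Because the lookup uses no_stub: true, no placeholder Associate is created for a Person that person_associate skipped.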
Migrating file fields to media fields
The site had some file fields that I wanted to migrate to media fields. For this I used the Migrate File Entities to Media Entities module.
This module provides a series of Drush commands to generate the Media fields that the data will be migrated into, generate the migration .yml files, and do some de-duplication. For example, if two nodes refer to copies of the same file that have been uploaded, the module will detect that the files have the same hash, and the migrations will generate just one Media item that both nodes reference.
I decided to skip the Drush command to generate the Media fields and instead create them through the admin pages, since I wanted control over the machine names. I was able to run the rest of the Drush commands, then edit the generated migrations to use my field names.
Populating newly added fields
After I migrated Person nodes to Associate nodes, there were some additional fields on the Associate nodes that needed to be filled in. I could have copied and pasted in the content manually. Instead, I wrote migrations to update the field values. This took some up-front work, but then I could just run a command to generate the content — on my local development site, on the staging site, and on the production site. And I wouldn’t have to worry about making data-entry mistakes on the production site.
Providing the field values within the migration YAML file
One new field was the Related Page field on a taxonomy vocabulary called Affiliations. Each Associate has a list of Affiliations terms. When displaying an Associate’s list of Affiliations, I needed each Affiliation to link to a related webpage — hence the Related Page field.
Here’s a migration that populates the new field:
    id: affiliations_related_pages
    label: 'Populate the Related Page field on Affiliations terms'
    source:
      plugin: embedded_data
      data_rows:
        - id: 101
          related_page:
            uri: 'https://example.com'
        - id: 102
          related_page:
            uri: 'internal:/theme-from-flood'
        - id: 103
          related_page:
            uri: 'internal:/birdhouse-in-your-soul'
        - id: 104
          related_page:
            uri: 'internal:/lucky/ball-and-chain'
      ids:
        id:
          type: integer
    process:
      tid: id
      field_affiliations_related_page: related_page
    destination:
      plugin: 'entity:taxonomy_term'
      default_bundle: affiliations
      overwrite_properties:
        - field_affiliations_related_page
- This is an abbreviated version. The actual migration had more terms.
- In the source data, id is the Affiliation term’s ID.
- The process for tid tells the destination plugin to modify the existing term that matches the id from the source plugin.
- The process for field_affiliations_related_page sets this link field’s value to the related_page value from the source plugin.
Looking up the field values from a CSV file
Another new field was a formatted-text field called Biography on the Associate content type. Our client provided a spreadsheet with biographies. I exported the spreadsheet to a CSV file and installed the Migrate Source CSV module.
Here’s the CSV:
    id,title,biography
    1,Particle Man,"Particle Man, Particle Man. Doing the things a particle can. What's he like? It's not important. Particle Man. Is he a dot, or is he a speck? When he's underwater does he get wet? Or does the water get him instead? Nobody knows, Particle Man."
    2,Universe Man,"Universe Man, Universe Man. Size of the entire universe, man. Usually kind to smaller man. Universe Man. He's got a watch with a minute hand, millennium hand and an eon hand. And when they meet it's a happy land. Powerful man, Universe Man."
- The actual migration had more biographies.
- The id column is just the row number. Migrate Source CSV needs each row to have a unique identifier that contains only letters, numbers, and underscores.
- I placed the CSV file in the web server’s /tmp directory.
Here’s the migration:
    id: associate_biography
    label: 'Populates the Biography field on Associate nodes'
    source:
      plugin: csv
      path: /tmp/bios.csv
      header_offset: 0
      ids:
        - id
      fields:
        - name: id
        - name: title
        - name: biography
    process:
      nid:
        plugin: entity_lookup
        source: title
        value_key: title
        entity_type: node
        bundle_key: type
        bundle: associate
      field_associate_biography/value:
        - plugin: get
          source: biography
        - plugin: callback
          callable: _filter_autop
      field_associate_biography/format:
        plugin: default_value
        default_value: basic_html
      changed:
        - plugin: entity_value
          source: '@nid'
          entity_type: node
          field_name: changed
        - plugin: sub_process
          process:
            value: value
            preserve:
              plugin: default_value
              default_value: true
    destination:
      plugin: 'entity:node'
      default_bundle: associate
      overwrite_properties:
        - field_associate_biography
        - changed
- The process for nid looks up the Associate node referred to by the current row in the CSV.
  - Since the CSV doesn’t include the node ID for either the Associate or the Person, I couldn’t use the get or migration_lookup process plugins.
  - Instead, I used entity_lookup to find the node whose title field matches the title column in the CSV.
- The processes for field_associate_biography/value and field_associate_biography/format set the value of the Biography field.
  - As a field of type “Text (formatted, long)”, field_associate_biography has two parts: the raw text (value) and the text format (format).
  - The process for field_associate_biography/value takes the biography column value from the CSV file and runs it through the _filter_autop function to convert line breaks into <p> and <br> tags.
  - The process for field_associate_biography/format sets the format to basic_html.
- The process for changed prevents the Associate node’s last-modified date from being changed.
  - This is the same idea as in the Presenter field example, but the process is more complicated.
  - Since the source plugin provides CSV rows rather than nodes, I couldn’t just use source: changed in the sub_process plugin as I had previously.
  - First I had to locate the changed field on the node using the entity_value plugin. Then I could run it through sub_process as in the previous example.
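Here is a sketch of the intermediate values for the first CSV row as it passes through this migration. The node ID and timestamp are hypothetical, and the shape shown for the entity_value output is an assumption inferred from how the sub_process step consumes it. The YAML is a notation for the trace, not configuration.

```yaml
# Invented trace for CSV row 1 ('Particle Man').
nid: 301                      # found by entity_lookup: the Associate titled 'Particle Man'
field_associate_biography:
  format: basic_html          # set by default_value
  # value: the biography column after _filter_autop (HTML omitted here)
changed:
  after_entity_value:         # assumed shape: the node's existing field items
    - value: 1609459200
  after_sub_process:          # same timestamp, now flagged to be preserved
    - value: 1609459200
      preserve: true
```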
Next time: Deploying content
I have even more examples of migrations that I’d like to share, but this article was getting long so I’ve decided to make it a two-parter. Next time I’ll talk about how I’ve used migrations to deploy new content to the production site.
Jaymie Strecker has been a software developer at Kosada for 13 years. The thing that bothers them is someone keeps moving their chair.