Using Drupal migrations to modify content within a site

Drupal’s migration system was primarily designed for migrating data into a site from an older version of Drupal or from some other CMS. Lately I’ve been using the Migrate API a lot, but not for that.

I’ve been helping one of our clients with a “website refresh”, which has entailed restructuring several content types and fields. Once I’ve made configuration changes, I’ve needed to move existing content over to the new structure. To automate that process, I’ve been using migrations.

In this article I’ll share some examples of migrations that demonstrate how useful the Migrate API can be for moving content within a single site, not just between sites.

This article assumes you have a basic understanding of the Migrate API, so that you’ll know what I’m talking about when I refer to source, process, and destination plugins, and you’ll be familiar with the syntax of a migration YAML file. If not, the Migrate API Overview is a fine place to start.

Basic process

My responsibilities for this project include site-building and development, deploying changes, and inputting content. I make changes on my local dev site and commit them to the repository, then deploy to a staging site for further testing, and finally deploy to the production site.

Modules installed

Modules used in these examples:

  • Migrate (core)
  • Migrate Drupal (core) — for its content_entity source plugin
  • Migrate Plus — so that I could implement migrations as configuration entities without having to wrap them in a custom module, and to get some additional process plugins
  • Migrate Conditions — for additional process plugins
  • Migrate Source CSV — for a source plugin to import from a CSV file
  • Preserve Changed Timestamp — for when I didn’t want a migration to affect existing nodes’ last-modified dates

Additionally, I use Migrate Devel on my local dev site to help with debugging.

I ran into a little gotcha with Migrate Drupal. That module depends on the Password Compatibility module for reasons unrelated to the functionality that I needed it for. Not finding an equivalent plugin elsewhere and not wanting to re-enable Password Compatibility, I ended up patching migrate_drupal.info.yml to remove the dependency.

Workflow

When first creating a migration, I’ll typically:

  • Create a new YAML file in the site’s configuration sync directory (e.g. config/sync/migrate_plus.migration.person_associate.yml) and make an initial attempt at writing the migration.
  • drush config:import to import the new YAML file.
  • drush config:export to add auto-generated keys and values (e.g. uuid) to the YAML file.
  • drush migrate:status to check that the migration is listed.
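For orientation, the file might look roughly like this after the export step. This is a sketch, not output copied from the site: the uuid is a placeholder, the process section is cut down to a single field, and the exact set of auto-generated keys is an assumption that may vary with the Migrate Plus version.

```yaml
# config/sync/migrate_plus.migration.person_associate.yml
# Illustrative sketch only. The uuid below is a placeholder, and the
# auto-generated keys are assumed, not copied from a real export.
uuid: 00000000-0000-0000-0000-000000000000
langcode: en
status: true
dependencies: {  }
id: person_associate
class: null
field_plugin_method: null
cck_plugin_method: null
migration_tags: null
migration_group: null
label: 'Creates Associate nodes for a subset of Person nodes'
source:
  plugin: 'content_entity:node'
  bundle: person
process:
  title: title
destination:
  plugin: 'entity:node'
  default_bundle: associate
migration_dependencies: null
```

Only id, label, source, process, and destination need to be written by hand in the first step; the rest is filled in by the export.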

From there, I’ll iteratively make changes to the migration and test it:

  • Edit the YAML file.
  • drush config:import to bring the change into Drupal’s database.
  • drush migrate:import <migration id> to execute the migration.
  • Check if the content was imported as expected. If not, debug the migration.
  • drush migrate:rollback <migration id> to delete the imported content.

Later, when it’s time to deploy changes, I’ll push the code to the staging or production site and then:

  • drush config:import to import configuration changes, including the restructured content types and the migrations.
  • drush cr to rebuild caches.
  • drush migrate:import <migration id> --execute-dependencies to execute the migrations.
  • Delete or unpublish old content using a Drush script or Views bulk operations.

Now on to the examples…

Moving content when restructuring

The examples in this article are based on the migrations I wrote for our client’s site, but simplified (so that we don’t get bogged down in details of this particular site) and anonymized.

Moving some nodes to a new content type

Originally, the site had a Person content type that encompassed people with a lot of different relationships to our client’s organization, including staff, trustees, conference speakers, and article authors. Since that content type had accumulated a bunch of rarely-used fields to try to cover all of those use cases, I decided to split it into multiple content types, one of which is called Associate.

Here’s how I wanted to map Person fields to Associate fields:

Person field             Associate field           Notes
uid                      uid                       Authored by
created                  created                   Authored on
changed                  changed                   Changed (last-modified date)
title                    title                     Title
status                   status                    Published
field_person_portrait    field_associate_portrait  type: media entity reference
field_person_website     field_associate_website   type: link
last word in title       field_associate_surname   last word as an approximation for surname
field_person_is_trustee  none                      type: boolean
field_person_is_staff    none                      type: boolean

I only wanted to create an Associate node for a Person node if the node was published and if the person was a trustee or staff member, as indicated by field_person_is_trustee and field_person_is_staff.

Here’s the migration:

id: person_associate
label: 'Creates Associate nodes for a subset of Person nodes'
source:
  plugin: 'content_entity:node'
  bundle: person
process:
  is_published:
    plugin: skip_on_value
    method: row
    message: 'Skipping Person that is unpublished'
    source: status/0/value
    value: 0
  is_trustee_or_staff:
    plugin: skip_on_condition
    method: row
    message: 'Skipping Person that is not a member of Trustees or Staff'
    condition:
      plugin: and
      conditions:
        -
          plugin: equals
          source: field_person_is_trustee/0/value
          value: 0
        -
          plugin: equals
          source: field_person_is_staff/0/value
          value: 0
  uid: uid
  created: created
  changed: changed
  title: title
  status: status
  field_associate_portrait: field_person_portrait
  field_associate_website: field_person_website
  field_associate_surname:
    -
      plugin: explode
      source: title/0/value
      delimiter: ' '
    -
      plugin: array_pop
destination:
  plugin: 'entity:node'
  default_bundle: associate
  • The processes for is_published and is_trustee_or_staff check if we should create an Associate for the current Person or not.
    • is_published and is_trustee_or_staff are pseudofields, which are only used during the migration and don’t get saved to the Associate node.
  • The process for field_associate_surname extracts the last word from the Person’s Title.
  • The rest of the fields in process are just passed through from Person to Associate.
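The /0/value suffixes in source: status/0/value and source: title/0/value reflect the shape of the rows that content_entity:node produces. As far as I understand the source plugin, the node ID arrives as a plain value, while each field arrives as a list of items keyed by property name. A hypothetical row for one Person node might look like this (all values are invented for illustration, and a real row contains many more fields):

```yaml
# Hypothetical source row for one Person node. All values are invented.
nid: 12                       # entity ID, provided as a scalar
title:
  - value: 'Particle Man'     # title/0/value
status:
  - value: 1                  # status/0/value; 0 would make is_published skip the row
field_person_is_trustee:
  - value: 0
field_person_is_staff:
  - value: 1                  # one of the two flags is set, so the row is not skipped
field_person_website:
  - uri: 'https://example.com'
    title: ''
```

With this row, explode splits the title into the list [Particle, Man] and array_pop keeps the last element, so field_associate_surname would receive Man.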

Changing the content type of a node reference field

Once I’d migrated Person nodes to Associate nodes, I needed to change an entity reference field that had previously held Person nodes to instead hold Associate nodes. This was a field called Presenter on a content type called Presentation.

Here’s the migration:

id: presentation_presenter
label: 'Change the Presenter field from Person to Associate on Presentation nodes'
source:
  plugin: 'content_entity:node'
  bundle: presentation
process:
  nid: nid
  field_presentation_presenter:
    plugin: sub_process
    source: field_presentation_presenter
    process:
      target_id:
        plugin: migration_lookup
        migration: person_associate
        source: target_id
        no_stub: true
  changed:
    plugin: sub_process
    source: changed
    process:
      value: value
      preserve:
        plugin: default_value
        default_value: true
destination:
  plugin: 'entity:node'
  default_bundle: presentation
  overwrite_properties:
    - field_presentation_presenter
    - changed
migration_dependencies:
  required:
    - person_associate
  • This migration doesn’t create any new nodes, only modifies existing ones.
    • overwrite_properties is an option provided by the entity:node destination plugin. You have to list every field that you want the migration to overwrite; fields that aren’t listed are left untouched on the existing node.
    • The process for nid tells the destination plugin which node to modify.
    • Unfortunately, with overwrite_properties, rolling back a migration doesn’t revert the overwritten fields to their original values.
  • The process for field_presentation_presenter passes each Person node’s ID to the migration_lookup plugin and gets back the node ID of the Associate that the Person was migrated to.
    • The sub_process plugin is basically a foreach loop that iterates through the field_presentation_presenter field’s values.
  • The process for changed, counterintuitively, prevents the Presentation node’s last-modified date from being changed.
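To make the data flow concrete, here is a sketch of one Presentation row before and after processing. The source_row and processed_row keys are only labels for this illustration (they are not part of any migration file), and the node IDs and timestamp are invented.

```yaml
# Hypothetical data for one Presentation node. IDs and timestamp are invented.
source_row:
  nid: 78
  field_presentation_presenter:
    - target_id: 12           # a Person node
    - target_id: 15           # another Person node
  changed:
    - value: 1609459200

processed_row:
  nid: 78                     # unchanged, so the existing node is updated
  field_presentation_presenter:
    - target_id: 345          # the Associate created from Person 12
    - target_id: 346          # the Associate created from Person 15
  changed:
    - value: 1609459200       # original timestamp, passed through
      preserve: true          # read by Preserve Changed Timestamp
```

Each item in the list goes through sub_process separately, which is why the lookup uses source: target_id rather than the full field name.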

Migrating file fields to media fields

The site had some file fields that I wanted to migrate to media fields. For this I used the Migrate File Entities to Media Entities module.

This module provides a series of Drush commands to generate the Media fields that the data will be migrated into, generate the migration .yml files, and do some de-duplication. For example, if two nodes refer to copies of the same file that have been uploaded, the module will detect that the files have the same hash, and the migrations will generate just one Media item that both nodes reference.

I decided to skip the Drush command to generate the Media fields and instead create them through the admin pages, since I wanted control over the machine names. I was able to run the rest of the Drush commands, then edit the generated migrations to use my field names.

Populating newly added fields

After I migrated Person nodes to Associate nodes, there were some additional fields on the Associate nodes that needed to be filled in. I could have copied and pasted in the content manually. Instead, I wrote migrations to update the field values. This took some up-front work, but then I could just run a command to generate the content — on my local development site, on the staging site, and on the production site. And I wouldn’t have to worry about making data-entry mistakes on the production site.

Providing the field values within the migration YAML file

One new field was the Related Page field on a taxonomy vocabulary called Affiliations. Each Associate has a list of Affiliations terms. When displaying an Associate’s list of Affiliations, I needed each Affiliation to link to a related webpage — hence the Related Page field.

Here’s a migration that populates the new field:

id: affiliations_related_pages
label: 'Populate the Related Page field on Affiliations terms'
source:
  plugin: embedded_data
  data_rows:
    -
      id: 101
      related_page:
        uri: 'https://example.com'
    -
      id: 102
      related_page:
        uri: 'internal:/theme-from-flood'
    -
      id: 103
      related_page:
        uri: 'internal:/birdhouse-in-your-soul'
    -
      id: 104
      related_page:
        uri: 'internal:/lucky/ball-and-chain'
  ids:
    id:
      type: integer
process:
  tid: id
  field_affiliations_related_page: related_page
destination:
  plugin: 'entity:taxonomy_term'
  default_bundle: affiliations
  overwrite_properties:
    - field_affiliations_related_page
  • This is an abbreviated version. The actual migration had more terms.
  • In the source data, id is the Affiliation term’s ID.
  • The process for tid tells the destination plugin to modify the existing term that matches the id from the source plugin.
  • The process for field_affiliations_related_page sets this link field’s value to the related_page value from the source plugin.

Looking up the field values from a CSV file

Another new field was a formatted-text field called Biography on the Associate content type. Our client provided a spreadsheet with biographies. I exported the spreadsheet to a CSV file and installed the Migrate Source CSV module.

Here’s the CSV:

id,title,biography
1,Particle Man,"Particle Man, Particle Man. Doing the things a particle can.
 
What's he like? It's not important. Particle Man.
 
Is he a dot, or is he a speck? When he's underwater does he get wet? Or does the water get him instead? Nobody knows, Particle Man."
2,Universe Man,"Universe Man, Universe Man.
 
Size of the entire universe, man. Usually kind to smaller man. Universe Man.
 
He's got a watch with a minute hand, millennium hand and an eon hand. And when they meet it's a happy land. Powerful man, Universe Man."
  • The actual migration had more biographies.
  • The id column is just the row number. Migrate Source CSV needs each row to have a unique identifier that contains only letters, numbers, and underscores.
  • I placed the CSV file in the web server’s /tmp directory.

Here’s the migration:

id: associate_biography
label: 'Populates the Biography field on Associate nodes'
source:
  plugin: csv
  path: /tmp/bios.csv
  header_offset: 0
  ids:
    - id
  fields:
    -
      name: id
    -
      name: title
    -
      name: biography
process:
  nid:
    plugin: entity_lookup
    source: title
    value_key: title
    entity_type: node
    bundle_key: type
    bundle: associate
  field_associate_biography/value:
    -
      plugin: get
      source: biography
    -
      plugin: callback
      callable: _filter_autop
  field_associate_biography/format:
    plugin: default_value
    default_value: basic_html
  changed:
    -
      plugin: entity_value
      source: '@nid'
      entity_type: node
      field_name: changed
    -
      plugin: sub_process
      process:
        value: value
        preserve:
          plugin: default_value
          default_value: true
destination:
  plugin: 'entity:node'
  default_bundle: associate
  overwrite_properties:
    - field_associate_biography
    - changed
  • The process for nid looks up the Associate node referred to by the current row in the CSV.
    • Since the CSV doesn’t include the node ID for either the Associate or the Person, I couldn’t use the get or migration_lookup process plugins.
    • Instead, I used entity_lookup to find the node whose title field matches the title column in the CSV.
  • The processes for field_associate_biography/value and field_associate_biography/format set the value of the Biography field.
    • As a field of type “Text (formatted, long)”, field_associate_biography has two parts: the raw text (value) and the text format (format).
    • The process for field_associate_biography/value takes the biography column value from the CSV file and runs it through Drupal’s _filter_autop function, which converts line breaks to <p> and <br /> tags.
    • The process for field_associate_biography/format sets the format to basic_html.
  • The process for changed prevents the Associate node’s last-modified date from being changed.
    • This is the same idea as in the Presenter field example, but the process is more complicated.
    • Since the source plugin provides CSV rows rather than nodes, I couldn’t just use source: changed in the sub_process plugin as I had previously.
    • First I had to read the current value of the changed field from the node using the entity_value plugin. Then I could run it through sub_process like in the previous example.
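As a rough sketch of how one CSV row travels through this migration, the listing below shows the values before and after processing. The source_row and processed_row keys are labels for this illustration only, the node ID and timestamp are invented, and the biography text is shortened to two placeholder paragraphs.

```yaml
# Hypothetical data for one CSV row. Node ID and timestamp are invented.
source_row:
  id: 1
  title: 'Particle Man'
  biography: |
    First paragraph of the biography.

    Second paragraph of the biography.

processed_row:
  nid: 345                    # found by entity_lookup, matching on title
  field_associate_biography:
    value: '(the biography text, with <p> markup added by _filter_autop)'
    format: basic_html
  changed:
    - value: 1609459200       # current value, read from the node by entity_value
      preserve: true          # added by sub_process
```

The two process keys field_associate_biography/value and field_associate_biography/format end up as the value and format properties of a single field item.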

Next time: Deploying content

I have even more examples of migrations that I’d like to share, but this article was getting long so I’ve decided to make it a two-parter. Next time I’ll talk about how I’ve used migrations to deploy new content to the production site.

Jaymie Strecker has been a software developer at Kosada for 13 years. The thing that bothers them is someone keeps moving their chair.