Merging Entities During a Migration to Drupal 8

Simplify your content model and merge multiple content sources into a single entity type to reap maximum benefit from a Drupal migration.

Migrations provide an excellent opportunity to take stock of your current content model. You’re already neck deep in the underlying structures when planning for data migrations, so while you’re in there, you might as well ensure the new destination content types will serve you going forward and not present the same problems. Smooth the edges. Fill in some gaps. Get as much benefit out of the migration as you can, because you don’t want to find yourself doing another one a year from now.

This article will walk through an example of migrating part of a Drupal 7 site to Drupal 8, with an eye toward cleaning up the content model a bit. You will learn:

  • To write a custom migrate source plugin for Drupal 8 that inherits from another source plugin.
  • To take advantage of OO inheritance to pull field values from other entities with minimal code.
  • To use the Drupal 8 migrate Row object to make more values available in your migration YAML configuration.

Scenario: A music site moving from Drupal 7 to Drupal 8

Let’s say we have a large music-oriented website. It grew organically in fits and starts, so the data model resembles a haphazard field full of weeds instead of a well-trimmed garden. We want to move this Drupal 7 site to Drupal 8, and clean things up in the process, focusing first on how we store artist information.

Currently, artist information is spread out:

  • Artist taxonomy term. Contains the name of the artist and some other relevant data, like references to albums that make up their discography. It started as a taxonomy term because editors wanted to tag artists they mentioned in an article. Relevant fields:

    • field_discography: references an album content type.

  • Artist bio node. More detailed information about the artist, with an attached photo gallery. This content type was implemented as the site grew, so there was something more tangible for visitors to see when they clicked on an artist name. Relevant fields:

    • field_artist: term reference that references a single artist taxonomy term.
    • field_artist_bio_body: a formatted text field.
    • field_artist_bio_photos: a multi-value file field that references image files.
    • field_is_deceased: a boolean field to mark whether the artist is deceased or not.

Choosing the Migration’s Primary Source

With the new D8 site, we want to merge these two into a single node type. Since we are moving from one version of Drupal to another, we can build on work that is already done: Drupal core ships source plugins for Drupal 7 entities such as taxonomy terms and nodes.

First, we need to decide which entity type will be our primary source. After some analysis, we determine that we can’t use the artist_bio node because not every Artist taxonomy term is referenced by an artist_bio node. A migration based on the artist_bio node type would leave out many artists, and we can’t live with those gaps.

So the taxonomy term becomes our primary source. We won’t have an individual migration at all for the artist_bio nodes, as that data will be merged in as part of the taxonomy migration.
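
The effect of that choice can be sketched in plain PHP, outside Drupal. The sample arrays and the merge_artists() helper below are hypothetical stand-ins for the D7 data, not part of the migration itself. Terms drive the loop, so every artist produces a result, and bio values are attached only when a bio references the term.

```php
<?php

/**
 * Merges optional bio data into a list of artist terms.
 *
 * Hypothetical helper for illustration only: terms drive the loop, so an
 * artist without a bio still produces a result.
 */
function merge_artists(array $terms, array $bios): array {
  // Index bios by the term ID they reference.
  $bios_by_tid = [];
  foreach ($bios as $bio) {
    $tid = $bio['field_artist'];
    // Keep only the first bio found for each term.
    if (!isset($bios_by_tid[$tid])) {
      $bios_by_tid[$tid] = $bio;
    }
  }

  $merged = [];
  foreach ($terms as $term) {
    $bio = $bios_by_tid[$term['tid']] ?? NULL;
    $merged[] = [
      'title' => $term['name'],
      // Stays NULL when no bio references this term.
      'body' => $bio['field_artist_bio_body'] ?? NULL,
    ];
  }
  return $merged;
}

// Made-up sample data: two terms, only one of which has a bio node.
$terms = [
  ['tid' => 1, 'name' => 'Artist A'],
  ['tid' => 2, 'name' => 'Artist B'],
];
$bios = [
  ['nid' => 10, 'field_artist' => 1, 'field_artist_bio_body' => 'Bio text for Artist A.'],
];

$artists = merge_artists($terms, $bios);
```

Had the loop been driven by the bios instead, Artist B would never appear in the result, which is the gap we want to avoid.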

In addition to the migration modules included in core (migrate and migrate_drupal), we’ll also be using the migrate_plus module and migrate_tools.

Let’s create our initial migration configuration in a custom module, in the file config/install/migrate_plus.migration.artists.yml.

id: artists
label: Artists
source:
  plugin: d7_taxonomy_term
  bundle: artist
destination:
  plugin: entity:node
  bundle: artist
process:
  title: name

  type:
    plugin: default_value
    default_value: artist

  field_discography:
    plugin: iterator
    source: field_discography
    process:
      target_id:
        plugin: migration
        migration: albums
        source: nid

This takes care of the initial taxonomy migration. As a source, we are using the default d7_taxonomy_term plugin that comes with Drupal. Likewise, for the destination, we are using the entity:node destination plugin that core provides.

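The fields under “process” are the fields found on the Artist term, except for type, where we hard-code the node type. The field_discography mapping assumes a separate migration, albums, that migrates the Album content type. Later Drupal 8 releases rename the iterator and migration process plugins to sub_process and migration_lookup, so use those names if your core version provides them.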

This will pull in all Artist taxonomy terms and create a node for each one. Nifty. But our needs are a bit more complicated than that. We also need to look up all the artist_bio nodes that reference Artist terms and get that data. That means we need to write our own Source plugin.

Extending the Default Taxonomy Source Plugin

Let’s create a custom source plugin that extends the d7_taxonomy_term plugin. In a custom module, the class lives at src/Plugin/migrate/source/Artist.php.

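<?php

// "my_module" is a placeholder; use your own module's machine name.
namespace Drupal\my_module\Plugin\migrate\source;

use Drupal\migrate\Row;
use Drupal\taxonomy\Plugin\migrate\source\d7\Term;

/**
 * Artist terms merged with field values from their artist_bio node.
 *
 * @MigrateSource(
 *   id = "artist"
 * )
 */
class Artist extends Term {

  /**
   * {@inheritdoc}
   */
  public function prepareRow(Row $row) {
    // The parent returns FALSE when the row should be skipped.
    $keep_row = parent::prepareRow($row);

    if ($keep_row) {
      $term_id = $row->getSourceProperty('tid');

      // Find published artist_bio nodes that reference this term.
      $query = $this->select('field_data_field_artist', 'fa');
      $query->join('node', 'n', 'n.nid = fa.entity_id');
      $query->condition('n.type', 'artist_bio')
        ->condition('n.status', 1)
        ->condition('fa.field_artist_tid', $term_id);

      $artist_bio = $query->fields('n', ['nid'])
        ->execute()
        ->fetchAll();

      // Copy every field value from the first matching bio onto the row.
      if (isset($artist_bio[0])) {
        foreach (array_keys($this->getFields('node', 'artist_bio')) as $field) {
          $row->setSourceProperty($field, $this->getFieldValues('node', $field, $artist_bio[0]['nid']));
        }
      }
    }

    // Pass the parent's decision back to the migration.
    return $keep_row;
  }

}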

Let’s break it down. First, we see if there is an artist_bio that references the artist term we are currently migrating.

      $query = $this->select('field_data_field_artist', 'fa');
      $query->join('node', 'n', 'n.nid = fa.entity_id');
      $query->condition('n.type', 'artist_bio')
        ->condition('n.status', 1)
        ->condition('fa.field_artist_tid', $term_id);

All major D7 entity sources extend the FieldableEntity class, which gives us access to some great helper functions so we don’t have to write our own queries. We utilize them here to pull the extra data for each row.

      if (isset($artist_bio[0])) {
        foreach (array_keys($this->getFields('node', 'artist_bio')) as $field) {
          $row->setSourceProperty($field, $this->getFieldValues('node', $field, $artist_bio[0]['nid']));
        }
      }

If we found an artist_bio that needs to be merged, we loop over all of its field names. We can get a list of all fields with the FieldableEntity::getFields method.

We then use the FieldableEntity::getFieldValues method to grab the values of a particular field from the artist_bio.
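
It helps to know the shape of the returned data. Judging from core's implementation, the result is keyed by delta, with the field-name prefix stripped from each column name. The following plain PHP function is a simplified imitation for illustration, not the core code; reshape_field_rows() and the sample row are hypothetical.

```php
<?php

/**
 * Groups raw D7 field table rows by delta, dropping the field-name prefix.
 *
 * Simplified imitation for illustration; not the core implementation.
 */
function reshape_field_rows(string $field, array $rows): array {
  $values = [];
  foreach ($rows as $row) {
    foreach ($row as $key => $value) {
      // Only columns such as field_artist_bio_body_value belong to the field.
      if (strpos($key, $field . '_') === 0) {
        $column = substr($key, strlen($field) + 1);
        $values[$row['delta']][$column] = $value;
      }
    }
  }
  return $values;
}

// Made-up row, shaped like a record from field_data_field_artist_bio_body.
$rows = [
  [
    'entity_id' => 10,
    'delta' => 0,
    'field_artist_bio_body_value' => 'Bio text.',
    'field_artist_bio_body_format' => 'filtered_html',
  ],
];

$values = reshape_field_rows('field_artist_bio_body', $rows);
```

For the sample row, $values holds one entry at delta 0 with the keys value and format. A multi-value field such as field_artist_bio_photos would produce one entry per delta.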

These field names and values are passed into the row object we are given. To do this, we use Row::setSourceProperty. We can use this method to add any arbitrary value (or set of values) to the row that we want. This has many potential uses, but for our purposes, the artist_bio field values are all we need.
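
Values set this way can then be addressed from the YAML with slash-separated paths. The sketch below approximates that lookup in plain PHP, assuming field values keyed by delta and then by column. The name lookup_source_property() is hypothetical, and the real lookup is done by the Row class.

```php
<?php

/**
 * Resolves a slash-separated path against a nested source array.
 *
 * Approximation for illustration; returns NULL when any key is missing.
 */
function lookup_source_property(array $source, string $path) {
  $current = $source;
  foreach (explode('/', $path) as $key) {
    if (!is_array($current) || !array_key_exists($key, $current)) {
      return NULL;
    }
    $current = $current[$key];
  }
  return $current;
}

// Made-up source data, as it might look after prepareRow() has run.
$source = [
  'tid' => 1,
  'name' => 'Artist A',
  'field_artist_bio_body' => [
    0 => ['value' => 'Bio text.', 'format' => 'filtered_html'],
  ],
];

$name = lookup_source_property($source, 'name');
$body = lookup_source_property($source, 'field_artist_bio_body/0/value');
$missing = lookup_source_property($source, 'field_artist_bio_body/1/value');
```

Here $body resolves to the text of the first delta, while $missing is NULL because the sample data has no second delta.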

Using the New Field Values in the Configuration File

We can now use the field names from the artist_bio node to finish up our migration configuration file. We add the following under the process key of our config/install/migrate_plus.migration.artists.yml:

  field_photos:
    plugin: iterator
    source: field_artist_bio_photos
    process:
      target_id:
        plugin: migration
        migration: files
        source: fid

  # Field values are keyed by delta, so address the first delta explicitly.
  'body/value': 'field_artist_bio_body/0/value'
  'body/format':
    plugin: default_value
    default_value: plain_text

  field_is_deceased: field_is_deceased

The full config file:

id: artists
label: Artists
source:
  plugin: artist
  bundle: artist
destination:
  plugin: entity:node
  bundle: artist
process:
  title: name

  type:
    plugin: default_value
    default_value: artist

  field_discography:
    plugin: iterator
    source: field_discography
    process:
      target_id:
        plugin: migration
        migration: albums
        source: nid

  field_photos:
    plugin: iterator
    source: field_artist_bio_photos
    process:
      target_id:
        plugin: migration
        migration: files
        source: fid

  # Field values are keyed by delta, so address the first delta explicitly.
  'body/value': 'field_artist_bio_body/0/value'
  'body/format':
    plugin: default_value
    default_value: plain_text

  field_is_deceased: field_is_deceased

Notice that we changed the source plugin from d7_taxonomy_term to artist, the ID declared in our @MigrateSource annotation, so the migration uses our new Artist class.

Final Tip

When developing custom migrations with the Migrate Plus module, configuration is stored in the config/install directory of a module. That configuration is only read when the module is installed, so changes are picked up only after the module is uninstalled and then installed again. The config_devel module can help with this. It gives you a Drush command to reload a module’s install configuration.
