Pull Content From a Remote Drupal 8 Site Using Migrate and JSON API

I wanted to find a way to pull data from one Drupal 8 site to another, using JSON API to distribute data on one site, and Drupal’s Migrate with a JSON source on another site to consume it.

I wanted to find a way to pull data from one Drupal 8 site to another, using JSON API to expose data on one site, and Drupal’s Migrate with a JSON source on another site to consume it. Much of what I wanted to do was undocumented and confusing, but it worked well, once I figured it out. Nevertheless, it took me several days to get everything working, so I thought I’d write up an article to explain how I solved the problem. Hopefully, this will save someone a lot of time in the future.

I ended up using the JSON API module, along with the REST modules in Drupal Core on the source site. On the target site, I used Migrate from  Drupal Core 8.2.3 along with Migrate Plus and Migrate Tools.

Why JSON API?

Drupal 8 Core ships with two ways to export JSON data. You can access data from any entity by appending ?_format=json to its path, but that means you have to know the path ahead of time, and you’d be pulling in one entity at a time, which is not efficient.

You could also use Views to create a JSON endpoint, but it might be difficult to configure it to include all the required data, especially all the data from related content, like images, authors, and related nodes. And you’d have to create a View for every possible collection of data that you want to make available. To further complicate things, there's an outstanding bug using GET with Views REST endpoints.

JSON API provides another solution. It puts the power in the hands of the data consumer. You don’t need to know the path of every individual entity, just the general path for a entity type, and bundle. For example: /jsonapi/node/article. From that one path, the consumer can select exactly what they want to retrieve just by altering the URL. For example, you can sort and filter the articles, limit the fields that are returned to a subset, and bring along any or all related entities in the same query. Because of all that flexibility, that is the solution I decided to use for my example. (The Drupal community plans to add JSON API to Core in the future.)

There’s a series of short videos on YouTube that demonstrate many of the configuration options and parameters that are available in Drupal’s JSON API.

Prepare the Source Site

There is not much preparation needed for the source because of JSON API’s flexibility. My example is a simple Drupal 8 site with an article content type that has a body and field_image image field, the kind of thing core provides out of the box.

First, download and install the JSON API module. Then, create YAML configuration to “turn on” the JSON API. This could be done by creating a simple module that has YAML file(s) in /MODULE/config/optional. For instance, if you created a module called custom_jsonapi, a file that would expose node data might look like:

filename: /MODULE/config/optional/rest.resource.entity.node.yml:
id: entity.node
plugin_id: 'entity:node'
granularity: method
configuration:
  GET:
    supported_formats:
      - json
    supported_auth:
      - basic_auth
      - cookie
dependency:
  enforced:
    module:
      - custom_jsonapi

To expose users or taxonomy terms or comments, copy the above file, and change the name and id as necessary, like this:

filename: /MODULE/config/optional/rest.resource.entity.taxonomy_term.yml:
id: entity.taxonomy_term
plugin_id: 'entity:taxonomy_term'
granularity: method
configuration:
  GET:
    supported_formats:
      - json
    supported_auth:
      - basic_auth
      - cookie
dependency:
  enforced:
    module:
      - custom_jsonapi

That will support GET, or read-only access. If you wanted to update or post content you’d add POST or PATCH information. You could also switch out the authentication to something like OAuth, but for this article we’ll stick with the built-in basic and cookie authentication methods. If using basic authentication and the Basic Auth module isn’t already enabled, enable it.

Navigate to a URL like http://sourcesite.com/jsonapi/node/article and confirm that JSON is being output at that URL.

That's it for the source.

Prepare the Target Site

The target site should be running Drupal 8.2.3 or higher. There are changes to the way file imports work that won't work in earlier versions. It should already have a matching article content type and field_image field ready to accept the articles from the other site.

Enable the core Migrate module. Download and enable the Migrate Plus and Migrate Tools modules. Make sure to get the versions that are appropriate for the current version of core. Migrate Plus had 8.0 and 8.1 branches that only work with outdated versions of core, so currently you need version 8.2 of Migrate Plus.

To make it easier, and so I don’t forget how I got this working, I created a migration example as the Import Drupal module on Github. Download this module into your module repository. Edit the YAML files in the /config/optional  directory of that module to alter the JSON source URL so it points to the domain for the source site created in the earlier step.

It is important to note that if you alter the YAML files after you first install the module, you'll have to uninstall and then reinstall the module to get Migrate to see the YAML changes.

Tweaking the Feed Using JSON API

The primary path used for our migration is (where sourcesite.com is a valid site):

http(s)://sourcesite.com/jsonapi/node/article

This will display a JSON feed of all articles. The articles have related entities. The field_image field points to related images, and the uid/author field points to related users. To view the related images, we can alter the path as follows:

http(s)://sourcesite.com/jsonapi/node/article?include=field_image

That will add an included array to the feed that contains all the details about each of the related images. This way we won’t have to query again to get that information, it will all be available in the original feed. I created a gist with an example of what the JSON API output at this path would look like.

To include authors as well, the path would look like the following. In JSON API you can follow the related information down through as many levels as necessary:

http(s)://sourcesite.com/jsonapi/node/article?include=field_image,uid/author

Swapping out the domain in the example module may be the only change needed to the example module, and it's a good place to start. Read the JSON API module documentation to explore other changes you might want to make to that configuration to limit the fields that are returned, or sort or filter the list.

Manually test the path you end up with in your browser or with a tool like Postman to make sure you get valid JSON at that path.

Migrating From JSON

I had a lot of trouble finding any documentation about how to migrate into Drupal 8 from a JSON source. I finally found some in the Migrate Plus module. The rest I figured out from my earlier work on the original JSON Source module (now deprecated) and by trial and error. Here’s the source section of the YAML I ended up with, when migrating from another Drupal 8 site that was using JSON API.

source:
  plugin: url
  data_fetcher_plugin: http
  data_parser_plugin: json
  urls: http://sourcesite.com/jsonapi/node/article
  ids:
    nid:
      type: integer
  item_selector: data/
  fields:
    -
      name: nid
      label: 'Nid'
      selector: /attributes/nid
    -
      name: vid
      label: 'Vid'
      selector: /attributes/vid
    -
      name: uuid
      label: 'Uuid'
      selector: /attributes/uuid
    -
      name: title
      label: 'Title'
      selector: /attributes/title
    -
      name: created
      label: 'Created'
      selector: /attributes/created
    -
      name: changed
      label: 'Changed'
      selector: /attributes/changed
    -
      name: status
      label: 'Status'
      selector: /attributes/status
    -
      name: sticky
      label: 'Sticky'
      selector: /attributes/sticky
    -
      name: promote
      label: 'Promote'
      selector: /attributes/promote
    -
      name: default_langcode
      label: 'Default Langcode'
      selector: /attributes/default_langcode
    -
      name: path
      label: 'Path'
      selector: /attributes/path
    -
      name: body
      label: 'Body'
      selector: /attributes/body
    -
      name: uid
      label: 'Uid'
      selector: /relationships/uid
    -
      name: field_image
      label: 'Field image'
      selector: /relationships/field_image



One by one, I’ll clarify some of the critical elements in the source configuration.

File-based imports, like JSON and XML use the same pattern now. The main variation is the parser, and for JSON and XML, the parser is in the Migrate Plus module:

source:
  plugin: url
  data_fetcher_plugin: http
  data_parser_plugin: json

The url is the place where the JSON is being served. There could be more than one URL, but in this case there is only one. Reading through multiple URLs is still pretty much untested, but I didn’t need that:

  urls: http://sourcesite.com/jsonapi/node/article

We need to identify the unique id in the feed. When pulling nodes from Drupal, it’s the nid:

  ids:
    nid:
      type: integer

We have to tell Migrate where in the feed to look to find the data we want to read. A tool like Postman (mentioned above) helps figure out how the data is configured. When the source is using JSON API, it’s an array with a key of data:

  item_selector: data/

We also need to tell Migrate what the fields are. In the JSON API, they are nested below the main item selector, so they are prefixed using an xpath pattern to find them. The following configuration lets us refer to them later by a simple name instead of the full path to the field. I think the label would only come into play if you were using a UI:

  fields:
    -
      name: nid
      label: 'Nid'
      selector: /attributes/nid

Setting up the Image Migration Process

For the simple example in the Github module we’ll just try to import nodes with their images. We’ll set the author to an existing author and ignore taxonomy. We’ll do this by creating two migrations against the JSON API endpoint, first one to pick up the related images, and then a second one to pick up the nodes.

Most fields in the image migration just need the same values they’re pulling in from the remote file, since they already have valid Drupal 8 values, but the uri value has a local URL that needs to be adjusted to point to the full path to the file source so the file can be downloaded or copied into the new Drupal site.

Recommendations for how best to migrate images have changed over time as Drupal 8 has matured. As of Drupal 8.2.3 there are two basic ways to process images, one for local images and a different one for remote images.  The process steps are different than in earlier examples I found. There is not a lot of documentation about this. I finally found a Drupal.org thread where the file import changes were added to Drupal core and did some trial and error on my migration to get it working.  

For remote images:

source:
  ...
  constants:
    source_base_path: 'http://sourcesite.com/'
process:
  filename: filename
  filemime: filemime
  status: status
  created: timestamp
  changed: timestamp
  uid: uid
  uuid: id
  source_full_path:
    plugin: concat
    delimiter: /
    source:
      - 'constants/source_base_path'
      - url
  uri:
    plugin: download
    source:
      - '@source_full_path'
      - uri
    guzzle_options:
      base_uri: 'constants/source_base_path'

For local images change it slightly:

source:
  ...
  constants:
    source_base_path: 'http://sourcesite.com/'
process:
  filename: filename
  filemime: filemime
  status: status
  created: timestamp
  changed: timestamp
  uid: uid
  uuid: id
  source_full_path:
    plugin: concat
    delimiter: /
    source:
      - 'constants/source_base_path'
      - url
  uri:
    plugin: file_copy
    source:
      - '@source_full_path'
      - uri

The above configuration works because the Drupal 8 source uri value is already in the Drupal 8 format, http://public:image.jpg. If migrating from a pre-Drupal 7 or non-Drupal source, that uri won’t exist in the source. In that case you would need to adjust the process for the uri value to something more like this:

source:
  constants:
    is_public: true
  ...
process:
  ...
  source_full_path:
    -
      plugin: concat
      delimiter: /
      source:
        - 'constants/source_base_path'
        - url
    -
      plugin: urlencode
  destination_full_path:
    plugin: file_uri
      source:
        - url
        - file_directory_path
        - temp_directory_path
        - 'constants/is_public'
  uri:
    plugin: file_copy
    source:
      - '@source_full_path'
      - '@destination_full_path'

Run the Migration

Once you have the right information in the YAML files, enable the module. On the command line, type this:

drush migrate-status

You should see two migrations available to run.  The YAML files include migration dependencies and that will force them to run in the right order. To run them, type:

drush mi --all

The first migration is import_drupal_images. This has to be run before import_drupal_articles, because field_image on each article is a reference to an image file. This image migration uses the path that includes the related image details, and just ignores the primary feed information.

The second migration is import_drupal_articles. This pulls in the article information using the same url, this time without the included images. When each article is pulled in, it is matched to the image that was pulled in previously.

You can run one migration at a time, or even just one item at a time, while testing this out:

drush migrate-import import_drupal_images --limit=1

You can rollback and try again.

drush migrate-rollback import_drupal_images

If all goes as it should, you should be able to navigate to the content list on your new site and see the content that Migrate pulled in, complete with image fields. There is more information about the Migrate API on Drupal.org.

What Next?

There are lots of other things you could do to build on this. A Drupal 8 to Drupal 8 migration is easier than many other things, since the source data is generally already in the right format for the target. If you want to migrate in users or taxonomy terms along with the nodes, you would create separate migrations for each of them that would run before the node migration. In each of them, you’d adjust the include value in the JSON API path to pull the relevant information into the feed, then update the YAML file with the necessary steps to process the related entities.

You could also try pulling content from older versions of Drupal into a Drupal 8 site. If you want to pull everything from one Drupal 6 site into a new Drupal 8 site you would just use the built in Drupal to Drupal migration capabilities, but if you want to selectively pull some items from an earlier version of Drupal into a new Drupal 8 site this technique might be useful. The JSON API module won’t work on older Drupal versions, so the source data would have to be processed differently, depending on what you use to set up the older site to serve JSON. You might need to dig into the migration code built into Drupal core for Drupal to Drupal migrations to see how Drupal 6 or Drupal 7 data had to be massaged to get it into the right format for Drupal 8.

Finally, you can adapt the above techniques to pull any kind of non-Drupal JSON data into a Drupal 8 site. You’ll just have to adjust the selectors to match the format of the data source, and do more work in the process steps to massage the values into the format that Drupal 8 expects.

The Drupal 8 Migrate module and its contributed helpers are getting more and more polished, and figuring out how to pull in content from JSON sources could be a huge benefit for many sites. If you want to help move the Migrate effort forward, you can dig into the Migrate in core initiative and issues on Drupal.org.

Update

When this article was originally written, the path for retrieving JSON API looked like:

http://sourcesite.com/api/node/article?_format=api_json

The format has been changed to this instead:

http://sourcesite.com/jsonapi/node/article

This article has been updated to reflect that change. Older versions of the JSON API module may still use the older format.

Published in:

Get in touch with us

Tell us about your project or drop us a line. We'd love to hear from you!