Test Driven Development for Decoupled Drupal

We replaced a back-end system consisting of Drupal 7, Java, and MongoDB with a single Drupal 8 site. Our team got there - but it took some learning along the way.

In early spring 2019, I had a moment of clarity. Not only was I going to be working on a decoupled Drupal project, but we were going to be fulfilling one of the promises of a decoupled architecture. Our team was going to be replacing a back-end system consisting of Drupal 7, Java, and MongoDB with a single Drupal 8 site. The React website at universalkids.com (the only consumer of this API, now and in the future) would change in exactly one way: its base API URL.

Theoretically, if everything went according to plan, we would be able to swap the API backend without the React consumer seeing a single difference. And if the expected responses weren’t supposed to change, we should be able to write a test suite that proves it, all without requiring manual QA.

Before I wrote a line of code, I envisioned a development workflow like the following:

  1. A test suite of HTTP requests and responses is created by capturing all requests made between the React app and the existing API backend.
  2. The Drupal 7 database from that same point in time (so content hasn’t changed) is migrated to Drupal 8.
  3. When a pull request is opened, a test case loops through all captured requests and compares their responses with those from the Drupal 8 API.
  4. Initially, every single test would fail (since the Drupal 8 site hadn’t been built yet). As we added more API features, tests would start to pass until they all did.
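To make step 3 concrete, here is a minimal JavaScript sketch. It is not the PHP suite we eventually built. It assumes the captured requests are HAR 1.2 entries, and that `fetchBody` is a hypothetical async function (for example, a thin wrapper around `fetch`) that returns the response body for a URL. Each captured URL is re-pointed at the new backend’s origin before the bodies are compared:

```javascript
// Re-point a captured request URL at another backend origin, keeping the
// path and query string intact.
function rebaseUrl(capturedUrl, newOrigin) {
  const original = new URL(capturedUrl);
  return new URL(original.pathname + original.search, newOrigin).toString();
}

// Replay each captured HAR entry against the new backend and compare bodies.
// `fetchBody` is an assumed async (url) => string function.
async function compareAll(entries, newOrigin, fetchBody) {
  const results = [];
  for (const entry of entries) {
    const url = rebaseUrl(entry.request.url, newOrigin);
    const expected = (entry.response.content && entry.response.content.text) || '';
    const actual = await fetchBody(url);
    results.push({ url, passed: actual === expected });
  }
  const passed = results.filter((result) => result.passed).length;
  return { passed, failed: results.length - passed, results };
}
```

Early on, `failed` would equal the number of entries. The goal was to drive it to zero.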

Our team got there - but it took some learning along the way.

The Tooling

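By the end of the project, we had used the following tools to build and run our test suite:

  • mitmproxy (and mitmdump) to capture API traffic as HAR files
  • memcached as a response cache for the React app
  • Headless Chrome Crawler to crawl the site
  • PHPUnit with Drupal Test Traits for existing-site tests
  • ParaTest to run those tests in parallel
  • Robo to split fixtures and check results
  • CircleCI to run everything on pull requests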

Capturing Requests to Create a Test Suite

The first step was to ensure that we could actually capture the API requests in the first place. If some system limitation kept us from capturing the requests, there’d be no point in continuing further.

My first goal was to find a transparent HTTP proxy that would export requests in a common format. A bespoke format would be fine too, but I wanted the flexibility to reuse the captured requests in other tools, or to change tools in the stack without major rewrites. This led me to mitmproxy, which has become one of my new favorite tools. With it, I could:

  • Point the React app running locally to http://localhost:8080 for API requests via its .env file.
  • Run mitmproxy --mode reverse:https://api-backend.example.com to proxy requests to that backend.
  • Load the home page in the React app, and see the requests being recorded by mitmproxy.
  • Export requests as .har files.

mitmproxy is a CLI tool (with a text-based UI) that allows for inspection, modification, and replay of requests. It also comes with mitmdump, a non-interactive tool for capturing requests. I copied mitmproxy’s har_dump.py example script to my local directory and ran:

$ mitmdump -s ./har_dump.py --set hardump=./dump.har --mode reverse:https://api-backend.example.com

After stopping the dump process with CTRL-C, I was left with a single JSON export of all requests. Note that this command overwrites any existing file without confirmation.

To simplify running this command, I added it to the scripts key in the React app’s package.json so I could run npm run mitmdump any time I needed.

API keys and tokens

One thing to note about this is that requests and responses are captured as is - including any API keys or credentials. The existing API used a static key, which we didn’t want to commit to git. I would do a find and replace after each dump to replace the key with REDACTED. In retrospect, I should have modified the har_dump.py script to do this automatically. Next time!

Improvement 1: Using a request cache

The first thing I noticed when looking at the captured requests was that there were many duplicate requests and responses, even during the same page load. This dramatically increased the size of the archive and would have slowed down tests too. It also slowed down capturing requests, since they went over the internet from my laptop to the API backend.

I got lucky in this case; the existing React app already had support for HTTP response caching through the superagent-cache package. I just had to configure it and fix one minor bug in the app. The package supports several cache backends, but due to a bug in the app, only memcache worked. Since our goal was to not change the React app, I used the Docker memcached image to run memcached locally.

$ docker run -d --name memcached -p 11211:11211 memcached:latest

Installing memcached with Homebrew or your local package manager would have been fine too.

To make sure I was hitting the cache, I used the watch tool (installed with Homebrew) to run:

$ watch 'echo stats | nc localhost 11211'

As long as get_hits started to increase, I knew that HTTP requests were being serviced by the cache, and not making it to mitmdump to be recorded. If I made a change and wanted to create a new dump, I could clear the cache with docker restart memcached.

Crawling The Site

What I had was great, but it involved far too much clicking. The next step was to automate crawling the site, so I could be sure I would get a representative set of API requests.

I knew the site had support for server-side rendering, but I wasn’t willing to trust that every API request would be made during server-side rendering. I wanted to crawl the site with a full browser and a complete JS stack. A few searches led me to Headless Chrome Crawler.

This aptly named project does exactly what you’d think: it lets you crawl a site, with Chromium, without spawning browser windows, in parallel threads. It made sense to use a JavaScript library since this code would live in the React application. After adding the library with npm install --save-dev headless-chrome-crawler, I created the following based on their example:

const HCCrawler = require('headless-chrome-crawler');
const chalk = require('chalk');

// For a quicker crawl that only crawls the sitemap (and does not execute any
// XHRs, and assumes a single sitemap page) run:
// $ wget --quiet https://www.example.com/sitemap.xml --output-document - | egrep -o "https?://[^<]+" | tail -n +2 | xargs -P20 -n1 wget --quiet --output-document /dev/null
(async () => {
  let count = 0;
  const crawler = await HCCrawler.launch({
    maxDepth: 3,
    followSitemapXml: true,
    allowedDomains: ['localhost'],
    obeyRobotsTxt: false,
    // Function to be called with evaluated results from browsers
    onSuccess: (result => {
      // Capture this page's position now; `count` keeps changing while
      // the queue size is being looked up.
      const crawled = ++count;
      let message;
      if (result.response.ok) {
        const status = chalk.green(`HTTP ${result.response.status}`);
        message = `${status} from ${result.response.url}`;
      } else {
        const status = chalk.inverse(chalk.red(`HTTP ${result.response.status}`));
        message = `${status} from ${result.response.url}`;
      }

      crawler.queueSize().then(size => {
        // As the queue doesn't know about the duplicate request handling,
        // count + size will decrease sometimes.
        message += ` [${crawled}/${crawled + size}]`;
        console.log(message);
      });
    }),
  });

  // Queue a request
  await crawler.queue('http://localhost:3000/');
  await crawler.onIdle(); // Resolved when no queue is left
  await crawler.close(); // Close the crawler
})();

I placed this in test/crawler, and added a script to package.json so I could run it with npm run crawl. I usually wouldn’t let this command complete as eventually, the crawl would only be hitting cached requests. Early in development, we only used the first 200 requests or so, since we knew those covered bootstrapping the site and loading the home and landing pages. As we got further in development and had more migrated data and APIs implemented, I would let the command run longer to generate a bigger suite of data.

🎉 I had a HAR file I could use as part of automated tests. It was time to start writing them.

Existing Site Tests

Normal Drupal functional tests work by installing a new Drupal site from scratch. This is great in most cases, as it ensures tests are reproducible. However, it doesn’t work if you want to test against specific content. I found Drupal Test Traits, which provided exactly what I was looking for - a way to define a new suite of “Existing Site” tests that would run against your normal Drupal database. While it has support for full browsers, we were testing APIs so we only needed the basic setup.

The initial implementation of our ApiTest class looked something like the following:

  • A data provider that would provide one test case for each request and response pair, based on loading the HAR file.
  • A testApi method that would accept a request and response.
  • Code inside of testApi to replace the production API hostname with the hostname of the site instance being tested.

$request = $this->container->get('request_stack')->getCurrentRequest();
// Point links in the captured body at the site instance under test.
$expected_response_body = str_replace('http://api-production.example.com', $request->getSchemeAndHttpHost(), $fixture->getBody());
$response = $this->drupalGet($fixture->getPath(), [], $fixture->getRequestHeaders());
// PHPUnit takes the expected value first, then the actual value.
$this->assertEquals($expected_response_body, $response);
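For readers outside PHP, here is a JavaScript sketch of the data provider and hostname swap described in the list. Each HAR entry becomes one named test case, and its expected body has the production hostname replaced by the instance under test. The `api-production.example.com` host and the `buildCases` name are placeholders:

```javascript
const PRODUCTION_ORIGIN = 'http://api-production.example.com';

// Turn HAR entries into [name, testCase] pairs, one per captured request,
// mirroring the shape of a PHPUnit data provider.
function buildCases(har, testOrigin) {
  return har.log.entries.map((entry) => {
    const url = new URL(entry.request.url);
    const body = (entry.response.content && entry.response.content.text) || '';
    return [
      `${entry.request.method} ${url.pathname}${url.search}`,
      {
        path: url.pathname + url.search,
        headers: Object.fromEntries(entry.request.headers.map((h) => [h.name, h.value])),
        // Links in the captured body point at production; re-point them.
        expectedBody: body.split(PRODUCTION_ORIGIN).join(testOrigin),
      },
    ];
  });
}
```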

For the first commit, every test failed, but we didn’t want pull requests to be marked as failed. We added || true after the phpunit command to ignore its exit code.

$ ./vendor/bin/phpunit --debug --verbose \
  --testsuite existing-site \
  --log-junit artifacts/phpunit/phpunit.xml || true

This was enough to bootstrap development, but our final implementation ended up looking quite different.

Improvement 2: Regression Checks by Test Counts

The above test suite, even in a completely broken state, was a huge help in early development. As we started to implement the API, we could run tests locally and see the diff between what we returned and what was expected. In fact, we used these diffs to file tickets that broke down our initial “implement this API” ticket.

One day, we got the very first test passing, and the question arose: since we were ignoring the phpunit exit code, how would we detect regressions?

Our initial approach was to keep track of a count of passing tests. Early on, we didn’t really care about regressions as long as we were getting more tests passing. We added this snippet to our RoboFile.php PHPUnit task to count the test failures, and only fail if we had more than the expected number of failures (or any errors).

$collection->addCode(function () {
  $report = simplexml_load_file('artifacts/phpunit/phpunit.xml', "SimpleXMLElement", LIBXML_PARSEHUGE);
  $failures = (int) $report->testsuite['failures'];
  $errors = (int) $report->testsuite['errors'];
  if ($errors) {
    $this->io()->error(sprintf("%d error(s) during tests.", $errors));
  }
  // Reduce this from 539 as we get passing tests.
  if ($failures > static::ALLOWED_API_TEST_FAILURES) {
    $this->io()->error(sprintf("%d failures during tests out of %d allowed.", $failures, static::ALLOWED_API_TEST_FAILURES));
  }
  // A non-zero return value marks this task as failed.
  return (int) ($errors || $failures > static::ALLOWED_API_TEST_FAILURES);
}, 'count_phpunit_failures');

This worked well for a month or two, but we ended up reevaluating this approach later.

Improvement 3: ParaTest

We ran into an unexpected problem as we implemented more of the API. It turns out that 404 responses are really fast. As real responses replaced those 404s, our test suite became slower and slower, to the point where it took many minutes to complete. Running tests locally while watching htop quickly showed that they weren’t even fully using a single CPU core. In the real world, we expect multiple requests in parallel, so why not in tests too?

I’ve always wanted to use ParaTest, but in the past, it didn’t work for Drupal functional tests. I installed it with composer, and it worked on the first try!

$ ./vendor/bin/paratest --test-suite existing-site  --functional --max-batch-size 20

Running htop showed all 4 CPU cores completely maxed out, and the suite ran around five times faster than with a single test process. In an uncached scenario (remember, we’re testing against the existing site, including its caches), I was able to run the whole suite locally in around 2 minutes. On CircleCI, our ~600 tests sped up by a similar amount, allowing all of our jobs to complete in around three and a half minutes.
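ParaTest gets its speed-up by running PHPUnit processes side by side. In a JavaScript replay script, the analogous idea is a small promise pool that keeps a fixed number of requests in flight. This sketch shows that analogous technique, not how ParaTest works internally. `runPool` is a hypothetical helper:

```javascript
// Run async task functions with at most `limit` in flight at once,
// returning their results in the original order.
async function runPool(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;
  // Each worker keeps claiming the next unstarted task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, worker);
  await Promise.all(workers);
  return results;
}
```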

One side effect of using ParaTest is that it helped us discover race conditions in our code. For example, we added a caching layer to improve the performance of collection requests, which could load thousands of entities that all needed normalizing. We had a bug where cache items could get mixed up, depending on the order of requests. Since the exact timing of each batch varies from run to run, we could tell there was a race condition because different test cases failed each time we ran ParaTest.

Improvement 4: Splitting into Multiple Files

As we got more tests passing, I started to update our test suite with a bigger set of API requests. GitHub has a limit of 100MB per file, based on its uncompressed size. In our case, a 170MB dump would compress to about 14MB, so I was pretty frustrated to discover this limitation. GitHub suggests using LFS, but their LFS storage is very expensive.

As I was investigating this, I discovered another limitation in PHPUnit’s handling of data providers. I had written the HAR parser to use a Generator and yield individual entries. My assumption was that this would reduce the memory required by PHPUnit, since we only ever needed one test case in memory at a time. It turns out that PHPUnit loads all test cases into an array up front, so they were all loaded into memory anyway.

I ended up writing a Robo command to split a single HAR file into individual files.

/**
 * Split a HAR dump into individual files for API tests.
 *
 * GitHub has a hard limit of 100MB for uncompressed files. They point
 * users to their LFS service, but it is expensive and requires additional
 * setup. Given how compressible these files are (a 170MB file compresses to
 * 14MB with gzip), we hack around this limitation by storing individual
 * files for test cases.
 *
 * Consider removing all prior split files before running this command.
 *
 * @param string $har_path The path to the HAR to split.
 */
public function harSplit(string $har_path) {
  $destination_path = 'web/modules/custom/ukids_hal/tests/fixtures/prod-api';
  $this->_mkdir($destination_path . '/passing');
  $this->_mkdir($destination_path . '/failing');
  // Assumed encoding options: pretty-printed so fixture diffs stay readable.
  $json_encoding_options = JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES;
  $decoded = \GuzzleHttp\json_decode(file_get_contents($har_path), TRUE);

  // Everything under "log" except the entries is copied into each split file.
  $not_entries = [];
  foreach ($decoded['log'] as $key => $value) {
    if ($key != 'entries') {
      $not_entries['log'][$key] = $value;
    }
  }

  $this->io()->text("Splitting $har_path into one file per entry");
  foreach ($decoded['log']['entries'] as $index => $entry) {
    $split_har = $not_entries;
    // Keep "entries" a JSON list holding a single entry.
    $split_har['log']['entries'] = [$entry];
    $encoded = \GuzzleHttp\json_encode($split_har, $json_encoding_options);

    $destination = "$destination_path/$index.har";
    file_put_contents($destination, $encoded);
  }
}

I then changed the data provider to return just the path to the HAR fixture, and not the whole set of data. That change reduced our PHPUnit memory use by over 100MB.
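The same split can be sketched in Node.js, assuming a parsed HAR object as input. Every top-level `log` key except `entries` is copied into each output. Each output gets a one-element `entries` array, so it is still a valid HAR file. The `splitHar` and `writeSplit` names are illustrative, and the `<index>.har` naming mirrors the Robo command above:

```javascript
const fs = require('fs');
const path = require('path');

// Split one HAR document into one HAR document per entry.
function splitHar(har) {
  const { entries, ...rest } = har.log;
  return entries.map((entry) => ({ log: { ...rest, entries: [entry] } }));
}

// Write each split document to `<destination>/<index>.har`.
function writeSplit(har, destination) {
  fs.mkdirSync(destination, { recursive: true });
  return splitHar(har).map((doc, index) => {
    const file = path.join(destination, `${index}.har`);
    fs.writeFileSync(file, JSON.stringify(doc, null, 2));
    return file;
  });
}
```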

Improvement 5: Reducing Memory Use from Failure Diffs

As we implemented more APIs and added cases to our test suite, we started to run into another performance problem. While the tests themselves would run quickly, PHPUnit would hang for minutes at a time before printing results. Running tests inside PhpStorm would show corrupt and incorrect results.

Debugging showed that the problem was PHPUnit keeping every failure diff in memory instead of printing each one as it occurred. Some of our API responses were as large as 10MB, so the diffs could be huge as well.

To improve test performance, we switched the default setup from using assertEquals() to assertTrue():

try {
  // By default, we only assert a boolean. Otherwise, the memory used by
  // storing the strings can cause an OOM.
  $this->assertTrue($expected_response_body === $response, $request_url);
}
catch (ExpectationFailedException $e) {
  // We've opted in to printing the full assertion texts.
  if (getenv('UKIDS_HAL_PRINT_DIFF')) {
    $this->prettyPrintAssert($expected_response_body, $response, $e);
  }

  throw $e;
}

private function prettyPrintAssert(string $expected_response_body, string $response, ExpectationFailedException $expectation): void {
  // Pretty print the JSON for easier reading of diffs.
  $pretty = new JsonEncoder(new JsonEncode(JsonEncoder::ENCODING_OPTIONS | JSON_PRETTY_PRINT));
  $pretty_expected = $pretty->encode($pretty->decode($expected_response_body, 'ukids_hal'), 'ukids_hal');
  $pretty_response = $pretty->encode($pretty->decode($response, 'ukids_hal'), 'ukids_hal');
  $this->assertEquals($pretty_expected, $pretty_response, $expectation->getMessage());

  // The above assertion should always fail. If this is thrown, we know
  // that pretty-printing is somehow letting the test pass.
  throw $expectation;
}

Developers could then set the UKIDS_HAL_PRINT_DIFF environment variable in their local phpunit.xml when they needed diffs for debugging. This cut the wait for results after a test run from many minutes (sometimes ending in an out-of-memory error) to nearly instant.
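The pattern generalizes: compare with a cheap boolean first, and only build the expensive, pretty-printed comparison when a developer opts in. Below is a Node.js sketch of that control flow. The `PRINT_DIFF` variable is a stand-in for `UKIDS_HAL_PRINT_DIFF`. To keep memory bounded, it reports only the first differing line rather than a full diff:

```javascript
// Compare two JSON response bodies with a cheap string check first. Only when
// `printDiff` is true are the bodies parsed, re-encoded with indentation, and
// scanned for the first differing line.
function compareBodies(expected, actual, printDiff = Boolean(process.env.PRINT_DIFF)) {
  if (expected === actual) return { passed: true };
  if (!printDiff) return { passed: false };
  const pretty = (body) => JSON.stringify(JSON.parse(body), null, 2).split('\n');
  const left = pretty(expected);
  const right = pretty(actual);
  const at = left.findIndex((line, i) => line !== right[i]);
  if (at === -1 && left.length === right.length) {
    return { passed: false, detail: 'bodies differ only in JSON formatting (whitespace or escaping)' };
  }
  const index = at === -1 ? left.length : at;
  return {
    passed: false,
    detail: `line ${index + 1}: expected ${left[index]} but got ${right[index]}`,
  };
}
```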

Improvement 6: Pass / Fail Directories instead of Counts

Eventually, we got to a point where adding some new functionality would cause a regression, and fixing that regression would cause other regressions. However, since we were only checking test pass counts, it was easy to miss the regressions. It was time to explicitly categorize requests as “passing” or “failing”.

We modified the ./vendor/bin/robo splitHar … command to save all requests into a failing directory. As test cases passed, we moved them to a passing directory. The test data provider was updated to return all passing test cases first, followed by the failing ones. That way, the --stop-on-failure PHPUnit flag would quickly catch any regression locally.
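Here is a Node.js sketch of a provider that lists the `passing` directory before the `failing` one. A stop-on-first-failure run then trips over a regression before it reaches any known-failing case. The directory layout matches the one described above, and the `provideFixtures` name is illustrative:

```javascript
const fs = require('fs');
const path = require('path');

const STATES = ['passing', 'failing'];

// Return [state, filePath] pairs with all passing fixtures first, each group
// sorted by file name so runs are reproducible.
function provideFixtures(fixtureRoot) {
  return STATES.flatMap((state) => {
    const dir = path.join(fixtureRoot, state);
    if (!fs.existsSync(dir)) return [];
    return fs.readdirSync(dir)
      .filter((name) => name.endsWith('.har'))
      .sort()
      .map((name) => [state, path.join(dir, name)]);
  });
}
```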

Improvement 7: Incomplete instead of Failing

Once we had truly passing tests, local development became a bit trickier as tests “failing because I broke something” became mixed up with “failing because we haven’t done it yet”. Since we had tests grouped by passing and failing, it made sense to mark known-failing tests as incomplete.

try {
  // By default, we only assert a boolean. Otherwise, the memory used by
  // storing the strings can cause an OOM.
  $this->assertTrue($expected_response_body === $response, $request_url);
}
catch (ExpectationFailedException $e) {
  $this->throwIfTestShouldPass($state, $expected_response_body, $response, $e);
}

/**
 * If the test case is known to fail, mark as incomplete instead.
 *
 * @param string $state
 * @param string $expected_response_body
 * @param string $response
 * @param \PHPUnit\Framework\ExpectationFailedException $exception
 *
 * @throws \PHPUnit\Framework\ExpectationFailedException
 */
private function throwIfTestShouldPass($state, $expected_response_body, string $response, ExpectationFailedException $exception): void {
  if ($state == HarFixtureRepository::STATE_FAILING && !getenv('UKIDS_HAL_PRINT_DIFF')) {
    // A known failure: report it as incomplete rather than failed.
    $this->markTestIncomplete($exception->getMessage());
  }

  // We've opted in to printing the full assertion texts.
  if (getenv('UKIDS_HAL_PRINT_DIFF')) {
    $this->prettyPrintAssert($expected_response_body, $response, $exception);
  }

  throw $exception;
}

Now, --stop-on-failure responded exactly as our developers expected.
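The decision logic is small enough to state as a pure function. Below is a JavaScript sketch with hypothetical names. It shows how a comparison maps to a reported outcome, depending on the fixture’s directory and on whether diffs were requested:

```javascript
// Decide how a test case should be reported.
// - matching comparison: always 'pass'
// - known-failing fixture: 'incomplete', unless diffs were requested, in
//   which case it fails so the diff gets printed
// - fixture from the passing directory that now fails: 'fail' (a regression)
function outcome({ state, matched, printDiff = false }) {
  if (matched) return 'pass';
  if (state === 'failing' && !printDiff) return 'incomplete';
  return 'fail';
}
```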

Improvement 8: Automatically Moving New Passing Tests

As we continued to work on the API, we sometimes discovered we’d fixed test cases without realizing it. Just as we could detect failing tests that should pass, we could also detect failing tests that are now passing and move them into the passing directory.

/**
 * Move a fixture to the passing directory when moving is enabled.
 *
 * @param string $state
 * @param string $har_file
 * @param \Drupal\Tests\ukids_hal\ExistingSite\HarFixtureRepository $repository
 */
private function markHarPassed($state, $har_file, HarFixtureRepository $repository): void {
  if ($state == HarFixtureRepository::STATE_FAILING && getenv('UKIDS_HAL_MOVE_PASSED')) {
    $repository->markHarPassed($har_file);
  }
}

// In HarFixtureRepository:
public function markHarPassed(string $file_name) {
  rename(self::FIXTURE_PATH . '/' . self::STATE_FAILING . "/$file_name", self::FIXTURE_PATH . '/' . self::STATE_PASSING . "/$file_name");
}

One edge case with this was discovering that ParaTest seems to run the same test case multiple times. My guess is a bug in its functional test mode, but we haven’t investigated or reported it. This meant that a test would sometimes run, move its fixture, and then run again with the old “failing” fixture path and fail. Since tests would complete in a minute or so once Drupal caches were warm, we would just run the tests multiple times (or revert to stock PHPUnit) when we wanted to move passing tests.
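Here is a Node.js sketch of the move itself. It assumes the same `passing`/`failing` layout, plus a hypothetical `MOVE_PASSED` environment variable standing in for `UKIDS_HAL_MOVE_PASSED`. A fixture that has already been moved counts as a no-op rather than an error. Any other missing file still throws:

```javascript
const fs = require('fs');
const path = require('path');

// Move a newly passing fixture from failing/ to passing/. Returns true if a
// file was moved, false if moving is disabled or it was already moved.
function markPassed(fixtureRoot, fileName, enabled = Boolean(process.env.MOVE_PASSED)) {
  if (!enabled) return false;
  const from = path.join(fixtureRoot, 'failing', fileName);
  const to = path.join(fixtureRoot, 'passing', fileName);
  try {
    fs.mkdirSync(path.dirname(to), { recursive: true });
    fs.renameSync(from, to);
    return true;
  } catch (error) {
    if (error.code === 'ENOENT' && fs.existsSync(to)) return false;
    throw error;
  }
}
```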

Improvement 9: Hashing Request URIs

Remember how I mentioned above that Headless Chrome Crawler crawls in parallel? This exposed another issue with how we were splitting tests after updating to a newly migrated Drupal 8 database. Originally, we created numbered test cases like 1.har. When splitting new tests, the Robo command would check if 1.har existed in the passing directory, and if so, put the test case there. This started to cause problems when updating test cases: because the order of API requests wasn’t deterministic, 1.har in a new capture was not necessarily the same request as 1.har in the old one. This would have been a problem with any browser-based solution, even with one browser instance, but it became obvious as soon as we did our second or third update of the test cases. For this API, all of the important data is in the URL (it’s a read-only, GET-only API). To fix this, we named each test case file with the MD5 hash of its URL:

$hashed = md5($entry['request']['url']);
$destination = "$destination_path/failing/$hashed.har";
// Keep test cases we already know pass in the passing directory.
if (file_exists("$destination_path/passing/$hashed.har")) {
  $destination = "$destination_path/passing/$hashed.har";
}

It wasn’t pretty, but it was reliable, and I’ll choose reliable over pretty any day!
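The same naming scheme in Node.js uses the built-in `crypto` module. The file name depends only on the request URL. Re-splitting a fresh capture therefore lands each request in the same place regardless of crawl order, and a case already in `passing` stays there. The function names are illustrative:

```javascript
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

// Stable fixture name for a request: MD5 of the full URL, hex-encoded.
function fixtureName(url) {
  return crypto.createHash('md5').update(url).digest('hex') + '.har';
}

// Keep a known-passing case in passing/; everything else starts in failing/.
function destinationFor(fixtureRoot, url) {
  const name = fixtureName(url);
  const passing = path.join(fixtureRoot, 'passing', name);
  return fs.existsSync(passing) ? passing : path.join(fixtureRoot, 'failing', name);
}
```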

Testing Inconsistent or Broken API Responses

An unexpected benefit of this type of test-driven development is that it can help expose bugs in the existing application. We weren’t building to a specification, but to a real-world implementation. For example, we found:

  • Content that was unpublished in Drupal, but still available in the production API.
  • Responses containing only a subset of the image styles on the site.
  • Undocumented values in the responses.

None of these were noticeable by the public or editorial team, but they did kick off important discussions. Should that content be published or not? Is there behavior we missed with image styles we need to reflect in the API? Do we have the source data for the undocumented values we discovered?

Once we decided what to do in each edge case, we had to figure out how to get to passing tests. We decided that we shouldn’t alter the captured responses (because knowing exactly what production was sending was valuable) and would instead alter the assertions in our test cases. For example, if a test case failed, we would first check the response type and UUID. If it matched one of the content items we’d decided should be unpublished (and couldn’t be fixed in production), we’d set the published flag to false in the expected response and rerun the assertion. For image styles, we ended up asserting with assertArraySubset() instead of comparing the whole array.
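Both adjustments can be sketched in JavaScript under assumed names. `expectedWithOverrides` patches a parsed expected response for items whose UUID is on an agreed list, and leaves the captured fixture untouched. `isSubset` plays the role of `assertArraySubset()`, assuming the captured image styles should be a subset of what Drupal 8 returns. The `uuid` and `published` field names and the UUID value are placeholders:

```javascript
// UUIDs we decided should be unpublished even though production serves them.
// Hypothetical value; the real list came out of editorial discussion.
const FORCE_UNPUBLISHED = new Set(['6f1c2b1e-0000-4000-8000-000000000001']);

// Return a patched copy of the expected body; the fixture object is not modified.
function expectedWithOverrides(expected) {
  if (!FORCE_UNPUBLISHED.has(expected.uuid)) return expected;
  return { ...expected, published: false };
}

// True if every key/value in `subset` is present (recursively) in `whole`.
function isSubset(subset, whole) {
  if (subset === null || typeof subset !== 'object') return subset === whole;
  if (whole === null || typeof whole !== 'object') return false;
  return Object.keys(subset).every((key) => key in whole && isSubset(subset[key], whole[key]));
}
```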

Not Quite Passing, but Close Enough to Launch

Something interesting happened as we got close to the end of the project. We had about 70% of our tests passing, but the React app was working very well. So well, in fact, that we launched without getting all tests passing. We could make this choice because we only had one consumer, and all the client cared about was keeping that site running, not having a fully featured, one-to-one match of the API from their Drupal 7 site. If the Drupal 8 site omitted fields that the React app never used, leaving out that complexity was a feature.

Once we launched, we had to make a decision about the existing site tests. The migration was done, and the old Drupal 7 infrastructure was going to go away. We decided that what mattered was tracking regressions against the new Drupal 8 responses, since we knew they worked for the React site. We did one final crawl, this time pointing mitmdump at the Drupal 8 API. All of the tests then passed, leaving us with an empty failing directory.

Not Done: Hard-coding Times

One edge case we never had to deal with was scheduled publishing of content. I was sure we would, but we kept waiting for tests to break, and they never did. This would be important for any site with content scheduling or time-based ACLs on content. If your site relies on either, be sure to swap the datetime.time service with one that returns a fixed timestamp.
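In Drupal, that swap means registering a test implementation of `\Drupal\Component\Datetime\TimeInterface` as the `datetime.time` service. The idea itself is language-agnostic, so here is a JavaScript sketch. Code that decides visibility takes the clock as a dependency, and tests pass in one frozen at the capture time. `FixedClock`, `isVisible`, and the `publishOn` field are hypothetical:

```javascript
// A clock that always reports the same Unix timestamp (in seconds), loosely
// mirroring getRequestTime()/getCurrentTime() on Drupal's time service.
class FixedClock {
  constructor(timestamp) {
    this.timestamp = timestamp;
  }

  getRequestTime() {
    return this.timestamp;
  }

  getCurrentTime() {
    return this.timestamp;
  }
}

// Scheduled content is visible once its publish time has passed.
function isVisible(item, clock) {
  return item.publishOn === undefined || item.publishOn <= clock.getRequestTime();
}
```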

A New Project: A HAR (de)serializer for PHP

I haven’t included full examples, as our tests ended up having a ton of site-specific code as well. However, one surprising finding was that there was no generalized HAR library for PHP. We wrote two test classes, HarFixture and HarFixtureRepository, which went through several iterations. They also returned raw arrays, which is an antipattern for many reasons (but the fastest path when exploring whether this approach was even possible). I’ve created a new library at deviantintegral/har using the JMS Serializer library. It includes serializing and deserializing support, adapters so you can use HAR requests and responses as PSR-7 objects, and a console command to split a HAR dump similar to the above. I hope it helps bootstrap your next decoupled project!
