Taking Advantage of Drupal’s Text Format Filters in a Decoupled Site

by Matt Robison

Drupal has a great system of text formats and filters to help limit markup and do other creative things, like rendering embed tokens. These are applied when a formatted text field, like a node’s body field, is being prepared to render on the front end.

In the context of a decoupled site, however, we never reach the rendering process. We get the data directly from Drupal via an API call, and bypass its built-in rendering process. The application of these filters never happens.

This is all for the good. We don’t want to make assumptions about how our data is consumed. But this leads to some potential problems. What if we are passing along malicious code? What if we have run a migration and our fields contain a bunch of cruft that we don’t want to allow? All of it will be sent along to our consumer.

While our consumer should certainly do their due diligence to ensure they aren’t consuming anything malicious or ridiculous, we should do what we can to make the internet a safer place. But there is good news: we can take advantage of Drupal’s filters in this context and extract the benefits we need while still bypassing Drupal’s render pipeline.

In our examples below, we’ll be using the JSON API module. By the end of this article, you will learn:

  • How to write a custom normalizer for JSON API.
  • How to use individual Drupal filters in almost any context.

The repo of the example code used in this article can be found on GitHub.

Custom Normalizer

JSON API uses the Serialization API, and passes your data recursively through a series of “normalizers.” It starts at the top level of the request, then goes to the entity level, and then passes each field value of that entity through normalization. For each piece of data, the serializer looks for the highest-priority normalizer that declares support for that piece of data.
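To make the selection rule concrete, here is a minimal, framework-free sketch of "highest priority that supports the data wins." The SketchNormalizer class, the pick_normalizer() function, and the priority numbers are hypothetical stand-ins for illustration; they are not the actual Drupal or Symfony implementation.

```php
<?php

// Hypothetical stand-in for a normalizer service; not a Drupal class.
final class SketchNormalizer {

  public $name;
  public $priority;
  private $supports;

  public function __construct(string $name, int $priority, callable $supports) {
    $this->name = $name;
    $this->priority = $priority;
    $this->supports = $supports;
  }

  // Mirrors the idea of ::supportsNormalization(): can I handle this data?
  public function supportsNormalization($data): bool {
    return ($this->supports)($data);
  }

}

// Returns the highest-priority normalizer that supports $data, or NULL.
function pick_normalizer(array $normalizers, $data): ?SketchNormalizer {
  // $normalizers is a copy, so sorting here does not affect the caller.
  usort($normalizers, function (SketchNormalizer $a, SketchNormalizer $b) {
    return $b->priority <=> $a->priority;
  });
  foreach ($normalizers as $normalizer) {
    if ($normalizer->supportsNormalization($data)) {
      return $normalizer;
    }
  }
  return NULL;
}

$normalizers = [
  // A general normalizer that accepts any field-like array.
  new SketchNormalizer('field_item', 21, function ($data) {
    return is_array($data);
  }),
  // A more specific normalizer with a higher priority.
  new SketchNormalizer('formatted_text', 22, function ($data) {
    return is_array($data) && isset($data['format']);
  }),
];

$picked = pick_normalizer($normalizers, ['value' => '<p>Hi</p>', 'format' => 'basic_html']);
echo $picked->name, PHP_EOL;
```

The more specific normalizer is picked only because its priority is higher; with a lower priority, the general one would match first and the specific one would never be consulted.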

To get our bearings, let’s look at some relevant parts of the FieldItemNormalizer. Additional explanation is in the comments below.

class FieldItemNormalizer extends NormalizerBase {

  /**
   * This variable is checked in the class's ::supportsNormalization() method.
   * It is an easy way to limit the application of the normalizer without needing
   * to override ::supportsNormalization().
   */
  protected $supportedInterfaceOrClass = FieldItemInterface::class;

  /** 
   * The meat of the class loops through every property of a FieldItem and passes
   * it on to be normalized. Typically at this point, each property will be a simple
   * Scalar value, but you never know what might need some additional processing.
   */
  public function normalize($field_item, $format = NULL, array $context = []) {
    /** @var \Drupal\Core\TypedData\TypedDataInterface $property */
    $values = [];
    // We normalize each individual property, so each can do their own casting,
    // if needed.
    foreach ($field_item as $property_name => $property) {
      $values[$property_name] = $this->serializer->normalize($property, $format, $context);
    }

    if (isset($context['langcode'])) {
      $values['lang'] = $context['langcode'];
    }
    return new FieldItemNormalizerValue($values);
  }
}

This is the default normalize() method that will eventually get called on every FieldItem on an entity. We’ll now jump into this Normalization process with our own requirements, using this as a base.

First, we want our custom normalizer to only affect formatted text fields. This is easy to do after extending from FieldItemNormalizer, thanks to the $supportedInterfaceOrClass property.

class FormattedTextNormalizer extends FieldItemNormalizer {
  protected $supportedInterfaceOrClass = TextItemBase::class;
}

There is one more thing we have to do. Remember that the serializer looks at all available normalizers and chooses the one with highest priority that supports the data in question. We need some way to tell Drupal about our new normalizer and its priority, so it gets checked before the default FieldItemNormalizer.

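We do this by registering our normalizer as a tagged service in our module’s services file (example_normalizer.services.yml, going by the module name in the class namespace).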

services:
  serializer.normalizer.formatted_text.example_normalizer:
    class: Drupal\example_normalizer\Normalizer\FormattedTextNormalizer
    tags:
      # Give it a priority higher than the default jsonapi service
      # serializer.normalizer.field_item.jsonapi
      - { name: normalizer, priority: 22 }

Compare this to the services declaration in the JSON API module for FieldItemNormalizer.

serializer.normalizer.field_item.jsonapi:
  class: Drupal\jsonapi\Normalizer\FieldItemNormalizer
  tags:
    - { name: normalizer, priority: 21 }

After clearing the cache, Drupal will know about our normalizer, and it will be run for all formatted text fields instead of the default FieldItemNormalizer.

Applying Drupal’s Filters Ad-hoc

We have our custom normalizer working…but right now it doesn’t do anything different from its parent. Let’s change that.

Our work will be concentrated on the foreach() loop of the normalize function. It loops through every property on a field item. Formatted text fields have two properties: value and format. We’ll make use of both.

public function normalize($field_item, $format = NULL, array $context = []) {
  $values = [];
  foreach ($field_item as $property_name => $property) {
    if ($property_name == 'value') {
      // The unfiltered text, exactly as stored in the field.
      $value = $property->getValue();
      // Load the text format into its own variable. Reusing $format here would
      // overwrite the serialization format that is passed to the serializer
      // below.
      $filter_format = FilterFormat::load($field_item->format);
    }

    $values[$property_name] = $this->serializer->normalize($property, $format, $context);
  }
  if (isset($context['langcode'])) {
    $values['lang'] = $context['langcode'];
  }
  return new FieldItemNormalizerValue($values);
}

This gives us the original value of the field and the text format config entity assigned to it. At this point, we might be tempted to process our text through the whole format with check_markup(). This would work fine, as long as we were explicit in our $filter_types_to_skip array.

But we only want one filter to run, regardless of what else the format enables: filter_html, which gets rid of disallowed and malicious markup. check_markup() also uses the renderer service, which is overkill in this instance.

So here is how we grab the filter we want to use and process our text with it.

$filter = $filter_format->filters()->get('filter_html');
// Pass the field item's own language instead of a hard-coded 'en'.
$property = $filter->process($value, $field_item->getLangcode())->getProcessedText();

And that’s it. This allows us to take advantage of the settings used in the text format, without having to write our own rules from scratch. The UI for allowed HTML tags in the filter_html filter is easy to use and understand, and, when dealing with potential security issues, that’s always a positive. Let’s just re-use those rules.
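For comparison, the check_markup() route mentioned earlier might look roughly like the sketch below. The function name example_normalizer_check_markup_sketch() is hypothetical, and the code only runs inside a bootstrapped Drupal site with the filter module enabled. It skips every filter type that check_markup() allows us to skip; HTML restrictor filters such as filter_html cannot be skipped, so they still run.

```php
<?php

use Drupal\filter\Plugin\FilterInterface;

/**
 * Hypothetical helper: runs a text format with all skippable types skipped.
 */
function example_normalizer_check_markup_sketch(string $value, string $format_id, string $langcode): string {
  // FilterInterface::TYPE_HTML_RESTRICTOR is deliberately absent from this
  // list: check_markup() never skips that type.
  $filter_types_to_skip = [
    FilterInterface::TYPE_MARKUP_LANGUAGE,
    FilterInterface::TYPE_TRANSFORM_REVERSIBLE,
    FilterInterface::TYPE_TRANSFORM_IRREVERSIBLE,
  ];
  // check_markup() returns a markup object; cast it to a plain string.
  return (string) check_markup($value, $format_id, $langcode, $filter_types_to_skip);
}
```

Note the difference in granularity: this runs every HTML restrictor filter enabled on the format, not just filter_html, and it goes through the renderer to do so.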

If filtered HTML is all we want, why not go one step further and just use Drupal’s Xss class? This is a real possibility, depending on our needs, but it would skip all of the hard work done to make filter_html easy to use. For example, Xss::filter() only lets us choose which tags are allowed; it cannot allow specific attributes on specific tags the way filter_html’s configuration can.

I encourage you to take a glance at everything FilterHtml goes through to give you safe markup, with simple configuration. In my opinion, it would be silly not to use it.

Here is our final normalizer, with all of our changes:  

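namespace Drupal\example_normalizer\Normalizer;

use Drupal\filter\Entity\FilterFormat;
use Drupal\jsonapi\Normalizer\FieldItemNormalizer;
use Drupal\jsonapi\Normalizer\Value\FieldItemNormalizerValue;
use Drupal\text\Plugin\Field\FieldType\TextItemBase;

class FormattedTextNormalizer extends FieldItemNormalizer {

  protected $supportedInterfaceOrClass = TextItemBase::class;

  public function normalize($field_item, $format = NULL, array $context = []) {
    /** @var \Drupal\Core\TypedData\TypedDataInterface $property */
    $values = [];
    // We normalize each individual property, so each can do their own casting,
    // if needed.
    foreach ($field_item as $property_name => $property) {
      if ($property_name == 'value') {
        $value = $property->getValue();
        // Keep the text format in its own variable so that $format (the
        // serialization format) is still intact for the serializer call below.
        $filter_format = FilterFormat::load($field_item->format);

        // Run only filter_html, using the settings stored on the text format.
        $filter = $filter_format->filters()->get('filter_html');
        $property = $filter->process($value, $field_item->getLangcode())->getProcessedText();
      }

      $values[$property_name] = $this->serializer->normalize($property, $format, $context);
    }

    if (isset($context['langcode'])) {
      $values['lang'] = $context['langcode'];
    }
    return new FieldItemNormalizerValue($values);
  }
}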

Now all markup sent along in formatted text fields will be run through Drupal’s filter. It is easy to add more filters if needed, depending on your context and requirements. This time, we left embedded content unexpanded so that the consumer can decide what to do with it, but it is easy to imagine a scenario where you might want to fully expand certain shortcodes. This example should give you the tools to accomplish whatever you need.
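As one way to add more filters, the single process() call could be generalized into a small helper that runs a chosen list of filter IDs in order. This is a sketch under assumptions: the function name example_normalizer_apply_filters() is hypothetical, it requires a bootstrapped Drupal site, it skips filters that are disabled on the format (an assumed policy), and it calls only process(), not the prepare() step that Drupal’s full pipeline also runs.

```php
<?php

use Drupal\filter\FilterFormatInterface;

/**
 * Hypothetical helper: runs only the listed filters from a text format.
 */
function example_normalizer_apply_filters(FilterFormatInterface $filter_format, string $text, string $langcode, array $filter_ids): string {
  $filters = $filter_format->filters();
  // Filters run in the order given by the caller, not by configured weight.
  foreach ($filter_ids as $filter_id) {
    if (!$filters->has($filter_id)) {
      continue;
    }
    $filter = $filters->get($filter_id);
    // Assumed policy: ignore filters that are switched off in this format.
    if (empty($filter->status)) {
      continue;
    }
    $text = $filter->process($text, $langcode)->getProcessedText();
  }
  return $text;
}
```

Inside the normalizer, a call such as example_normalizer_apply_filters($filter_format, $value, $field_item->getLangcode(), ['filter_html', 'filter_htmlcorrector']) would then restrict the markup and repair any broken HTML left behind.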

View the full repo of this example code.
