Developing a new filter for CKEditor to remove the broken URL.

We were maintaining a news portal for the past three years. A few months back we found that our portal has lots of broken URLs and those were challenging the SEO of the website. We have also found that most of them were external URLs. and it was a time-consuming process for editors to edit and remove those broken links from 3 lakh articles.

Fortunately, Drupal provides a strong filter API for processing text, those filters work for both, post text processing and in-place editing. Drupal 8 filter API is almost the same as Drupal 7 with few changes.

  • Now we use plugins to define filters instead of hooks
  • We need to specify the type of filter to make it work correctly.

Reference
— 
https://www.drupal.org/docs/8/api/filter-api/overview
— https://www.drupal.org/docs/core-modules-and-themes/core-modules/filter-module/filter-module-overview

Now, let's check the annotation to be used to define a filter.

* Provides a filter to fix broken urls from content.
*
* @Filter(
*   id = "filter_broken_url_fixer",
*   title = @Translation("Remove broken url from content."),
*   description = @Translation("Description of filter."),
*   type = Drupal\filter\Plugin\FilterInterface::TYPE_MARKUP_LANGUAGE,
*   settings = {
*     "settting_key" = "setting_value",
*   },
*   weight = 100,
* )

You can get more references from the filter module of the drupal core. (core/modules/filter).
We are not using any settings in our filter so we are excluding that from our example code.

<?php

namespace Drupal\ckeditor_broken_url_fixer\Plugin\Filter;

use Drupal\Component\Utility\Html;
use Drupal\filter\FilterProcessResult;
use Drupal\filter\Plugin\FilterBase;

/**
 * Provides a filter to fix broken urls from content.
 *
 * @Filter(
 *   id = "filter_broken_url_fixer",
 *   title = @Translation("Remove broken url from content."),
 *   type = Drupal\filter\Plugin\FilterInterface::TYPE_MARKUP_LANGUAGE
 * )
 */
class FilterBrokenUrlFixer extends FilterBase {

  /**
   * {@inheritdoc}
   */
  public function process($text, $langcode) {

    $result = new FilterProcessResult($text);

    $dom = Html::load($text);
    $xpath = new \DOMXPath($dom);

    // Get all a tag having href attribute.
    foreach ($xpath->query('//a[@href]') as $node) {
      $url = $node->getAttribute('href');
      if (!$this->validateUrl($url)) {
        // Create text node.
        $link_text = new \DomText($node->nodeValue);
        // insert it before the link node.
        $node->parentNode->insertBefore($link_text, $node);
        // And remove the link node.
        $node->parentNode->removeChild($node);
      }
    }

    $result->setProcessedText(Html::serialize($dom));

    return $result;
  }

  /**
   * {@inheritdoc}
   */
  public function tips($long = FALSE) {
    return $this->t('Remove broken url from content.');
  }

  /**
   * Validate url headers.
   *
   * @param null $url
   *
   * @return bool
   */
  private function validateUrl($url = NULL) {
    if ($url) {
      try {
        $headers = get_headers($url);
        $headers = (is_array($headers)) ? implode("\n ", $headers) : $headers;
        return (bool) preg_match('#^HTTP/.*\s+[(200|301|302)]+\s#i', $headers);
      } catch (\Exception $e) {
        // Return false.
      }
    }
    return FALSE;
  }

}

Once we have done with the filter code we need to enable it from editor configuration. (/admin/config/content/formats/manage/full_html).

Broken url ckeditor filter.

The above filter code works well with external URLs, However, it might impact website performance if we are not using any cache system.