Remove display:none in HTMLPurifier?

Learn how to remove HTML elements with display:none using HTMLPurifier or PHP DOM. Improve output sanitization easily.
Before and after sanitized HTML comparison showing the removal of display:none elements using PHP and HTMLPurifier
  • ⚠️ Elements with display:none may hide malicious content or spam links harmful to your site and SEO.
  • 🧼 HTMLPurifier is excellent for removing dangerous HTML, but it filters style declarations rather than removing the elements that use display:none.
  • 🧠 DOMDocument is more precise for removing elements based on style rules and structural hierarchy.
  • 🛠️ Combining HTMLPurifier with DOMDocument achieves both secure sanitation and visual cleanliness.
  • ⚙️ Regex removal is fast but unreliable and shouldn’t be used in isolation for production cases.

Why Remove Elements With display:none?

Removing elements with display:none from user-submitted HTML helps you show cleaner, safer, and more useful content on your site. If you build a CMS, a comment system, or any app that processes HTML from users, you must know how to clean up invisible content. This guide will show you how to do this using HTMLPurifier and PHP’s DOMDocument. We will discuss the pros and cons, and how to set it up to keep your application secure and working well.


Understanding display:none in User HTML

When a user sends HTML content, you often let them add bold, italic, images, and sometimes inline styles. But some of those styles might hide something bad: display:none.

In standard HTML/CSS:

  • Elements with display:none do not show up on the screen.
  • They don't take up space in the layout.
  • They are fully invisible, even to screen readers.

However, they are still part of the DOM and are still processed by crawlers, bots, and scripts, which makes them easy to abuse.

Some risks include:

  • Hidden spam links: Injecting promotional or malicious links that neither users nor admins can see.
  • SEO manipulation: Stuffing in keywords that users cannot see but crawlers can read.
  • Tracking pixels & malware: Embedding tiny (1×1 px) iframes or scripts for covert tracking.

So, if your platform takes HTML from users, it is important to find and deal with elements hidden by display:none.
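
To make the risk concrete, here is a minimal audit sketch that lists link targets sitting inside elements hidden with an inline display:none. The function name findHiddenLinks is an assumption made for illustration, it relies on the dom and mbstring extensions, and it only inspects inline style attributes, not stylesheets or classes.

```php
<?php
// Hypothetical audit helper (name and approach are assumptions, not part of
// HTMLPurifier): collect href values of links inside display:none containers.
function findHiddenLinks(string $html): array
{
    // Numeric entities keep non-ASCII text intact when libxml parses the markup.
    $encoded = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    // The wrapper <div> guarantees a non-empty, single-rooted fragment.
    $dom->loadHTML('<div>' . $encoded . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $hrefs = [];
    foreach ($xpath->query('//*[@style]') as $node) {
        // Only inline styles are inspected; <style> blocks and classes are not.
        if (!preg_match('/(^|;)\s*display\s*:\s*none\b/i', $node->getAttribute('style'))) {
            continue;
        }
        // Links at any depth below the hidden element, plus the element itself.
        foreach ($xpath->query('descendant-or-self::a[@href]', $node) as $link) {
            $hrefs[] = $link->getAttribute('href');
        }
    }
    return array_values(array_unique($hrefs));
}
```

Running something like this over stored content before enabling removal gives a rough idea of how much hidden material is already in your database.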


HTMLPurifier: Cleaning HTML for Safety

HTMLPurifier is a PHP library that sanitizes HTML. Many web developers trust it for its security focus and its standards compliance. It mainly does these things:

  • It removes XSS attack paths.
  • It makes poorly written HTML follow W3C standards.
  • It only allows certain tags, attributes, and URI schemes.
  • It can automatically fix problems in HTML structure.

For example, HTMLPurifier will remove this right away:

<script>alert('XSS');</script>

Or it will clean this:

<a href="javascript:alert('XSS')">click me</a>

by removing the dangerous javascript: link target while keeping the text. This is essential for stopping Cross-Site Scripting (XSS) attacks, especially in apps where users can add rich HTML content.

Built-in Features of HTMLPurifier:

  • Custom allowed HTML tags and attributes.
  • Restrictions on URI schemes.
  • Automatic fixes for unclosed tags.
  • Support for embedded configuration.
  • Caching for better performance when busy.

For all that it can do, HTMLPurifier falls short in one key area.


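🚫 HTMLPurifier Filters Styles, Not Elements: A Key Limitation

Even though it offers many protections, HTMLPurifier cannot remove an element because of what its style says. It does validate inline styles, declaration by declaration, against its list of allowed CSS properties, but that filter only keeps or drops declarations. It never drops the element that carries them.

Why?

  • Deciding whether an element is really hidden means reasoning about stylesheets, classes, and inheritance, not just one attribute.
  • That needs complex logic and calculations, which are far beyond what HTMLPurifier is made to do.

So what happens to this element depends on your configuration:

```html
<div style="display:none">You can’t see me, but I'm here!</div>
```

With the default settings, HTMLPurifier treats display as a "tricky" property and strips the declaration, so the div survives and its content becomes visible. With CSS.AllowTricky enabled, the declaration is kept and the element stays hidden. Either way, content that was meant to be invisible, spammy, and possibly harmful is still in your output.

It’s important to remember: HTMLPurifier checks that each style declaration is allowed CSS, but it will not judge what those styles mean for abuse.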


Option 1: Use Regex for Pre-Processing

You can use Regex (regular expressions) to find and remove HTML elements with specific styles, like display:none, before cleaning them.

$html = preg_replace(
    '/<[^>]+style\s*=\s*"[^"]*display\s*:\s*none[^"]*"[^>]*>.*?<\/[^>]+>/is',
    '',
    $html
);

Pros:

  • It is fast and light.
  • It is easy to write and add to existing processes.

Cons:

  • Regex and HTML parsing do not mix well; it is very fragile.
  • It can miss unusual cases like:
    • Single quotes instead of double.
    • Spaces in odd places.
    • HTML tags that are nested or not formed correctly.
  • It does not handle <style> blocks or CSS selectors.

Good For:

  • Simple cases with limited user access.
  • When speed is more important than perfect parsing.

Do not rely on regex alone for HTML input that is deeply nested or complex: the chance of bugs rises sharply.
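
The failure modes listed above are easy to reproduce. The sketch below wraps the same pattern in a helper (stripHiddenWithRegex is a name chosen for illustration) and runs it against three inputs; the comments state what each call returns.

```php
<?php
// The pattern from the regex option, wrapped in a hypothetical helper.
function stripHiddenWithRegex(string $html): string
{
    return preg_replace(
        '/<[^>]+style\s*=\s*"[^"]*display\s*:\s*none[^"]*"[^>]*>.*?<\/[^>]+>/is',
        '',
        $html
    ) ?? $html; // preg_replace returns null on error; fall back to the input
}

// 1. Flat markup with double quotes: the hidden div is removed.
$simple = stripHiddenWithRegex('<p>ok</p><div style="display:none">spam</div>');
// $simple is '<p>ok</p>'

// 2. Single-quoted attribute: the pattern requires double quotes, so nothing changes.
$singleQuoted = stripHiddenWithRegex("<div style='display:none'>spam</div>");
// $singleQuoted is "<div style='display:none'>spam</div>"

// 3. Nested tag: the lazy match stops at the first closing tag (</b>),
//    leaking the rest of the hidden text and leaving an orphaned </div>.
$nested = stripHiddenWithRegex('<div style="display:none"><b>spam</b> more spam</div>');
// $nested is ' more spam</div>'
```

Case 3 is the worrying one: the output is both broken markup and a partial leak of the content you meant to remove.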


Option 2: Customizing HTMLPurifier with HTMLDefinition

Developers might ask whether HTMLPurifier can be customized to act on inline style values like display:none. To some extent, yes: you can set up custom behavior for tags and attributes using the HTMLDefinition object.

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.DefinitionID', 'custom-def');
$config->set('HTML.DefinitionRev', 1);
// Disables the definition cache; handy while developing, slow in production.
$config->set('Cache.DefinitionImpl', null);

if ($def = $config->maybeGetRawHTMLDefinition()) {
    // Attribute types are registered names such as 'Text' or AttrDef objects.
    // 'CSS' is not a registered type name, so pass the CSS validator directly.
    $def->addAttribute('div', 'style', new HTMLPurifier_AttrDef_CSS());
}

But:

  • Even with a custom attribute definition, style values are only validated declaration by declaration.
  • You still cannot remove a node based on its CSS declaration.
  • It cannot handle decisions like 'only remove this tag if it has display:none'.

This method works best for defining which attributes are allowed, not for filtering based on what those attributes mean.


Option 3: Post-Processing with DOMDocument (Best Practice)

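For exactness and full control, use PHP’s DOMDocument to read and clean the HTML after HTMLPurifier has sanitized it. One catch: HTMLPurifier's default configuration treats display as a "tricky" property and strips the declaration, so enable CSS.AllowTricky or the DOM pass will have nothing to find.

Here is one way to do it:

require_once 'htmlpurifier/library/HTMLPurifier.auto.php';

// Let display declarations survive purification so the DOM pass can see them.
$config = HTMLPurifier_Config::createDefault();
$config->set('CSS.AllowTricky', true);
$purifier = new HTMLPurifier($config);
$sanitizedHtml = $purifier->purify($html);

// Encode non-ASCII characters as numeric entities and wrap the fragment in a
// single root so libxml neither mangles text nor nests sibling elements.
$encoded = mb_encode_numericentity($sanitizedHtml, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML('<div>' . $encoded . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
foreach ($xpath->query('//*[@style]') as $node) {
    // Tolerate whitespace and letter case, e.g. "display : NONE".
    if (preg_match('/(^|;)\s*display\s*:\s*none\b/i', $node->getAttribute('style'))) {
        $node->parentNode->removeChild($node);
    }
}

// Serialize the wrapper's children, not the wrapper itself.
$cleanHtml = '';
foreach ($dom->documentElement->childNodes as $child) {
    $cleanHtml .= $dom->saveHTML($child);
}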

Why this is better:

  • It works based on the DOM structure, not loose text parsing.
  • It finds all nodes with a style attribute.
  • It matches the display:none declaration and removes only the elements that carry it.
  • You can then layer on more rules, for example:
    • Trusting certain classes.
    • Allowing a hidden state with specific attributes.
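
A plain substring check can misfire on values that merely contain the text display:none. As a sketch of a stricter test, the helper below splits an inline style into a property map before checking display. The names parseInlineStyle and isHiddenByInlineStyle are hypothetical, and the parser is deliberately naive: it does not handle semicolons inside url(...) or quoted strings.

```php
<?php
// Hypothetical helper: turn an inline style value into a property => value map.
function parseInlineStyle(string $style): array
{
    $declarations = [];
    $important = [];
    foreach (explode(';', $style) as $chunk) {
        $parts = explode(':', $chunk, 2);
        if (count($parts) !== 2) {
            continue; // empty or malformed chunk
        }
        $property = strtolower(trim($parts[0]));
        $rawValue = trim($parts[1]);
        $isImportant = preg_match('/!\s*important$/i', $rawValue) === 1;
        $value = strtolower(trim(preg_replace('/!\s*important$/i', '', $rawValue)));

        // A later normal declaration must not override an earlier !important one.
        if ($property === '' || (!$isImportant && !empty($important[$property]))) {
            continue;
        }
        $declarations[$property] = $value;
        $important[$property] = $isImportant;
    }
    return $declarations;
}

// True only when the effective inline display value is exactly "none".
function isHiddenByInlineStyle(string $style): bool
{
    return (parseInlineStyle($style)['display'] ?? '') === 'none';
}
```

Swapping this predicate into the loop in place of the string match keeps the rest of the post-processing unchanged.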

Advanced Example: Whitelisting Certain Hidden Elements

Suppose you use tabs, modals, or collapsible items that need display:none.

Instead of removing all hidden content everywhere, change your post-processing script:

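$trustedClasses = ['modal', 'tab-content'];

foreach ($xpath->query('//*[@style]') as $node) {
    // Split the class attribute so class="modal fade" still counts as trusted.
    $classes = preg_split('/\s+/', $node->getAttribute('class'), -1, PREG_SPLIT_NO_EMPTY);
    $allowedHidden = (bool) array_intersect($classes, $trustedClasses);
    $hasFlag = $node->hasAttribute('data-allow-hidden');
    $isHidden = preg_match('/(^|;)\s*display\s*:\s*none\b/i', $node->getAttribute('style')) === 1;

    if ($isHidden && !$allowedHidden && !$hasFlag) {
        $node->parentNode->removeChild($node);
    }
}

This keeps legitimate hidden elements while getting rid of misuse, as long as users cannot set those markers themselves. HTMLPurifier drops unknown attributes such as data-allow-hidden by default, so that flag has to come from your own code, and user-supplied class names can be restricted with Attr.AllowedClasses.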


Performance Considerations

Using both HTMLPurifier and DOMDocument generally works well, especially if you:

  • Do not process the same input over and over.
  • Use caching, such as saving cleaned content in a database.
  • Clean HTML when users submit it, instead of every time a page loads.
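
As a rough sketch of the "sanitize once, reuse the result" idea, the class below memoizes sanitized output by a hash of the raw input. SanitizedHtmlCache is a hypothetical name, the in-memory array stands in for a database table, and the callable stands in for the real HTMLPurifier plus DOMDocument pipeline.

```php
<?php
// Hypothetical cache wrapper; $sanitize is whatever cleaning pipeline you use.
final class SanitizedHtmlCache
{
    /** @var array<string, string> sanitized output keyed by input hash */
    private array $store = [];

    /** @var callable */
    private $sanitize;

    /** Number of times the sanitizer actually ran. */
    public int $misses = 0;

    public function __construct(callable $sanitize)
    {
        $this->sanitize = $sanitize;
    }

    public function get(string $rawHtml): string
    {
        // Identical submissions hash to the same key, so they are cleaned once.
        $key = hash('sha256', $rawHtml);
        if (!isset($this->store[$key])) {
            $this->misses++;
            $this->store[$key] = ($this->sanitize)($rawHtml);
        }
        return $this->store[$key];
    }
}

// Stand-in sanitizer for demonstration only; trim() is NOT a real sanitizer.
$cache = new SanitizedHtmlCache(fn (string $html): string => trim($html));
$first = $cache->get('  <p>hello</p>  ');
$second = $cache->get('  <p>hello</p>  '); // served from the cache
```

In a real application the same pattern usually collapses to storing the cleaned HTML next to the raw submission at write time.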

For busy applications:

  • Consider a system that cleans and saves HTML offline in groups.
  • Use file-based caching with HTMLPurifier for definition schemas.
  • Check processing speeds based on how complex the content is.

Security Considerations

Relying on client-side code like JavaScript or CSS to hide content is not safe. Users and malicious bots can read the raw HTML and exploit whatever it contains. Hidden or unused markup should be removed on the server.

Benefits of cleaning everything include:

  • 🔐 Protection from XSS attacks.
  • 🚫 Removal of spammy or useless content.
  • 🌍 Good SEO: avoids problems from hidden keyword stuffing.
  • 🤖 A smaller attack surface for bots.

CWE-79 (Improper Neutralization of Input During Web Page Generation, better known as Cross-Site Scripting) describes the risk of rendering user input that has not been properly sanitized.


Summary: Combine Tools for Best Sanitation Results

Using HTMLPurifier and DOMDocument together gives you the best of both.

| Task | Tool |
|---|---|
| Remove malicious code/scripts | ✅ HTMLPurifier |
| Normalize HTML | ✅ HTMLPurifier |
| Strip invisible or styled elements | ✅ DOMDocument |
| Fine-grained attribute checks | ✅ DOMDocument |
| CSS parsing | ❌ Neither does full CSS parsing |

Apply them in this order:

  1. Clean HTML input with HTMLPurifier (for XSS and strong structure).
  2. Then process the result with DOMDocument to remove invisible or styled content.
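
The two steps can be sketched as a single function. Here sanitizeUserHtml is a hypothetical name, and $purify is an injected callable standing in for HTMLPurifier's purify(), assumed to be configured so that display declarations survive (CSS.AllowTricky). The code needs the dom and mbstring extensions.

```php
<?php
// Hypothetical two-step pipeline: purify first, then drop hidden elements.
function sanitizeUserHtml(string $html, callable $purify): string
{
    // Step 1: security sanitization (XSS, malformed markup).
    $safe = $purify($html);

    // Step 2: structural pass that removes elements hidden with display:none.
    $encoded = mb_encode_numericentity($safe, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    // A single wrapper root keeps sibling elements from being nested.
    $dom->loadHTML('<div>' . $encoded . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//*[@style]') as $node) {
        if (preg_match('/(^|;)\s*display\s*:\s*none\b/i', $node->getAttribute('style'))) {
            $node->parentNode->removeChild($node);
        }
    }

    // Serialize the wrapper's children, not the wrapper itself.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

// Demonstration with an identity function in place of a real purifier.
$identity = fn (string $h): string => $h;
$result = sanitizeUserHtml(
    '<p>Visible</p><div style="display:none"><a href="https://spam.example/">x</a></div>',
    $identity
);
```

Injecting the purifier as a callable keeps the DOM step testable on its own; in production you would pass something like a closure around your configured HTMLPurifier instance.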

When You Should Not Remove display:none

Not all hidden content is bad. Consider:

  • Elements used in how the user interface works, like modals, toggles, or tabs.
  • SPA actions that turn parts on and off using JavaScript.
  • Accessibility improvements that rely on things being shown or hidden.

To safely keep legitimate display:none elements:

  • Add data-allow-hidden="true" attributes from your own code, not from user input.
  • Allow known classes, like accordion-item or tab-pane.
  • Skip server-side removal when the parent context is already trusted.

Final Thoughts

Removing display:none elements from user HTML is an important but tricky practice. Invisible elements can be abused for malicious content and SEO spam, and they can cause rendering problems. HTMLPurifier gives you baseline protection against dangerous tags and attributes. DOMDocument builds on that protection by letting you filter elements by style, such as hidden ones.

Together, they offer a complete and robust way to clean HTML in PHP. This lets developers make sure their content is clean both in its structure and in what people actually see.

