- ⚠️ Elements with `display:none` may hide malicious content or spam links that harm your site and its SEO.
- 🧼 HTMLPurifier is excellent for removing dangerous HTML, but it never removes an element just because inline CSS like `display:none` hides it.
- 🧠 DOMDocument is more precise for removing elements based on style rules and structural hierarchy.
- 🛠️ Combining HTMLPurifier with DOMDocument achieves both secure sanitation and visual cleanliness.
- ⚙️ Regex removal is fast but unreliable and shouldn’t be used on its own in production.
### Why Remove Elements With display:none?
Removing elements with `display:none` from user-submitted HTML helps you show cleaner, safer, and more useful content on your site. If you build a CMS, a comment system, or any app that processes HTML from users, you need to know how to strip out invisible content. This guide shows how to do it with HTMLPurifier and PHP’s DOMDocument, compares the options, and explains how to configure them to keep your application secure and working well.
### Understanding display:none in User HTML
When users submit HTML, you often let them add bold, italic, images, and sometimes inline styles. But one inline style in particular can hide something harmful: `display:none`.
In standard HTML/CSS:
- Elements with `display:none` do not show up on the screen.
- They don't take up space in the layout.
- They are fully invisible, even to screen readers.
However, they are still part of the DOM and the page source, so crawlers, bots, and scripts still process them. This makes them easy to abuse.
Some risks include:
- Hidden spam links: Adding promotional or bad links without the user or admin seeing them.
- SEO manipulation: Putting in many keywords that users can't see but crawlers can read.
- Tracking pixels & malware: Embedding tiny (1×1 px) iframes or scripts for covert tracking.
So, if your platform takes HTML from users, it is important to find and deal with elements hidden by display:none.
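Before deciding to remove anything, it can help to see what hidden content a submission actually carries. The following is a minimal sketch using only PHP's bundled DOM extension: it lists links that sit inside an element hidden by an inline `display:none`. The function name `findHiddenLinks` is hypothetical, and only inline `style` attributes are checked (not `<style>` blocks or external stylesheets).

```php
<?php
// Sketch: report the href of every link inside an element hidden by an
// inline display:none (on the link itself or on any ancestor).
// Only inline style="" attributes are checked; <style> blocks and external
// stylesheets are not evaluated.
function findHiddenLinks(string $html): array
{
    $dom = new DOMDocument();
    // Wrap the fragment in one root so libxml keeps siblings at the same level.
    @$dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    $xpath = new DOMXPath($dom);

    $links = [];
    foreach ($xpath->query('//a[@href]') as $a) {
        // Walk from the link up through its ancestors until a hidden one is found.
        for ($n = $a; $n instanceof DOMElement; $n = $n->parentNode) {
            if (preg_match('/(^|;)\s*display\s*:\s*none\b/i', $n->getAttribute('style'))) {
                $links[] = $a->getAttribute('href');
                break;
            }
        }
    }
    return $links;
}

$comment = '<p>Great post!</p><div style="display: none"><a href="https://spam.example/pills">cheap pills</a></div>';
echo implode("\n", findHiddenLinks($comment)), "\n"; // prints https://spam.example/pills
```

A report like this is useful for flagging or moderating a submission instead of silently cleaning it.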
### HTMLPurifier: Cleaning HTML for Safety
HTMLPurifier is a standards-compliant HTML filter library written in PHP. Many web developers trust it for its strong security and strict adherence to standards. Its main jobs are:
- It removes XSS attack paths.
- It makes poorly written HTML follow W3C standards.
- It only allows certain tags, attributes, and links.
- It can automatically fix problems in HTML structure.
For example, HTMLPurifier removes this entirely:
```html
<script>alert('XSS');</script>
```
And in this link, it drops the `href` attribute because `javascript:` is not an allowed URI scheme, leaving `<a>click me</a>`:
```html
<a href="javascript:alert('XSS')">click me</a>
```
This is essential for stopping Cross-Site Scripting (XSS) attacks, especially in apps where users can submit rich HTML content.
#### Built-in Features of HTMLPurifier:
- Custom allowed HTML tags and attributes.
- Restrictions on URI schemes.
- Automatic fixes for unclosed tags.
- Fine-grained configuration through `HTMLPurifier_Config`.
- Definition caching for better performance under load.
For all that, HTMLPurifier falls short in one key area.
---
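### 🚫 HTMLPurifier Doesn’t Remove Hidden Elements: A Key Limitation
HTMLPurifier does look at inline CSS: it validates each declaration in a `style` attribute against a whitelist of properties and drops anything it doesn't allow. What it never does is remove an element because of its style. `display` is one of the "tricky" properties governed by the `CSS.AllowTricky` directive:
- By default (`false`), HTMLPurifier strips the `display:none` declaration but keeps the element, so the hidden content becomes visible.
- With `CSS.AllowTricky` set to `true`, `display:none` passes through unchanged.
Why not remove the element?
- Reading CSS is very different from reading HTML.
- Deciding whether an element is really hidden would require resolving the cascade (stylesheets, classes, inheritance), which is far beyond what HTMLPurifier is designed to do.
So, with `CSS.AllowTricky` enabled, HTMLPurifier will happily let this pass:
```html
<div style="display:none">You can’t see me, but I'm here!</div>
```
Either way, the element itself survives, even if its content is invisible, spammy, and potentially harmful.
It’s important to remember: HTMLPurifier checks that inline styles are valid and whitelisted, but it does not judge whether they are being used to hide abusive content.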
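Because HTMLPurifier won't decide whether an element is hidden, any removal step you add has to interpret the `style` attribute itself, and a plain substring test such as looking for `display:none` misses `DISPLAY : none` and ignores overrides. Here is a minimal sketch of a stricter check; the function name `inlineStyleHides` is hypothetical, and it assumes no declaration value contains a `;` (for example inside `url("...")`), so splitting on `;` is good enough.

```php
<?php
// Sketch: decide whether an inline style attribute hides its element.
// Follows two CSS rules for declarations inside one style attribute:
// a later declaration overrides an earlier one, unless the earlier one
// is marked !important and the later one is not.
// Assumption: no value contains ';', so splitting on ';' is safe here.
function inlineStyleHides(string $style): bool
{
    $display = null;
    $displayImportant = false;
    foreach (explode(';', $style) as $declaration) {
        $parts = explode(':', $declaration, 2);
        if (count($parts) !== 2 || strtolower(trim($parts[0])) !== 'display') {
            continue;
        }
        $value = strtolower(trim($parts[1]));
        $important = (bool) preg_match('/!\s*important$/', $value);
        $value = trim(preg_replace('/!\s*important$/', '', $value));
        // A normal declaration cannot override an earlier !important one.
        if ($displayImportant && !$important) {
            continue;
        }
        $display = $value;
        $displayImportant = $important;
    }
    return $display === 'none';
}

var_dump(inlineStyleHides('color:red; DISPLAY : None'));               // bool(true)
var_dump(inlineStyleHides('display:none; display:block'));             // bool(false)
var_dump(inlineStyleHides('display:none !important; display:block'));  // bool(true)
```

This only resolves a single `style` attribute; classes and stylesheets that also affect `display` are still out of scope.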
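### Option 1: Use Regex for Pre-Processing
You can use regular expressions to find and remove HTML elements with specific styles, like `display:none`, before passing the HTML to HTMLPurifier.
```php
// Naive pattern: removes an element whose double-quoted style attribute
// contains display:none, up to the first closing tag that follows it.
$html = preg_replace(
    '/<[^>]+style\s*=\s*"[^"]*display\s*:\s*none[^"]*"[^>]*>.*?<\/[^>]+>/is',
    '',
    $html
);
```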
Pros:
- It is fast and light.
- It is easy to write and add to existing processes.
Cons:
- Regex and HTML parsing do not mix well; it is very fragile.
- It mishandles unusual cases like:
  - Single quotes instead of double.
  - Spaces in odd places.
  - HTML tags that are nested or not formed correctly.
- It does not handle `<style>` blocks or CSS selectors.
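To make those cons concrete, here is the Option 1 pattern wrapped in a hypothetical helper, `stripHiddenWithRegex()`, and run against three inputs. The comments describe what PCRE actually does with each one.

```php
<?php
// Sketch: the Option 1 regex, unchanged, wrapped in a helper for testing.
function stripHiddenWithRegex(string $html): string
{
    return preg_replace(
        '/<[^>]+style\s*=\s*"[^"]*display\s*:\s*none[^"]*"[^>]*>.*?<\/[^>]+>/is',
        '',
        $html
    );
}

// Works: prints <p>ok</p>
echo stripHiddenWithRegex('<div style="display:none">spam</div><p>ok</p>'), "\n";

// Missed: single quotes don't match the pattern, so the input comes back unchanged.
echo stripHiddenWithRegex("<div style='display:none'>spam</div><p>ok</p>"), "\n";

// Broken: the lazy .*? stops at the first closing tag (</b>), so the output
// is " spam</div><p>ok</p>": the hidden text survives, plus a stray </div>.
echo stripHiddenWithRegex('<div style="display:none"><b>hi</b> spam</div><p>ok</p>'), "\n";
```

The third case is the most dangerous: the spam text is now visible and the markup is broken.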
Good For:
- Simple cases with limited user access.
- When speed is more important than perfect parsing.
Don't rely on regex alone for deeply nested or complex HTML input; the chance of bugs rises sharply.
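### Option 2: Customizing HTMLPurifier with HTMLDefinition
You might wonder whether HTMLPurifier can be customized to act on inline style values like `display:none`. To some extent, yes: the HTMLDefinition object lets you define custom behavior for tags and attributes.
```php
$config = HTMLPurifier_Config::createDefault();
// A custom definition needs a unique ID and a revision number;
// bump the revision whenever you change the definition.
$config->set('HTML.DefinitionID', 'custom-def');
$config->set('HTML.DefinitionRev', 1);
// Disable definition caching while developing (re-enable in production).
$config->set('Cache.DefinitionImpl', null);
// Returns null when a finalized definition is already available (e.g. from cache).
if ($def = $config->maybeGetRawHTMLDefinition()) {
    $def->addAttribute('div', 'style', 'CSS');
}
```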
But:
- A custom attribute definition controls whether `style` is allowed and how its value is validated; it does not let you act on the element as a whole.
- You still cannot remove a node based on its CSS declarations.
- You cannot express a rule like "only remove this tag if it has `display:none`".
This method works best for defining which attributes are allowed, not for filtering elements based on what those attributes mean.
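### Option 3: Post-Processing with DOMDocument (Best Practice)
For precision and full control, use PHP’s DOMDocument to parse and clean the HTML after HTMLPurifier has sanitized it. Because HTMLPurifier strips the `display` property by default, enable `CSS.AllowTricky` so that `display:none` survives purification and can still be detected.
Here is one way to do it: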
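```php
require_once 'htmlpurifier/library/HTMLPurifier.auto.php';
// Keep "tricky" CSS such as display, which HTMLPurifier strips by default,
// so the hidden elements can still be found below.
$config = HTMLPurifier_Config::createDefault();
$config->set('CSS.AllowTricky', true);
$sanitizedHtml = (new HTMLPurifier($config))->purify($html);

// Wrap the fragment in one root so libxml keeps siblings at the same level,
// and encode non-ASCII characters as entities so loadHTML() keeps them intact.
$dom = new DOMDocument();
$encoded = mb_encode_numericentity($sanitizedHtml, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
@$dom->loadHTML('<div>' . $encoded . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//*[@style]') as $node) {
    // Match display:none regardless of spacing, case, or !important.
    if (preg_match('/(^|;)\s*display\s*:\s*none\b/i', $node->getAttribute('style'))) {
        $node->parentNode->removeChild($node);
    }
}

// Serialize the wrapper's children, not the wrapper itself.
$cleanHtml = '';
foreach ($dom->documentElement->childNodes as $child) {
    $cleanHtml .= $dom->saveHTML($child);
}
```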
Why this is better:
- It works on the parsed DOM structure, not loose text matching.
- It finds every node with a `style` attribute.
- It removes only the elements whose `style` declares `display:none`.
- You can then add more rules, for example:
  - Trusting certain classes.
  - Allowing a hidden state marked by specific attributes.
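The "add more rules" idea can be sketched as a small reusable filter that takes each rule as a PHP callable. `removeMatchingElements()` and the `$isHidden` rule are illustrative names, not part of any library; the sketch assumes the bundled `dom` and `mbstring` extensions and PHP 7.4+ for the arrow function.

```php
<?php
// Sketch: a reusable DOM filter where each removal rule is a callable
// that receives a DOMElement and returns true to remove it.
function removeMatchingElements(string $html, callable $shouldRemove): string
{
    $dom = new DOMDocument();
    // Wrap the fragment in one root element so sibling nodes stay siblings,
    // and encode non-ASCII characters as entities so libxml keeps them intact.
    $encoded = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
    @$dom->loadHTML('<div>' . $encoded . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    $root = $dom->documentElement;

    // Collect matches first, then remove them in a second pass.
    $xpath = new DOMXPath($dom);
    $toRemove = [];
    foreach ($xpath->query('.//*', $root) as $el) {
        if ($shouldRemove($el)) {
            $toRemove[] = $el;
        }
    }
    foreach ($toRemove as $el) {
        $el->parentNode->removeChild($el);
    }

    // Serialize only the wrapper's children.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

// Rule: inline display:none, tolerant of spacing, case and !important.
$isHidden = fn (DOMElement $el): bool =>
    (bool) preg_match('/(^|;)\s*display\s*:\s*none\b/i', $el->getAttribute('style'));

echo removeMatchingElements('<p>ok</p><span style="DISPLAY: none">spam</span>', $isHidden), "\n";
// prints <p>ok</p>
```

Keeping each rule as a separate callable makes it easy to combine the hidden-element check with exceptions such as trusted classes.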
### Advanced Example: Whitelisting Certain Hidden Elements
Suppose you use tabs, modals, or collapsible items that need display:none.
Instead of removing all hidden content everywhere, change your post-processing script:
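```php
$allowedClasses = ['modal', 'tab-content'];

foreach ($xpath->query('//*[@style]') as $node) {
    $isHidden = preg_match('/(^|;)\s*display\s*:\s*none\b/i', $node->getAttribute('style'));
    // class can hold several space-separated names, so compare them one by one.
    $classes = preg_split('/\s+/', trim($node->getAttribute('class')), -1, PREG_SPLIT_NO_EMPTY);
    $allowedHidden = array_intersect($classes, $allowedClasses) !== [];
    $hasFlag = $node->hasAttribute('data-allow-hidden');

    if ($isHidden && !$allowedHidden && !$hasFlag) {
        $node->parentNode->removeChild($node);
    }
}
```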
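This keeps legitimate hidden elements while getting rid of misuse. Be careful, though: `class` and `data-*` attributes are just as user-controllable as `style`, so a spammer can simply add `class="modal"` to slip content through. Only honor these exceptions for markup your own editor or templates produce. Also note that HTMLPurifier drops unknown attributes such as `data-allow-hidden` unless you add them to its HTMLDefinition.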
### Performance Considerations
Running both HTMLPurifier and DOMDocument generally performs well, especially if you:
- Do not process the same input over and over.
- Use caching, such as saving cleaned content in a database.
- Clean HTML when users submit it, instead of every time a page loads.
For busy applications:
- Consider cleaning and storing HTML in batches with a background job.
- Enable HTMLPurifier’s file-based definition cache (see `Cache.SerializerPath`).
- Benchmark processing time against the size and complexity of typical content.
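A minimal sketch of "clean once on write, then reuse", assuming a hypothetical `SanitizedHtmlCache` class (PHP 8.0+). The in-memory array stands in for a database column or cache server, and `strip_tags` is only a placeholder for your real HTMLPurifier + DOMDocument pipeline; it is not a substitute for it.

```php
<?php
// Sketch: run the (expensive) sanitizer once per distinct input and reuse
// the result, keyed by a SHA-256 hash of the raw HTML.
final class SanitizedHtmlCache
{
    /** @var array<string, string> hash of raw input => cleaned HTML */
    private array $store = [];
    public int $misses = 0;

    /** @param Closure(string): string $sanitize the real cleaning pipeline */
    public function __construct(private Closure $sanitize) {}

    public function get(string $rawHtml): string
    {
        $key = hash('sha256', $rawHtml);
        if (!array_key_exists($key, $this->store)) {
            $this->misses++;
            $this->store[$key] = ($this->sanitize)($rawHtml);
        }
        return $this->store[$key];
    }
}

// Placeholder sanitizer for illustration only.
$cache = new SanitizedHtmlCache(fn (string $html): string => strip_tags($html));
$cache->get('<b>hello</b>');
$cache->get('<b>hello</b>'); // served from the store, sanitizer not called again
echo $cache->misses, "\n";   // prints 1
```

In a real application the cleaned HTML would usually be saved alongside the raw submission, so page views never run the sanitizer at all.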
### Security Considerations
Relying on client-side code like JavaScript or CSS to hide content is not safe. Users or malicious bots can read the raw HTML and abuse it. Hidden or unused markup should be removed on the server.
Benefits of cleaning everything include:
- 🔐 Protection from XSS attacks.
- 🚫 Removal of spammy or useless content.
- 🌍 Good SEO: avoids problems from hidden keyword stuffing.
- 🤖 Reduces the attack surface available to bots.
CWE-79 (Improper Neutralization of Input During Web Page Generation) documents the real risks of rendering unsanitized user input in your application.
### Summary: Combine Tools for Best Sanitation Results
Using HTMLPurifier and DOMDocument together gives you the best of both.
| Task | Tool |
|---|---|
| Remove malicious code/scripts | ✅ HTMLPurifier |
| Normalize HTML | ✅ HTMLPurifier |
| Strip invisible or styled elements | ✅ DOMDocument |
| Fine-grained attribute checks | ✅ DOMDocument |
| CSS parsing | ❌ Neither does full CSS parsing |
Follow this order:
- Clean HTML input with HTMLPurifier (to neutralize XSS and normalize structure).
- Then process it with DOMDocument to remove invisible or styled content.
### When You Should Not Remove display:none
Not all hidden content is bad. Consider:
- Elements used in how the user interface works, like modals, toggles, or tabs.
- SPA components that JavaScript toggles on and off.
- Accessibility improvements that rely on things being shown or hidden.
To safely keep legitimate `display:none` elements:
- Add `data-allow-hidden="true"` attributes.
- Allow known classes, like `accordion-item` or `tab-pane`.
- Skip server-side removal where the surrounding context guarantees the markup is trusted (for example, content from your own templates).
### Final Thoughts
Removing `display:none` elements from user HTML is an important but tricky practice. Invisible elements can be abused to hide malicious links, stuff SEO keywords, or break page layout. HTMLPurifier gives you baseline protection against dangerous tags and attributes. DOMDocument builds on that protection with style-based filters for things like hidden elements.
Together, they offer a robust way to sanitize HTML in PHP, so your content is clean both in its structure and in what people see.
Sources:
- W3C. (2018). CSS display documentation. Retrieved from https://www.w3.org/TR/CSS2/visuren.html#display-props
- MDN Web Docs. (2023). display – CSS: Cascading Style Sheets. Retrieved from https://developer.mozilla.org/en-US/docs/Web/CSS/display
- MITRE. (n.d.). CWE-79: Improper Neutralization of Input During Web Page Generation (‘Cross-site Scripting’). Retrieved from https://cwe.mitre.org/data/definitions/79.html