Remove .html and .html/amp extensions with .htaccess, only for files in a directory

After a site move, I want to redirect old URLs so that the extension (if any) and the query string (if any) are removed, leaving just the file name while keeping the path:

https://www.example.com/blog/anyfile.html
301 to >> https://www.example.com/blog/anyfile

https://example.com/blog/anyfile.html/amp
301 to >> https://www.example.com/blog/anyfile

https://www.example.com/blog/anyfile.html/amp?nonamp=1
301 to >> https://www.example.com/blog/anyfile

I tried something like this, but it doesn’t keep the /blog/ folder:

RewriteEngine On
RewriteCond %{REQUEST_URI} ^/blog/
RewriteRule ^.*/([^/]+)\.html$ /$1? [L,NC,R]

Also, I can’t find a way to remove /amp after .html.

Solution:

Near the top of the root .htaccess file, you could do something like the following to discard .html, .html/amp and .html/<anything> from the end of the URL-path, and discard the query string (if any) at the same time:

# Strip ".html" onwards from the end of the URL (and remove query string)
RewriteRule ^(.*)\.html(/.*)?$ https://www.example.com/$1 [QSD,R=301,L]

You need to hardcode the scheme and hostname to satisfy your second example, which redirects from example.com to www.example.com. This could be generalised (without hardcoding the domain) if the site is only ever reached via the www subdomain or the domain apex of this one domain.
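
As a rough sketch of that generalisation, the hostname can be captured in a preceding condition and reused in the substitution. This assumes the site only answers on the apex and www hostnames of a single domain, that https://www.<domain> is the canonical form, and that requests arrive on the default ports (otherwise the port in the Host header would end up in the captured value):

```apache
# Assumption: only example.com and www.example.com reach this site.
# Capture the hostname without an optional leading "www." (and without a trailing dot).
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+?)\.?$ [NC]
# %1 is the hostname captured above; $1 is the URL-path before ".html"
RewriteRule ^(.*)\.html(/.*)?$ https://www.%1/$1 [QSD,R=301,L]
```

The condition matches any non-empty Host header, so it exists only to populate the %1 backreference.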

However, the above won’t catch URLs that only include a query string, but don’t contain .html in the URL-path. For that you could implement an additional rule, following the rule above:

# Strip the query string from any URL.
RewriteCond %{QUERY_STRING} .
RewriteRule ^ https://www.example.com%{REQUEST_URI} [QSD,R=301,L]

A look at your existing rule:

RewriteCond %{REQUEST_URI} ^/blog/
RewriteRule ^.*/([^/]+)\.html$ /$1? [L,NC,R]

You are only capturing the filename (anyfile in your example) and discarding the URL-path that precedes it (i.e. blog/), so the $1 backreference only contains anyfile. The pattern also only matches URLs that end in .html, not .html/amp.

Checking the URL-path in the RewriteCond directive is superfluous.
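
The condition is unnecessary because the RewriteRule pattern can test the directory itself. If the redirect should apply only to files under /blog/, a minimal sketch (assuming the rule sits in the root .htaccess file, where the pattern is matched against the URL-path without a leading slash) could look like this:

```apache
# Only redirect URLs under /blog/; the directory check is part of the pattern
# and the captured group keeps the blog/ prefix in $1.
RewriteRule ^(blog/.+)\.html(/.*)?$ https://www.example.com/$1 [QSD,R=301,L]
```

Note that the QSD flag requires Apache 2.4 or later. On Apache 2.2 the trailing ? on the substitution, as in your original rule, discards the query string instead.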
