Regex that matches files with two or more unique extensions

I am trying to create a regex that matches files with two or more extensions, but I want to ignore duplicate extensions.

Examples of results I want:

videogame.exe.exe     - Don't match
unknown.pdf.exe       - Match
Resume.doc.docx       - Match
SalesNumbers.pdf.pdf  - Don't match
summary.pdf.docx      - Match

Currently I have the following regex:

(\.\w+)(?!\1)\.\w+

This almost works. However, if the second extension starts with the first, I don’t get a match. The following examples do not match, but I want them to:

Resume.doc.docx
text.exe.exe1
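
The failure is easy to reproduce. The sketch below runs the pattern from the question against the sample names using Python's `re` module (the choice of Python and the helper name `matches_original` are illustrative assumptions, not part of the question). The lookahead `(?!\1)` rejects any continuation that merely starts with the captured text, so `.docx` is rejected after `.doc`.

```python
import re

# Pattern from the question, unchanged.
ORIGINAL = re.compile(r"(\.\w+)(?!\1)\.\w+")


def matches_original(name: str) -> bool:
    """Return True if the question's pattern matches anywhere in name."""
    return ORIGINAL.search(name) is not None


samples = [
    "videogame.exe.exe",
    "unknown.pdf.exe",
    "Resume.doc.docx",
    "SalesNumbers.pdf.pdf",
    "summary.pdf.docx",
    "text.exe.exe1",
]

for name in samples:
    print(f"{name:22} {matches_original(name)}")
```

With this pattern, `Resume.doc.docx` and `text.exe.exe1` report `False` even though their two extensions differ.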

Solution:

Use this regex, which adds a word boundary after the backreference inside the lookahead:

(\.\w+)(?!\1\b)\.\w+
            ^^

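or, if the file name always ends the string, anchor the backreference to the end of the string instead: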

(\.\w+)(?!\1$)\.\w+
            ^
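
As a quick check, both variants can be run against the examples from the question. This is a minimal sketch in Python (the names `WITH_BOUNDARY`, `WITH_END_ANCHOR`, and `check` are hypothetical). It also includes one file name embedded in a longer string, which is where the two variants differ.

```python
import re

# Variant 1: the backreference must be followed by a word boundary.
WITH_BOUNDARY = re.compile(r"(\.\w+)(?!\1\b)\.\w+")
# Variant 2: the backreference must be followed by the end of the string.
WITH_END_ANCHOR = re.compile(r"(\.\w+)(?!\1$)\.\w+")


def check(pattern: re.Pattern, text: str) -> bool:
    """Return True if pattern matches anywhere in text."""
    return pattern.search(text) is not None


samples = [
    "videogame.exe.exe",
    "unknown.pdf.exe",
    "Resume.doc.docx",
    "SalesNumbers.pdf.pdf",
    "summary.pdf.docx",
    "text.exe.exe1",
    "see videogame.exe.exe now",  # name is not at the end of the string
]

for text in samples:
    print(f"{text:28} {check(WITH_BOUNDARY, text)!s:6} {check(WITH_END_ANCHOR, text)}")
```

For a bare file name the two variants agree. In the last sample the `$` variant reports a match, because the repeated extension is not at the end of the string, so the word-boundary form is the safer choice when scanning longer text.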

Online Demo

The regular expression matches as follows:

| Node | Explanation |
|------|-------------|
| `(` | group and capture to `\1`: |
| `\.` | a literal `.` |
| `\w+` | word characters (a-z, A-Z, 0-9, _), 1 or more times (matching as many as possible) |
| `)` | end of `\1` |
| `(?!` | [look ahead](https://perldoc.perl.org/perlre#Lookaround-Assertions) to see if there is not: |
| `\1` | what was matched by capture `\1` |
| `\b` | the boundary between a word char (`\w`) and something that is not a word char |
| `)` | end of look-ahead |
| `\.` | a literal `.` |
| `\w+` | word characters (a-z, A-Z, 0-9, _), 1 or more times (matching as many as possible) |
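
The same breakdown can be kept next to the pattern itself. The following is a sketch, assuming Python's `re.VERBOSE` flag, which ignores whitespace inside the pattern and allows `#` comments. The helper name `find_extension_pair` is hypothetical.

```python
import re
from typing import Optional

# The word-boundary pattern from the answer, written out node by node.
PATTERN = re.compile(
    r"""
    (        # group and capture to \1:
      \.     #   a literal dot
      \w+    #   one or more word characters
    )        # end of \1
    (?!      # look ahead to see if there is not:
      \1     #   what was matched by capture \1
      \b     #   followed by a word boundary
    )        # end of look-ahead
    \.       # a literal dot
    \w+      # one or more word characters (the second extension)
    """,
    re.VERBOSE,
)


def find_extension_pair(name: str) -> Optional[str]:
    """Return the first pair of differing adjacent extensions, or None."""
    match = PATTERN.search(name)
    return match.group(0) if match else None


print(find_extension_pair("Resume.doc.docx"))    # .doc.docx
print(find_extension_pair("videogame.exe.exe"))  # None
```

Returning the matched text rather than a boolean makes it easier to see which pair of extensions triggered the match.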