I am trying to create a regex that matches file names with two or more extensions, but I want to ignore file names where the extension is simply duplicated.
Examples of results I want:
videogame.exe.exe - Don't match
unknown.pdf.exe - Match
Resume.doc.docx - Match
SalesNumbers.pdf.pdf - Don't match
summary.pdf.docx - Match
Currently I have the following regex:
(\.\w+)(?!\1)\.\w+
This almost works. However, if the second extension starts with the first, I don't get a match. The following examples do not match, but I want them to:
Resume.doc.docx
text.exe.exe1
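The behaviour can be reproduced with a short sketch. Python's `re` module is an assumption here, since no regex flavor is named, and `matches_original` is a hypothetical helper name. The lookahead `(?!\1)` only checks that the text ahead does not start with the first extension, so `.docx` is rejected because it begins with `.doc`.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"(\.\w+)(?!\1)\.\w+")


def matches_original(filename: str) -> bool:
    """Return True if the question's pattern matches anywhere in filename."""
    return ORIGINAL.search(filename) is not None


# The first name has two unrelated extensions; in the other two the
# second extension begins with the first one.
for name in ["unknown.pdf.exe", "Resume.doc.docx", "text.exe.exe1"]:
    print(name, matches_original(name))
```

Only the first name is reported as a match; the other two are wrongly rejected.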
>Solution :
This regex, using a word boundary:
(\.\w+)(?!\1\b)\.\w+
            ^^
or, anchoring the lookahead to the end of the string:
(\.\w+)(?!\1$)\.\w+
            ^
The regular expression matches as follows:
Node | Explanation |
---|---|
`(` | group and capture to `\1`: |
`\.` | a literal `.` |
`\w+` | word characters (a-z, A-Z, 0-9, _) (1 or more times (matching the most amount possible)) |
`)` | end of `\1` |
`(?!` | [look ahead](https://perldoc.perl.org/perlre#Lookaround-Assertions) to see if there is not: |
`\1` | what was matched by capture `\1` |
`\b` | the boundary between a word char (`\w`) and something that is not a word char |
`)` | end of look-ahead |
`\.` | a literal `.` |
`\w+` | word characters (a-z, A-Z, 0-9, _) (1 or more times (matching the most amount possible)) |
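
As a quick check, both patterns can be run against the file names from the question. This is a minimal sketch assuming Python's `re` module; the names `WITH_BOUNDARY`, `WITH_END_ANCHOR`, `EXPECTED`, and `has_mixed_extensions` are made up for illustration.

```python
import re

# The two patterns from the answer above.
WITH_BOUNDARY = re.compile(r"(\.\w+)(?!\1\b)\.\w+")
WITH_END_ANCHOR = re.compile(r"(\.\w+)(?!\1$)\.\w+")

# Desired outcomes, taken from the question.
EXPECTED = {
    "videogame.exe.exe": False,
    "unknown.pdf.exe": True,
    "Resume.doc.docx": True,
    "SalesNumbers.pdf.pdf": False,
    "summary.pdf.docx": True,
    "text.exe.exe1": True,
}


def has_mixed_extensions(filename: str, pattern: re.Pattern = WITH_BOUNDARY) -> bool:
    """Return True if the pattern finds two adjacent, differing extensions."""
    return pattern.search(filename) is not None


for name, expected in EXPECTED.items():
    for pattern in (WITH_BOUNDARY, WITH_END_ANCHOR):
        result = has_mixed_extensions(name, pattern)
        status = "ok" if result == expected else "MISMATCH"
        print(f"{status:8} {pattern.pattern:24} {name}")
```

Both variants agree on these inputs. The `$` variant assumes the file name runs to the end of the string; if the name is embedded in longer text (for example `a.pdf.pdf now`), only the `\b` variant still rejects the duplicated extension.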