Consider the following lines:
http://regex101.com/r/aC9tW2/3
https://regex101.com/r/aC9tW2/3
https://regex101.com/r/aC9tW2/3/
https://www.regex101.com/r/aC9tW2/3
https://www.regex101.com/r/aC9tW2/3/
In practice, all of these URLs lead to the same page. Strictly speaking they are not identical: they differ in the scheme (http vs. https), the www. prefix, and the trailing slash.
How can I make Notepad++ remove every occurrence except the first when lines are this similar? Ideally I would keep the second URL above (the https one without www. or a trailing slash) and delete all the others, but I'm okay with keeping only the first one and changing HTTP to HTTPS afterwards.
>Solution :
You may try the following find and replace, in regex mode:
Find: (https?):\/\/(\S+)(?:\s+https?:\/\/\2)*
Replace: $1://$2
The strategy used here is to match:
(https?)                 match http or https and capture it in $1
:                        match a literal :
\/\/                     match a literal //
(\S+)                    match and capture the remainder of the URL in $2
(?:\s+https?:\/\/\2)*    then match the same URL, under either scheme, zero or more further times
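We then replace the whole match with $1://$2, which collapses each run of consecutive duplicates into its first occurrence. Note that \2 has to repeat the captured text exactly, so lines that differ by a www. prefix or a trailing slash are not fully merged by this pattern.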
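For checking the pattern outside Notepad++, here is a minimal Python sketch. It first applies the same find-and-replace with re.sub, then shows one possible way to also treat www. and trailing-slash variants as duplicates by comparing a normalized key. The helper names collapse_exact, normalize and dedupe_similar are hypothetical and not part of Notepad++ or the answer above; the normalization rules are an assumption about what "very similar" should mean here.

```python
import re

# Same pattern as the Find field above, written as a Python raw string.
PATTERN = re.compile(r"(https?):\/\/(\S+)(?:\s+https?:\/\/\2)*")


def collapse_exact(text: str) -> str:
    """Apply the find-and-replace from the answer ($1://$2 is \\1://\\2 in Python)."""
    return PATTERN.sub(r"\1://\2", text)


def normalize(url: str) -> str:
    """Hypothetical comparison key: drop the scheme, a leading www., and trailing slashes."""
    key = re.sub(r"^https?://", "", url.strip(), flags=re.IGNORECASE)
    key = re.sub(r"^www\.", "", key, flags=re.IGNORECASE)
    return key.rstrip("/")


def dedupe_similar(lines):
    """Keep the first line seen for each normalized key, preserving order."""
    seen = set()
    kept = []
    for line in lines:
        key = normalize(line)
        if key and key not in seen:
            seen.add(key)
            kept.append(line.strip())
    return kept


sample = """http://regex101.com/r/aC9tW2/3
https://regex101.com/r/aC9tW2/3
https://regex101.com/r/aC9tW2/3/
https://www.regex101.com/r/aC9tW2/3
https://www.regex101.com/r/aC9tW2/3/"""

print(collapse_exact(sample))
print(dedupe_similar(sample.splitlines()))
```

On the sample lines, collapse_exact still leaves two lines (one per host spelling, each ending in the leftover trailing slash), while dedupe_similar keeps only the first URL, which can then be switched from HTTP to HTTPS as described in the question.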