Notepad++: How to keep the first occurrence of almost identical lines?

Consider the following lines:

http://regex101.com/r/aC9tW2/3
https://regex101.com/r/aC9tW2/3
https://regex101.com/r/aC9tW2/3/
https://www.regex101.com/r/aC9tW2/3
https://www.regex101.com/r/aC9tW2/3/

In practice, all of these URLs point to the same page. As strings, though, they differ slightly: the scheme (http vs. https), the www. prefix, and the trailing slash.

How can I make Notepad++ remove every occurrence except the first when lines are nearly identical like this? Ideally I would keep line 2 above and delete all the others, but I'm fine with keeping only line 1 and changing http to https afterwards.

>Solution :

You may try the following find and replace, in regex mode:

Find:    (https?):\/\/(\S+)(?:\s+https?:\/\/\2)*
Replace: $1://$2

The strategy used here is to match:

(https?)               match http or https and capture in $1
:                      :
//                     //
(\S+)                  match and capture remainder of URL in $2
(?:\s+https?:\/\/\2)*  then match the same URL zero or more subsequent times

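We then replace with $1://$2, which collapses each run of consecutive matching URLs into its first occurrence. Note that the backreference \2 requires everything after the scheme to be identical, so this merges only lines that differ in http versus https. Variants with a www. prefix or a trailing slash are not treated as duplicates of the plain URL.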
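To experiment with the pattern outside Notepad++, here is a minimal Python sketch. `collapse_scheme_duplicates` is a direct port of the find/replace above using `re.sub`. `normalize` and `dedupe_normalized` are hypothetical helpers, not part of the answer: they assume that two URLs count as "the same" when they match after dropping the scheme, a leading www., and a trailing slash, which is one possible reading of the question.

```python
import re

# Same pattern as the Notepad++ "Find" field; "/" needs no escaping in Python.
PATTERN = re.compile(r"(https?)://(\S+)(?:\s+https?://\2)*")


def collapse_scheme_duplicates(text: str) -> str:
    """Port of the find/replace above: keep the first URL of each run."""
    return PATTERN.sub(r"\1://\2", text)


def normalize(url: str) -> str:
    """Hypothetical comparison key: drop scheme, leading 'www.', trailing '/'."""
    key = re.sub(r"^https?://", "", url.strip(), flags=re.IGNORECASE)
    key = re.sub(r"^www\.", "", key, flags=re.IGNORECASE)
    return key.rstrip("/")


def dedupe_normalized(lines: list[str]) -> list[str]:
    """Keep the first line seen for each normalized key, preserving order."""
    seen = set()
    kept = []
    for line in lines:
        key = normalize(line)
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


# The five lines from the question.
SAMPLE = [
    "http://regex101.com/r/aC9tW2/3",
    "https://regex101.com/r/aC9tW2/3",
    "https://regex101.com/r/aC9tW2/3/",
    "https://www.regex101.com/r/aC9tW2/3",
    "https://www.regex101.com/r/aC9tW2/3/",
]

if __name__ == "__main__":
    # Lines 1 and 2 differ only in scheme, so the regex merges them.
    print(collapse_scheme_duplicates("\n".join(SAMPLE[:2])))
    # Under the normalization assumption, all five lines reduce to the first.
    print(dedupe_normalized(SAMPLE))
```

Under these assumptions, the regex port merges the first two sample lines into http://regex101.com/r/aC9tW2/3, while the normalization approach reduces all five lines to the first one. If the https form is preferred, the replacement can hard-code the scheme (https://$2 in Notepad++, or `r"https://\2"` in the sketch) instead of reusing $1.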
