Remove preceding duplicate numbers in a file – bash

Given the text file shown under "BEFORE FILE", how can I collapse the repeated digits so that it looks like the "AFTER FILE" below? Each line keeps the "_PRODxxxx," format, where the x's are digits.

BEFORE FILE

NET_SalesD_PROD1111,mexico
NET_Sales4_PROD22,newjersy
NET_SalesG_PROD333,bull

AFTER FILE

NET_SalesD_PROD1,mexico
NET_Sales4_PROD2,newjersy
NET_SalesG_PROD3,bull

I have tried sed with a regex such as "PROD[1-9]{2,4}" but cannot get it to work.

Solution:

Use a capture group to capture the first digit and a back-reference to match its repetitions. Then use the same back-reference in the replacement to emit that digit just once.

sed -E 's/PROD([1-9])\1+,/PROD\1,/'
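To try the command without touching a real file, the sketch below feeds it the three sample lines from the question. The variable names `input` and `result` are only illustrative, and it assumes a sed that accepts back-references with -E (GNU sed and the BSD sed shipped with macOS both do).

```shell
# Sample lines copied from the "BEFORE FILE" in the question.
input='NET_SalesD_PROD1111,mexico
NET_Sales4_PROD22,newjersy
NET_SalesG_PROD333,bull'

# ([1-9]) captures the first digit after PROD.
# \1+ then matches one or more further copies of that same digit.
# The replacement writes PROD, the captured digit once, and the comma.
result=$(printf '%s\n' "$input" | sed -E 's/PROD([1-9])\1+,/PROD\1,/')

printf '%s\n' "$result"
```

The attempt in the question, [1-9]{2,4}, matches any run of two to four digits rather than repeats of one digit, and it captures nothing for the replacement to reuse. Lines that already have a single digit after PROD do not match the pattern and pass through unchanged.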