Remove preceding duplicate numbers in a file – bash

Given the text file shown under "BEFORE FILE" below, how would I collapse the repeated digits after "PROD" so that it looks like the "AFTER FILE"? The "_PRODxxxx," part, where the x’s are digits, always keeps that format.

BEFORE FILE

NET_SalesD_PROD1111,mexico
NET_Sales4_PROD22,newjersy
NET_SalesG_PROD333,bull

AFTER FILE

NET_SalesD_PROD1,mexico
NET_Sales4_PROD2,newjersy
NET_SalesG_PROD3,bull

I have tried using sed and a regex capture group like "PROD[1-9]{2,4}" but cannot get it to work.

>Solution :

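Use a capture group to capture the first digit, and a back-reference to match repetitions of that same digit. Then use the same back-reference in the replacement to emit the digit only once. A bracket expression with a quantifier, such as [1-9]{2,4}, matches any run of digits but does not remember which digit it matched, so on its own it cannot reduce the run to a single copy.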

sed -E 's/PROD([1-9])\1+,/PROD\1,/'
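
As a quick check, the command above can be run against the sample lines from the question. This is a minimal sketch: the variable names input and result are only for illustration, and it assumes a sed that accepts back-references with -E (GNU and BSD sed both do).

```shell
# Sample input copied from the question's BEFORE FILE.
input='NET_SalesD_PROD1111,mexico
NET_Sales4_PROD22,newjersy
NET_SalesG_PROD333,bull'

# ([1-9]) captures the first digit after PROD, \1+ matches one or more
# repeats of that same digit, and the replacement writes the digit back once.
result=$(printf '%s\n' "$input" | sed -E 's/PROD([1-9])\1+,/PROD\1,/')

# Print the transformed lines.
printf '%s\n' "$result"
```

To process a real file, pass its name as the last argument to sed instead of piping input. With GNU sed, adding -i edits the file in place; BSD/macOS sed expects -i '' instead.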
