Making duplicate records unique using awk

I am trying to use awk to identify duplicate records in a file and apply the changes directly to that file. The file has six columns and no header. My aim is to make each duplicate record unique by editing its second column: append a counter that increases by 1 each time the record repeats. The data looks like this:

1 A B C D E
1 A B C D E   (This is a duplicate record1)
1 A B C D E   (This is a duplicate record2)
2 F G H I J
3 K L M N O

The desired output:

1 A   B C D E
1 A-1 B C D E
1 A-2 B C D E
2 F   G H I J
3 K   L M N O


I tried the following code from the post "How to rename duplicate lines with awk?", but it appends the number to the end of the record instead of to the second column:

awk 'cnt[$0]++{$0=$0" variant "cnt[$0]-1} 1' file

Solution:

With your shown samples, please try the following awk code.

awk '
++arr1[$0]>1{
  $2=$2"-"++arr[$2]
}
1
' Input_file

The one-liner form of the above solution is:

awk '++arr1[$0]>1{$2=$2"-"++arr[$2]}1' Input_file
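To check the behaviour against the sample from the question, here is a minimal sketch that feeds the five sample lines to the same awk program on standard input. The function name make_unique is only an illustrative wrapper and is not part of the original answer.

```shell
# make_unique: hypothetical wrapper name around the awk program from the answer.
# With no file arguments, awk reads standard input.
make_unique() {
  awk '++arr1[$0]>1{$2=$2"-"++arr[$2]}1' "$@"
}

# Sample records from the question (without the explanatory annotations).
printf '%s\n' \
  '1 A B C D E' \
  '1 A B C D E' \
  '1 A B C D E' \
  '2 F G H I J' \
  '3 K L M N O' | make_unique
```

This should print the first record unchanged, then 1 A-1 B C D E and 1 A-2 B C D E, followed by the two unique records. Note that assigning to $2 makes awk rebuild the line with single spaces between fields, so the column padding shown in the desired output is not reproduced.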

Explanation: a detailed, line-by-line explanation of the above awk code.

awk '                               ##Start the awk program.
++arr1[$0]>1{                       ##Count occurrences of the whole line in arr1; run this block only from the 2nd occurrence onward.
  $2=$2"-"++arr[$2]                 ##Append - and a counter (kept in arr, keyed by the original $2) to the 2nd field.
}                                   ##Close the block.
1                                   ##1 is an always-true condition, so every line, edited or not, is printed.
' Input_file                        ##Name of the input file.
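The question also asks for the change to be applied to the file itself, whereas the command above only prints to standard output. One common approach is to write the result to a temporary file and then replace the original. The following is a sketch under assumptions: mktemp is available (it is widespread but not required by POSIX), and the names workdir, file, and tmp are illustrative only.

```shell
# Work in a scratch directory so the sketch does not touch real data.
workdir=$(mktemp -d)
file="$workdir/Input_file"
printf '%s\n' '1 A B C D E' '1 A B C D E' '1 A B C D E' '2 F G H I J' '3 K L M N O' > "$file"

# Write the result to a temporary file, then replace the original
# only if awk finished successfully.
tmp=$(mktemp "$workdir/tmp.XXXXXX") &&
awk '++arr1[$0]>1{$2=$2"-"++arr[$2]}1' "$file" > "$tmp" &&
mv "$tmp" "$file"

cat "$file"
```

With GNU awk 4.1 or later, the bundled inplace extension should give the same result without the explicit temporary file: gawk -i inplace '++arr1[$0]>1{$2=$2"-"++arr[$2]}1' Input_file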
