I have a headerless CSV file that is sorted on the first column. When the 2nd and 3rd columns are identical, I want to "collapse" them into one – i.e. replace the last column with a comma, which would combine with the first comma to make ,,, indicating the third column is empty. In other words, this:
0000001,11111,66666
0000002,12121,22222
0000003,33333,33333
0000004,74747,44444
0000005,12345,12345
…becomes this:
0000001,11111,66666
0000002,12121,22222
0000003,33333,,
0000004,74747,44444
0000005,12345,,
I’ve tried various permutations of grep and cut but can’t get anything to work – the closest I’ve come is cut -c 8-19 file.csv, which just isolates the 2nd and 3rd columns. I have a feeling needing to do this across columns and needing to replace the value instead of just delete the whole line makes this complicated enough to require awk or sed, and I don’t know enough about either to know how to approach this.
>Solution :
Using sed
$ sed -E 's/([^,]*,([^,]*),)\2/\1,/' input_file
0000001,11111,66666
0000002,12121,22222
0000003,33333,,
0000004,74747,44444
0000005,12345,,