I have a file with four columns:
text1 a1 a2 5
text2 b2 b8 10
text3 b9 b4 15
text3 b9 b4 25
text3 b9 b4 20
text4 h1 g8 50
text4 g1 k5 70
text4 g1 k5 80
text4 g1 k5 50
text5 y5 p3 25
I want the following result:
text1 a1 a2 5
text2 b2 b8 10
text3 b9 b4 25
text4 h1 g8 50
text4 g1 k5 80
text5 y5 p3 25
That is, remove duplicate rows: where the first, second and third columns are the same, keep only the row with the highest value in the fourth column.
I tried it as follows:
awk '!x[$1]++' file.txt
>Solution :
You are indexing only on $1, but your question requires the key to be the combination of $1, $2 and $3. Your attempt also does nothing to pick the maximum value; it simply keeps the first row seen for each key.
If the rows for a key are always adjacent, you can collect them until you reach the next key, and then print the line that had the maximum value.
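awk 'k != $1 "_" $2 "_" $3 {      # key changed: a new group starts here
    if (NR > 1) print v           # emit the best line of the previous group
    k = $1 "_" $2 "_" $3; s = $4; v = $0; next }
$4 > s { s = $4; v = $0 }         # same key, larger value: remember this line
END { if (NR) print v }' file.txt # emit the last group (skip if input was empty)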
We collect the first three columns in k and the maximum value for this key in s. The entire line which contained the maximum value is v so that we don’t have to assemble the key and the value back for printing it. The script generally prints a line for the previous key when it finds a new key, but then of course we also need to do that when we fall off the end of the file, so we do that in the END block.
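As a quick check, here is a sketch that runs the same logic against the sample from the question. The temporary file and the `input` and `result` variable names are only there to make the snippet self-contained; they are not part of the original question.

```shell
# Write the sample data from the question to a temporary file (illustrative only)
input=$(mktemp)
cat > "$input" <<'EOF'
text1 a1 a2 5
text2 b2 b8 10
text3 b9 b4 15
text3 b9 b4 25
text3 b9 b4 20
text4 h1 g8 50
text4 g1 k5 70
text4 g1 k5 80
text4 g1 k5 50
text5 y5 p3 25
EOF

# Same logic as above: print the best line of each run of identical keys
result=$(awk 'k != $1 "_" $2 "_" $3 {
    if (NR > 1) print v
    k = $1 "_" $2 "_" $3; s = $4; v = $0; next }
$4 > s { s = $4; v = $0 }
END { print v }' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

For this input the output should be the six lines listed as the desired result in the question, in the same order.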
If you can’t guarantee that rows with the same key are adjacent, sorting the file and piping to Awk is probably easier than writing a better script, especially if you haven’t learned any Awk at all yet. (Though do spend an hour or two; it’s a good use of your time.)
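One way the sort-based route might look, as a minimal sketch: sort on the first three columns, then on the fourth column numerically in descending order, so the largest value comes first within each key, and let Awk keep only the first line per key. The array name `seen`, the `LC_ALL=C` setting (used here for a predictable sort order) and the temporary file are assumptions for illustration.

```shell
# Sample data from the question, with one group deliberately split up
input=$(mktemp)
cat > "$input" <<'EOF'
text4 g1 k5 70
text1 a1 a2 5
text3 b9 b4 15
text2 b2 b8 10
text3 b9 b4 25
text4 h1 g8 50
text3 b9 b4 20
text4 g1 k5 80
text5 y5 p3 25
text4 g1 k5 50
EOF

# Sort by columns 1-3, then by column 4 numerically, largest first;
# awk then prints only the first line it sees for each ($1,$2,$3) key.
result=$(LC_ALL=C sort -k1,1 -k2,2 -k3,3 -k4,4nr "$input" |
    awk '!seen[$1,$2,$3]++')

printf '%s\n' "$result"
rm -f "$input"
```

Note that the output follows the sorted order rather than the original file order, so here `text4 g1 k5 80` is printed before `text4 h1 g8 50`. If the original order matters, this sketch is not enough on its own.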