Thanks to @EdMorton, I can unique an array in awk this way:
BEGIN {
    # create an array
    # here, I create an array from a string, but other approaches are possible, too
    split("a b c d e a b", array)

    # unique it
    for (i=1; i in array; i++) {
        if ( !seen[array[i]]++ ) {
            unique[++j] = array[i]
        }
    }

    # print out the result
    for (i=1; i in unique; i++) {
        print unique[i]
    }
    # results in:
    # a
    # b
    # c
    # d
    # e
}
What I don’t understand, though, is this ( !seen[array[i]]++ ) condition with an increment:
- I do understand that we collect unique indices in the seen array;
- So, we check if our temp array seen already has an index array[i] (and add it to unique, if it hasn’t);
- But the increment after the index is the thing I still can’t get 🙂 (despite the detailed explanation provided by Ed).
So, my question is the following: can we somehow rewrite this conditional in a more elaborate way? Maybe that would help me finally understand it 🙂
>Solution :
Hope this is clearer but idk – best I can say is it’s more elaborate as requested!
$ cat tst.awk
BEGIN {
    # create an array
    # here, I create an array from a string, but other approaches are possible, too
    split("a b c d e a b", array)

    # unique it
    for (i=1; i in array; i++) {
        val = array[i]
        count[val] = count[val] + 1
        if ( count[val] == 1 ) {
            is_first_time_val_seen = 1
        }
        else {
            is_first_time_val_seen = 0
        }
        if ( is_first_time_val_seen ) {
            unique[++j] = val
        }
    }

    # print out the result
    for (i=1; i in unique; i++) {
        print unique[i]
    }
}
$ awk -f tst.awk
a
b
c
d
e
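
If it helps, here is a small trace of what the original one-liner condition does on each pass. This is only a sketch: the shell function name trace_seen, the shortened input "a b a", and the variables before and first are made up for illustration and are not part of the code above. The point it tries to show is that seen[val]++ hands back the old value (0 for an entry that has never been touched) and only then adds 1, so !seen[val]++ is true exactly once per distinct value.

```shell
# Hypothetical helper: traces how !seen[val]++ evaluates for each element.
trace_seen() {
    awk 'BEGIN {
        n = split("a b a", array)
        for (i = 1; i <= n; i++) {
            val = array[i]
            before = seen[val] + 0    # an unset entry is treated as 0 in numeric context
            first  = !seen[val]++     # negate the OLD value, then increment the entry
            printf "%s: before=%d first=%d after=%d\n", val, before, first, seen[val]
        }
    }'
}

trace_seen
```

This should print:

a: before=0 first=1 after=1
b: before=0 first=1 after=1
a: before=1 first=0 after=2

So the second "a" sees a count of 1, the negation gives 0, and the value is not added to unique again.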