Creating a unique array in awk: can this snippet be elaborated?

March 8, 2022

Thanks to @EdMorton, I can deduplicate an array in awk this way:

BEGIN {
    # create an array
    # here, I create an array from a string, but other approaches are possible, too
    split("a b c d e a b", array)

    # unique it
    for (i=1; i in array; i++) {
        if ( !seen[array[i]]++ ) {
            unique[++j] = array[i]
        }
    }

    # print out the result
    for (i=1; i in unique; i++) {
        print unique[i]
    }
    # results in:
    # a
    # b
    # c
    # d
    # e
}

What I don’t understand, though, is this ( !seen[array[i]]++ ) condition with an increment:

I do understand that we collect the values we have already met as indices of the seen array;
so we check whether our temporary array seen already has an index array[i] (and add that value to unique if it doesn’t);
but the increment after the index is the thing I still can’t get 🙂 (despite the detailed explanation provided by Ed).

So, my question is the following: can we somehow rewrite this conditional in a more elaborate way? Maybe that would really help me finalise my understanding of it 🙂

>Solution :

I hope this is clearer, but I don’t know; the best I can say is that it’s more elaborate, as requested!

$ cat tst.awk
BEGIN {
    # create an array
    # here, I create an array from a string, but other approaches are possible, too
    split("a b c d e a b", array)

    # unique it
    for (i=1; i in array; i++) {
        val = array[i]
        count[val] = count[val] + 1

        if ( count[val] == 1 ) {
            is_first_time_val_seen = 1
        }
        else {
            is_first_time_val_seen = 0
        }

        if ( is_first_time_val_seen ) {
            unique[++j] = val
        }
    }

    # print out the result
    for (i=1; i in unique; i++) {
        print unique[i]
    }
}
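
To see why the original one-liner behaves like the longer version above, it may help to print what the expression evaluates to at each step. The sketch below is only an illustration: the helper name trace_seen, the shortened input "a b a" and the variables before and result are assumptions made up for this example. It relies on two awk behaviours: an array element that has never been set counts as 0 in a numeric context, and the post-increment seen[val]++ yields the old value before adding 1.

```shell
# Hypothetical helper: trace what !seen[val]++ does for each element.
trace_seen() {
    awk 'BEGIN {
        split("a b a", array)
        for (i = 1; i in array; i++) {
            val = array[i]
            before = seen[val] + 0    # an unset element counts as 0
            result = !seen[val]++     # negate the OLD value, then add 1
            printf "%s: before=%d result=%d after=%d\n", val, before, result, seen[val]
        }
    }'
}

trace_seen
# prints:
# a: before=0 result=1 after=1
# b: before=0 result=1 after=1
# a: before=1 result=0 after=2
```

The condition is true (result=1) only when the count was 0 before the increment, that is, the first time a value turns up. That is the same test as count[val] == 1 after the explicit increment in the elaborate version.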