Creating a unique array in awk: can this snippet be elaborated?

Thanks to @EdMorton, I can unique an array in awk this way:

BEGIN {
    # create an array 
    # here, I create an array from a string, but other approaches are possible, too
    split("a b c d e a b", array)

    # unique it
    for (i=1; i in array; i++) {
        if ( !seen[array[i]]++ ) {
            unique[++j] = array[i]
        }
    }

    # print out the result
    for (i=1; i in unique; i++) {
        print unique[i]
    }
    # results in:
    # a
    # b
    # c
    # d
    # e
}

What I don’t understand, though, is this ( !seen[array[i]]++ ) condition with an increment:

  1. I do understand that we collect unique indices in the seen array;
  2. So, we check if our temp array seen already has an index array[i] (and add it to unique, if it hasn’t);
  3. But the increment after the index is the thing I still can’t get 🙂 (despite the detailed explanation provided by Ed).

So, my question is: can this condition be rewritten in a more elaborate way? Maybe that would help me finally make sense of it 🙂

Solution:

I hope this is clearer. At the very least, it is more elaborate, as requested!

$ cat tst.awk
BEGIN {
    # create an array
    # here, I create an array from a string, but other approaches are possible, too
    split("a b c d e a b", array)

    # unique it
    for (i=1; i in array; i++) {
        val = array[i]
        count[val] = count[val] + 1

        if ( count[val] == 1 ) {
            is_first_time_val_seen = 1
        }
        else {
            is_first_time_val_seen = 0
        }

        if ( is_first_time_val_seen ) {
            unique[++j] = val
        }
    }

    # print out the result
    for (i=1; i in unique; i++) {
        print unique[i]
    }
}

$ awk -f tst.awk
a
b
c
d
e
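
To tie the elaborate version back to the original one-liner: `seen[val]++` is a post-increment, so the expression yields the old count (an unset awk element evaluates to 0 in a numeric context) and only then adds 1. The `!` negates that old count, which makes the condition true on the first occurrence of a value and false on every later one. The following is a minimal sketch rather than part of the original answer; the shell variable `out`, the awk variables `before` and `first`, and the shortened input string are all illustrative choices.

```shell
# Illustrative sketch: run a small awk program and keep its output in "out".
# The names "out", "before" and "first" are assumptions, not from the original post.
out=$(awk 'BEGIN {
    n = split("a b a a", array)
    for (i = 1; i <= n; i++) {
        val = array[i]
        before = seen[val]++    # post-increment: yields the old count, then adds 1
        first = !before         # 1 only when the old count was 0
        printf "%s before=%d after=%d first=%d\n", val, before, seen[val], first
    }
}')
printf '%s\n' "$out"
```

Each output line shows the count before and after the increment, plus the value the condition would take. Since `!0` is 1 and `!N` is 0 for any nonzero N, `if ( !seen[val]++ )` behaves like the `count[val] == 1` test in the elaborate version, with the counter update folded into the condition itself.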