Join strings within a loop using awk (replacing one array with another)


I’m writing a one-liner using awk to join strings within a loop.

Basically, there’s an array of strings (I call it a), and I make a copy of that array (I call it b) with the clone function. I concatenate each string m from a with each string n from b and store the concatenated string (m n) as a key in a temporary array, tmp. Next, I replace b with tmp and repeat the concatenation. This time b holds the strings from tmp, so each new string is one base longer, and so on. The concatenation runs inside a loop (here k=3, so the loop body should run twice and produce strings of length 3), but I wasn’t able to print the result after the loop: nothing is ever printed.

awk -v k=3 'function clone(original, copy) {for (i in original) {if (isarray(original[i])) {copy[i][1]=""; delete copy[i][1]; clone(original[i], copy[i])} else {copy[i]=original[i]}}} BEGIN {a["A"]; a["T"]; a["G"]; a["C"]; clone(a, b); for (i=1; i<k; i++) {for (m in a) {for (n in b) {tmp[m n]}}; delete b; clone(tmp, b)}; for (i in b) {print i}}'
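For reference, here is the procedure described above modelled in Python. This is a sketch, not a translation of the awk. Because awk array keys are unique, tmp is modelled as a Python set. The function name kmers_by_concatenation and its reset_tmp flag are hypothetical, added only for illustration. The flag shows why tmp has to be emptied on every pass: the one-liner never deletes tmp, so shorter strings from earlier passes would be carried into b.

```python
def kmers_by_concatenation(k, reset_tmp=True):
    """Hypothetical model of the a/b/tmp loop: prefix every string in b with every base in a."""
    a = ["A", "T", "G", "C"]
    b = set(a)        # the clone of a
    tmp = set()       # awk array keys are unique, so model tmp as a set
    for _ in range(1, k):       # same bounds as awk's for (i=1; i<k; i++)
        if reset_tmp:
            tmp = set()         # the reset the one-liner is missing
        for m in a:
            for n in b:
                tmp.add(m + n)  # tmp[m n]
        b = set(tmp)            # delete b; clone(tmp, b)
    return b

print(len(kmers_by_concatenation(3)))                   # 64 == 4**3
print(len(kmers_by_concatenation(3, reset_tmp=False)))  # 80: 16 two-mers + 64 three-mers
```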

I was able to do the same thing in Perl and in Python, as shown below.

Equivalent function in Perl:

sub kmer_generator {
    my ($k) = @_;
    my @bases_1 = ("A", "T", "G", "C");
    my @bases_2 = @bases_1;
    for my $i (1 .. $k - 1) {
        my @temporary;
        for my $base_1 (@bases_1) {
            for my $base_2 (@bases_2) {
                push @temporary, $base_1 . $base_2;
            }
        }
        @bases_2 = @temporary;
    }
    return @bases_2;
}

Equivalent function in Python:

def generate_kmer(k):
    bases_1 = ["A", "T", "G", "C"]
    bases_2 = bases_1.copy()
    for _ in range(k - 1):
        temp = []
        for m in bases_1:
            for n in bases_2:
                temp.append(m + n)
        bases_2 = temp
    return bases_2
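Both functions build the k-fold Cartesian product of the alphabet with itself. As an alternative technique (swapped in here, not what the question uses), Python's standard itertools.product computes the same thing. With repeat=k it varies the leftmost position slowest, which matches the nested loops because each pass prepends the outer base. The name generate_kmer_product is hypothetical, and the sketch assumes k >= 1.

```python
from itertools import product

def generate_kmer_product(k):
    """All strings of length k over A, T, G, C (assumes k >= 1)."""
    bases = "ATGC"
    # product(..., repeat=k) varies the leftmost position slowest,
    # the same order the nested loops produce.
    return ["".join(p) for p in product(bases, repeat=k)]

def generate_kmer(k):
    # Condensed copy of the loop version from the question, for comparison.
    bases_1 = ["A", "T", "G", "C"]
    bases_2 = bases_1.copy()
    for _ in range(k - 1):
        bases_2 = [m + n for m in bases_1 for n in bases_2]
    return bases_2

for k in range(1, 5):
    assert generate_kmer_product(k) == generate_kmer(k)
print(" ".join(generate_kmer_product(2)))
# AA AT AG AC TA TT TG TC GA GT GG GC CA CT CG CC
```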

Solution:

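Hah, that was fun. First, the reason your one-liner never prints anything: clone does not declare i as a local variable, so its for (i in original) loop overwrites the global i that BEGIN uses as its loop counter. After each call, i holds one of the array keys (a string such as "AT"). i++ converts that string to 0 and increments it to 1, so i < k stays true forever and the loop never terminates. In addition, tmp is never deleted between passes, so shorter strings would pile up in it even with the counter fixed.

The following script (gawk is required, because isarray and arrays of arrays are gawk extensions)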

#!/bin/bash

awk '
# https://unix.stackexchange.com/a/456316/209955
# Deep-copies array rhs into lhs, including gawk arrays of arrays.
# i is listed as an extra parameter so it stays local and cannot clobber a caller loop counter.
function clone(lhs, rhs,    i) {
    for (i in rhs) {
        if (isarray(rhs[i])) {
            lhs[i][1] = ""
            delete lhs[i][1]
            clone(lhs[i], rhs[i])
        } else {
            lhs[i] = rhs[i]
        }
    }
}
# generate_kmer(bases_2, k): fills bases_2 (the "return" array, indexed 1..4^k)
# with every string of length k over A, T, G, C.
function generate_kmer(bases_2, k,
       # Local function variables, to preserve locality add them as arguments.
       bases_1, i, m, n, n1, n2, temp) {
    # Easy array initialization.
    n1 = split("A T G C", bases_1, " ")
    clone(bases_2, bases_1)
    for (i = 1; i < k; i++) {
        temp = ""
        n2 = length(bases_2)
        # Numeric loops, because "for (x in arr)" visits elements in an unspecified order.
        for (m = 1; m <= n1; m++) {
            for (n = 1; n <= n2; n++) {
                # I used just a string separated with spaces, easier to append.
                temp = temp (temp ? " " : "") bases_1[m] bases_2[n]
            }
        }
        # split() empties bases_2 before refilling it.
        split(temp, bases_2, " ")
    }
}
END {
    generate_kmer(output, 2)
    for (i = 1; i <= length(output); i++) {
        printf("%s%s", output[i], i == length(output) ? "" : " ")
    }
    printf("\n")
}' </dev/null

python -c '
def generate_kmer(k):
    bases_1 = ["A", "T", "G", "C"]
    bases_2 = bases_1.copy()
    i = 0
    while i < k - 1:
        i += 1
        temp = []
        for m in bases_1:
            for n in bases_2:
                temp.append(m + n)
        bases_2 = temp
    return bases_2
print(" ".join(generate_kmer(2)))
'

outputs the following (the first line is from awk, the second from Python):

AA AT AG AC TA TT TG TC GA GT GG GC CA CT CG CC
AA AT AG AC TA TT TG TC GA GT GG GC CA CT CG CC
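The clone function in the script (taken from the linked unix.stackexchange answer) is a recursive deep copy: scalar elements are copied, and sub-arrays, which only gawk supports, are cloned recursively. Here is a rough Python analogue for nested dicts, offered as an illustrative sketch. In real Python code, copy.deepcopy does this job.

```python
def clone(rhs):
    """Recursive deep copy of nested dicts, mirroring the awk clone(lhs, rhs)."""
    lhs = {}
    for key, value in rhs.items():
        if isinstance(value, dict):
            lhs[key] = clone(value)   # sub-array: copy recursively
        else:
            lhs[key] = value          # scalar: copy the value
    return lhs

original = {"A": 1, "nested": {"T": 2}}
copy = clone(original)
copy["nested"]["T"] = 99
print(original["nested"]["T"])  # 2: the nested dict was copied, not shared
```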
