I’m writing an awk one-liner to join strings within a loop. Basically, there’s an array of strings (I call it a), and a copy of that array (I call it b), which is made with the clone function. I concatenate each string from array a (which is m) with each string from array b (which is n), then I store the concatenated string (m n) in a temporary array (tmp). Next, I replace array b with array tmp and repeat the concatenation. This time, array b holds the contents of tmp, so the concatenated string is m n n, and so on. I put the concatenation in a loop (controlled by k, here k=3), but I wasn’t able to print the result after the loop finished.
awk -v k=3 'function clone(original, copy) {for (i in original) {if (isarray(original[i])) {copy[i][1]=""; delete copy[i][1]; clone(original[i], copy[i])} else {copy[i]=original[i]}}} BEGIN {a["A"]; a["T"]; a["G"]; a["C"]; clone(a, b); for (i=1; i<k; i++) {for (m in a) {for (n in b) {tmp[m n]}}; delete b; clone(tmp, b)}; for (i in b) {print i}}'
I was able to do the same thing in Perl and in Python; both versions are below.
Equivalent function in Perl:
sub kmer_generator {
    my ($k) = @_;
    my @bases_1 = ("A", "T", "G", "C");
    my @bases_2 = @bases_1;
    for (my $i = 1; $i < $k; $i++) {
        my @temporary;
        for my $base_1 (@bases_1) {
            for my $base_2 (@bases_2) {
                push @temporary, $base_1 . $base_2;
            }
        }
        undef @bases_2;
        @bases_2 = @temporary;
    }
    return @bases_2;
}
Equivalent function in Python:
def generate_kmer(k):
    bases_1 = ["A", "T", "G", "C"]
    bases_2 = bases_1.copy()
    i = 0
    while i < k - 1:
        i += 1
        temp = []
        for m in bases_1:
            for n in bases_2:
                temp.append(m + n)
        bases_2 = None
        bases_2 = temp
    return bases_2
>Solution :
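Hah, that was fun. Your one-liner never gets to the print: awk variables are global unless they are declared as function parameters, so the i inside clone is the same variable as the loop counter i in BEGIN. Every call to clone overwrites the counter with an array key, and the loop never ends. The following script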
#!/bin/bash
awk '
# https://unix.stackexchange.com/a/456316/209955
# Recursively copy array rhs into lhs (gawk only: uses isarray).
# i is listed as an extra parameter so that it stays local to each call.
function clone(lhs, rhs,    i) {
    for (i in rhs) {
        if (isarray(rhs[i])) {
            lhs[i][1] = ""
            delete lhs[i][1]
            clone(lhs[i], rhs[i])
        } else {
            lhs[i] = rhs[i]
        }
    }
}
# documentation...
# bases_2 is the return array
# k is an int
# The return value is the number of elements stored in bases_2.
function generate_kmer(bases_2, k,
        # Variables are global in awk unless declared as extra parameters, so the locals go here.
        bases_1, i, m, n, n1, n2, temp) {
    # Easy array initialization; split returns the number of elements.
    n1 = split("A T G C", bases_1, " ")
    clone(bases_2, bases_1)
    n2 = n1
    # ugh?? I would do for (i = 1; i < k; i++) {
    i = 0
    while (i < k - 1) {
        i += 1
        temp = ""
        # Numeric loops keep the order fixed; the order of for (m in ...) is unspecified.
        for (m = 1; m <= n1; m++) {
            for (n = 1; n <= n2; n++) {
                # I used just a string separated with spaces, easier to append.
                temp = temp (temp ? " " : "") bases_1[m] bases_2[n]
            }
        }
        n2 = split(temp, bases_2, " ")
    }
    return n2
}
END {
    count = generate_kmer(output, 2)
    for (i = 1; i <= count; i++) {
        printf("%s%s", output[i], i == count ? "" : " ")
    }
    printf("\n")
}' </dev/null
python -c '
def generate_kmer(k):
    bases_1 = ["A", "T", "G", "C"]
    bases_2 = bases_1.copy()
    i = 0
    while i < k - 1:
        i += 1
        temp = []
        for m in bases_1:
            for n in bases_2:
                temp.append(m + n)
        bases_2 = temp
    return bases_2
print(" ".join(generate_kmer(2)))
'
outputs (first line from awk, second line from Python):
AA AT AG AC TA TT TG TC GA GT GG GC CA CT CG CC
AA AT AG AC TA TT TG TC GA GT GG GC CA CT CG CC
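For comparison, here is a shorter sketch of the same k-mer generation in Python, using itertools.product from the standard library instead of the explicit nested loops. The function name kmers and its alphabet parameter are made up for this example and are not part of the script above. It assumes the alphabet is given in the same A, T, G, C order.

```python
from itertools import product


def kmers(k, alphabet="ATGC"):
    """Return every length-k string over alphabet as a list."""
    # product(alphabet, repeat=k) yields k-tuples of characters and
    # varies the rightmost position fastest, which matches the order
    # produced by the nested loops over bases_1 and bases_2.
    return ["".join(chars) for chars in product(alphabet, repeat=k)]


if __name__ == "__main__":
    print(" ".join(kmers(2)))
```

With k=2 this prints the same 16 strings in the same order as the two lines above, and the list holds 4**k entries for any k.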