Use the concatenation of field values as the array subscript, where the fields are specified on the command line

I am creating an AWK program that I named join. It will be used to join two files via a composite key.

The program is run by passing it two files, file1 and file2, and two variables, f1cols ("file1 columns") and f2cols ("file2 columns"). The values of f1cols and f2cols are comma-separated lists of numbers. The numbers identify fields, e.g., f1cols='1,2,3,4' means fields $1, $2, $3, $4 in file1. Here are a couple of examples of invoking the program:

join -v f1cols='1,2,3,4' -v f2cols='2,3,4,5' file2 file1
join -v f1cols='1,3,5' -v f2cols='1,2,3' file2 file1

I want to store the content of file2 in an array named a. The subscripts of a are to be the concatenation of the values of the fields identified by f2cols. So, if the program is invoked like this:

join -v f1cols='1,2,3,4' -v f2cols='2,3,4,5' file2 file1

then the subscript should be:

a[$2,$3,$4,$5]

If the program is invoked like this:

join -v f1cols='1,3,5' -v f2cols='1,2,3' file2 file1

then the subscript should be:

a[$1,$2,$3]

To generalize the problem statement:

Given this command-line argument:

f2cols='x1,x2,...,xn'

where each xi is a non-negative integer,

create this subscript in the AWK program:

a[$x1,$x2,...,$xn]

The subscript is the string resulting from concatenating the values of fields $x1,$x2,…,$xn.

I have no idea how to create such subscripts. A little help please.

Solution:

Sample inputs:

$ head file1 file2
==> file1 <==
as,df,as,df,sd,f
1,a,2,b,3,4,5,6,7,8,9
x,xxx,y,yyy,z,a,b,c

==> file2 <==
a,b,c,d,e,f
g,h,j,k,l,m
1,2,3,4,5,6
x,y,z,a,b,c

One awk approach:

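awk -v f1cols='1,3,5' -v f2cols='1,2,3' '
BEGIN   { FS=OFS=","
          m=split(f1cols,f1,",")      # f1[1..m]: key field numbers for file1
          n=split(f2cols,f2,",")      # f2[1..n]: key field numbers for file2
        }

# first file on the command line (file2): store each line under its composite key
FNR==NR { idx=$(f2[1])
          for (i=2;i<=n;i++)
              idx=idx FS $(f2[i])
          arr[idx]=$0
          next
        }

# second file (file1): build the key the same way and look it up
        { idx=$(f1[1])
          for (i=2;i<=m;i++)
              idx=idx FS $(f1[i])
          if (idx in arr)
             print "found index: " idx
        }
' file2 file1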

This generates:

found index: 1,2,3
found index: x,y,z
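
The answer above only reports which keys match. As a minimal sketch of the join itself, the variant below moves the key-building loop into a function and prints each matching file1 line followed by the file2 line stored under the same key. The function name makekey, the tmp and result shell variables, and the choice to append the whole file2 line are assumptions made for illustration; they are not part of the question or the answer. The key parts are joined with SUBSEP rather than FS, which is the same string awk builds for a subscript written as a[$1,$2,$3].

```shell
# Recreate the sample inputs from above in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'as,df,as,df,sd,f' '1,a,2,b,3,4,5,6,7,8,9' 'x,xxx,y,yyy,z,a,b,c' > "$tmp/file1"
printf '%s\n' 'a,b,c,d,e,f' 'g,h,j,k,l,m' '1,2,3,4,5,6' 'x,y,z,a,b,c' > "$tmp/file2"

result=$(awk -v f1cols='1,3,5' -v f2cols='1,2,3' '
# makekey() is a hypothetical helper: it joins the fields listed in
# cols[1..n] with SUBSEP and returns the resulting string.
function makekey(cols, n,    i, key) {
    key = $(cols[1])
    for (i = 2; i <= n; i++)
        key = key SUBSEP $(cols[i])
    return key
}

BEGIN   { FS = OFS = ","
          m = split(f1cols, f1, ",")
          n = split(f2cols, f2, ",")
        }

# First file on the command line (file2): store each line under its key.
FNR == NR { arr[makekey(f2, n)] = $0; next }

# Second file (file1): print both lines when the key was seen in file2.
        { key = makekey(f1, m)
          if (key in arr)
              print $0, arr[key]
        }
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With the sample inputs this should print:

1,a,2,b,3,4,5,6,7,8,9,1,2,3,4,5,6
x,xxx,y,yyy,z,a,b,c,x,y,z,a,b,c

Note that arr holds one line per key, so if file2 contains several lines with the same key only the last one is kept.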