Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Why do I get blanks on some rows when using shell script to replace in file with lookup in second file?

I have one file data.txt with content

2013-04-24;1;0.1635;1.4135;fp.3.Luci_02C06.ctg.ctg7180000085546
2013-04-24;1;0.9135;1.4135;fp.3.Luci_01A01.ctg.ctg7180000038386
2017-04-24;2;0.9135;1.4135;fp.3.Luci_02C06.ctg.ctg7180000085546
2011-04-24;2;0.9135;1.4135;fp.3.Luci_02C06.ctg.ctg7180000085546
2012-04-24;2;0.9135;1.4135;fp.3.Luci_02C06.ctg.ctg7180000085549
2016-04-24;2;0.9135;1.4135;fp.3.Luci_02C06.ctg.ctg7180000085549
2016-04-24;2;0.9135;1.4335;fp.3.Luci_02C06.ctg.ctg7180000085549
2013-04-24;1;0.9135;1.4135;fp.3.Luci_01A01.ctg.ctg7180000038386
2011-04-24;2;0.9135;1.4135;fp.3.Luci_02C06.ctg.ctg7180000085546

and another file lookup.txt with content

1;2012-04-24;2ab1e4c0-de4d-11e2-a934-0f0479162b1b;fp.3.Luci_02C06.ctg.ctg7180000085546
7;2013-04-24;2ab21e90-de4d-11e2-9ce8-d368d9512bad;fp.3.Luci_01A01.ctg.ctg7180000038386
3;2014-04-24;2ab2582e-de4d-11e2-bb5f-6b1f6c4437f8;fp.3.Luci_02C06.ctg.ctg7180000085549

I want to replace column 5 in data.txt with column 1 in lookup.txt based on matching values in column 4 in lookup.txt. The result I want is

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

2013-04-24;1;0.1635;1.4135;1
2013-04-24;1;0.9135;1.4135;7
2017-04-24;2;0.9135;1.4135;1
2011-04-24;2;0.9135;1.4135;1
2012-04-24;2;0.9135;1.4135;3
2016-04-24;2;0.9135;1.4135;3
2016-04-24;2;0.9135;1.4335;3
2013-04-24;1;0.9135;1.4135;7
2011-04-24;2;0.9135;1.4135;1

I another post I found the following shell script which looks promising

awk -F';' 'NR==FNR{a[$4]=$1;next}{$5=a[$5]}1' lookup.txt data.txt

But the result below leaves for some reason column 5 blank for row 5,6,7 and 9. Why? The fact that we get space separators rather than ";" is also an issue but not as important

2013-04-24 1 0.1635 1.4135 1
2013-04-24 1 0.9135 1.4135 7
2017-04-24 2 0.9135 1.4135 1
2011-04-24 2 0.9135 1.4135 1
2012-04-24 2 0.9135 1.4135
2016-04-24 2 0.9135 1.4135
2016-04-24 2 0.9135 1.4335
2013-04-24 1 0.9135 1.4135 7
2011-04-24 2 0.9135 1.4135

>Solution :

You may use this awk solution:

awk '
BEGIN{FS=OFS=";"}
{
   sub(/\r$/, "")
} 
NR == FNR {
   map[$NF] = $1
   next
}
{
   $NF = map[$NF]
} 1' lookup.txt data.txt

2013-04-24;1;0.1635;1.4135;1
2013-04-24;1;0.9135;1.4135;7
2017-04-24;2;0.9135;1.4135;1
2011-04-24;2;0.9135;1.4135;1
2012-04-24;2;0.9135;1.4135;3
2016-04-24;2;0.9135;1.4135;3
2016-04-24;2;0.9135;1.4335;3
2013-04-24;1;0.9135;1.4135;7
2011-04-24;2;0.9135;1.4135;1

Explanation:

  • BEGIN{FS=OFS=";"}: Sets input and output field delimiters to ;
  • Use NF consistently instead of numbered column in both files
  • sub(/\r$/, "") is to remove any carriage characters from line end
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading