How do I extract specific lines based on a comparison of two files with sed and/or awk?

I need to extract all lines from file2.h that do not match the text up to the first dot of any line in file1.h. I'd like a solution that stays as close as possible to my current approach, so that it is easy for me to understand, and that uses only sed and/or awk in Linux bash.

file1.h

apple.sweet banana
apple.tasty banana
apple.brown banana
orange_mvp.rainy day.here
orange_mvp.ear nose.png
lemon_mvp.ear ring
tarte_mvp_rainy day.here

file2.h

orange_mvp
lemon_mvp
lemon_mvp
tarte_mvp
cake_mvp

Desired result

tarte_mvp
cake_mvp
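Because the comparison key is the text before the first dot, it can help to print those keys from file1.h. A minimal sed sketch, assuming the sample data above (the temporary directory only makes the example self-contained). Note that the last line of file1.h yields the key "tarte_mvp_rainy day", not "tarte_mvp", which is why tarte_mvp belongs in the desired result.

```shell
#!/usr/bin/env bash
# Recreate file1.h from the question in a throwaway directory.
dir=$(mktemp -d)
printf '%s\n' 'apple.sweet banana' 'apple.tasty banana' 'apple.brown banana' \
    'orange_mvp.rainy day.here' 'orange_mvp.ear nose.png' 'lemon_mvp.ear ring' \
    'tarte_mvp_rainy day.here' > "$dir/file1.h"

# s/\..*// deletes from the first literal dot to the end of the line,
# leaving only the comparison key.
keys=$(sed 's/\..*//' "$dir/file1.h")
printf '%s\n' "$keys"
rm -r "$dir"
```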

Current (wrong) approach

$ awk '
    NR==FNR { sub(/mvp(\..*)$/,""); a[$0]; next }
            { f=$0; sub(/mvp(\..*)$/,"", f) }
    !(f in a)
' file2.h file1.h

apple.sweet banana
apple.tasty banana
apple.brown banana
orange_mvp.rainy day.here
orange_mvp.ear nose.png
lemon_mvp.ear ring
tarte_mvp_rainy day.here
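A sketch of a fix that keeps the shape of the attempt above, assuming a POSIX awk and the sample data from the question. Two things go wrong in the attempt: file2.h is passed first, so the array is built from file2.h and it is the file1.h lines that get printed; and the regex removes "mvp" together with everything after the dot, so "orange_mvp.rainy day.here" becomes "orange_", which never equals "orange_mvp". Swapping the file order and cutting at the first dot addresses both.

```shell
#!/usr/bin/env bash
# Recreate the sample files from the question in a throwaway directory.
dir=$(mktemp -d)
printf '%s\n' 'apple.sweet banana' 'apple.tasty banana' 'apple.brown banana' \
    'orange_mvp.rainy day.here' 'orange_mvp.ear nose.png' 'lemon_mvp.ear ring' \
    'tarte_mvp_rainy day.here' > "$dir/file1.h"
printf '%s\n' orange_mvp lemon_mvp lemon_mvp tarte_mvp cake_mvp > "$dir/file2.h"

# Same structure as the original attempt, with two changes:
#   1. file1.h is passed first, so it is the file that fills the array a.
#   2. sub() deletes everything from the first dot onward, keeping "mvp".
result=$(awk '
    NR==FNR { sub(/\..*/, ""); a[$0]; next }   # file1.h: store the key
    !($0 in a)                                 # file2.h: print lines without a key
' "$dir/file1.h" "$dir/file2.h")

printf '%s\n' "$result"
rm -r "$dir"
```

NR==FNR is only true while awk reads the first file, so this idiom assumes file1.h is not empty; if it were, the test would also be true for file2.h and nothing would be printed.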

Solution:

Using awk

$ awk -F. 'NR==FNR {a[$1]=$1;next} a[$1] != $0' file1.h file2.h
tarte_mvp
cake_mvp
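In the one-liner above, -F. makes the dot the field separator, so $1 is the text before the first dot. The test a[$1] != $0 works because the file2.h lines contain no dots, so $1 equals $0 there; as a side effect, looking up a missing key creates an empty entry in a. A sketch of an equivalent form that checks membership with awk's in operator instead, using the same sample data:

```shell
#!/usr/bin/env bash
# Recreate the sample files from the question in a throwaway directory.
dir=$(mktemp -d)
printf '%s\n' 'apple.sweet banana' 'apple.tasty banana' 'apple.brown banana' \
    'orange_mvp.rainy day.here' 'orange_mvp.ear nose.png' 'lemon_mvp.ear ring' \
    'tarte_mvp_rainy day.here' > "$dir/file1.h"
printf '%s\n' orange_mvp lemon_mvp lemon_mvp tarte_mvp cake_mvp > "$dir/file2.h"

# -F. : a single-character FS is taken literally, so $1 is the text before the first dot.
# While reading file1.h (NR==FNR), remember each $1 as a key; for file2.h, print
# a line only if its $1 was never seen. "in" tests membership without adding entries.
result=$(awk -F. 'NR==FNR { a[$1]; next } !($1 in a)' "$dir/file1.h" "$dir/file2.h")

printf '%s\n' "$result"
rm -r "$dir"
```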