Extracting certain words from a text file using tr and awk

October 31, 2022

This has been giving me a lot of trouble. My file looks like this:

URL: http://123.123.123.123
file: php
124.124.124.124|user1|email|phone

URL: http://1.2.3.4
file: php
2.1.3.1|userx|emailx|phonex

The file contains more sets of data just like these.

I used:

grep http -A 3|tr '\n' ' '|tr '|' ' '|awk '{print $2,$7,$8}'|tr ' ' ':'

The output only covers the first set of data:

123.123.123.123:email:phone

Intended outcome:

123.123.123.123:email:phone
1.2.3.4:emailx:phonex

>Solution :

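If you are using Awk anyway, you can get rid of grep and tr. Your pipeline prints a single line because tr '\n' ' ' joins the whole input into one line, so Awk sees only one record.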

If you can rely on the empty line to separate records, try RS='\n\n' (or RS='', which is Awk's standard paragraph mode). Here’s a refactoring which instead extracts the information from the second line after the hit.

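awk '# URL line: remember the address and count down to the data line
/http/ { l=2; ip=$0; sub(/.*\/\//, "", ip); next }
# Data line: drop the first two |-separated fields, then join the rest with ":"
l && --l == 0 { tail=$0; sub(/^[^|]*[|][^|]*[|]/, "", tail);
    sub(/[|]/, ":", tail); print ip ":" tail }'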
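If the blank line between blocks is reliable, the record-separator idea mentioned earlier could look roughly like the sketch below. It assumes every block has the URL on its first line and the |-separated data on its third, as in the question's sample; the function name extract_paragraphs and the variable sample are made up for illustration.

```shell
# Sample input copied from the question.
sample='URL: http://123.123.123.123
file: php
124.124.124.124|user1|email|phone

URL: http://1.2.3.4
file: php
2.1.3.1|userx|emailx|phonex'

# Hypothetical helper: reads blank-line-separated blocks from stdin.
extract_paragraphs() {
  # RS="" turns on paragraph mode, so each block is one record.
  # FS="\n" makes each line of the block one field.
  awk 'BEGIN { RS = ""; FS = "\n" }
  {
    ip = $1
    sub(/.*\/\//, "", ip)          # strip everything up to and including "//"
    n = split($3, f, /[|]/)        # f[3] is the email, f[4] the phone
    if (n >= 4) print ip ":" f[3] ":" f[4]
  }'
}

printf '%s\n' "$sample" | extract_paragraphs
```

With the sample above this prints one line per block. The n >= 4 check silently skips blocks whose third line does not have four fields.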

Perhaps /^URL:/ would be a better regex than /http/ for finding the beginning of a record.
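To illustrate why the anchored regex may be safer, here is a small sketch with made-up input in which the file: line itself contains the string http. A plain /http/ would also fire on that line and restart the countdown, while /^URL:/ only fires on the real start of a record. The names tricky and extract_anchored and the file name http_handler.php are invented for this example.

```shell
# Hypothetical input: the "file:" line happens to contain "http".
tricky='URL: http://1.2.3.4
file: http_handler.php
2.1.3.1|userx|emailx|phonex'

# Same logic as the answer's script, but a record starts only at /^URL:/.
extract_anchored() {
  awk '/^URL:/ { l=2; ip=$0; sub(/.*\/\//, "", ip); next }
  l && --l == 0 { tail=$0; sub(/^[^|]*[|][^|]*[|]/, "", tail)
      sub(/[|]/, ":", tail); print ip ":" tail }'
}

printf '%s\n' "$tricky" | extract_anchored
```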