To make this problem simple to demonstrate, I made a fake xml file like this.
<abc>
<spirit:addressBlock>
<spirit:name>cmn700_registers</spirit:name>
<def>
</def>
</spirit:addressBlock>
</abc>
I want to print the lines containing the pattern <spirit:name> inside a block of lines, where the block begins with the pattern <spirit:addressBlock> and ends with </spirit:addressBlock>. I defined a function in .bash_aliases like this.
function SearchPatInBlk {
awk "/$1/{inblk=1} inblk==1&&/$2/{inblk=0} inblk==1&&/$3/{print \$0}" $4
}
The first and second arguments are the block start and end patterns, the third argument is the pattern of the lines I want to print, and the fourth argument is the XML filename. I then ran this command in the bash shell.
SearchPatInBlk <spirit:addressBlock> </spirit:addressBlock> <spirit:name> ../../ab21/ab21_cmn700_new10_clst/build/ab21_cmn700/logical/cmn700/ipxact/cmn700_ab21.xml
Of course this gives me an error.
bash: syntax error near unexpected token `<'
So I tried putting escape characters (\) before <, > and /, but that doesn’t work either. How should I do it?
>Solution :
Using a true XML parser would be better than a general-purpose text processor like awk. But if you absolutely need awk, there are several things to fix.
- Properly escape regex operators in your pattern strings.
- Pass your pattern strings to awk as awk variables, not as parts of the awk script.
- Use the regex,regex awk range pattern.
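As a minimal sketch of these three points together (not necessarily the only way to do it): the patterns are single-quoted so that bash does not treat < and > as redirection operators, special characters are wrapped in bracket expressions such as [<] rather than backslash-escaped (awk processes backslash escapes in -v values, and GNU awk gives \< and \> their own meaning), and the block is selected with a range pattern. The variable names blk_start, blk_end and pat, and the temporary sample file, are assumptions made for this illustration.

```shell
# Sketch only: same function name as in the question, hypothetical awk variable names.
SearchPatInBlk() {
    # $1 = block start regex, $2 = block end regex,
    # $3 = regex of the lines to print, $4 = file to search
    awk -v blk_start="$1" -v blk_end="$2" -v pat="$3" '
        # Range pattern: active from a line matching blk_start
        # through the next line matching blk_end (both included).
        $0 ~ blk_start, $0 ~ blk_end { if ($0 ~ pat) print }
    ' "$4"
}

# Temporary copy of the sample XML from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<abc>
<spirit:addressBlock>
<spirit:name>cmn700_registers</spirit:name>
<def>
</def>
</spirit:addressBlock>
</abc>
EOF

# Single quotes keep bash from parsing < and > as redirections.
result=$(SearchPatInBlk '[<]spirit:addressBlock[>]' \
                        '[<][/]spirit:addressBlock[>]' \
                        '[<]spirit:name[>]' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

On the sample file this should print only the <spirit:name>cmn700_registers</spirit:name> line.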
Optionally, you could also use more precise regexes and, if your awk is GNU awk, mark the patterns as regex constants (@/.../):
function SearchPatInBlk {
awk -v v1="$1" -v v2="$2" -v v3="$3" '$0 ~ v1, $0 ~ v2 {if($0 ~ v3) print}' "$4"
}
SearchPatInBlk '@/^[[:space:]]*[<]spirit:addressBlock[>][[:space:]]*$/' \
'@/^[[:space:]]*[<][/]spirit:addressBlock[>][[:space:]]*$/' \
'@/[<]spirit:name[>]/' file