I want to read a word between two XML elements using the sed command.
For example, in the XML below, I want to read the number 1234567.
<ns1:account>
  <ns2:name>Corporation</ns2:name>
  <address>
    <StrtNm>NewYork</StrtNm>
    <BldgNb>3</BldgNb>
    <PstCd>230300</PstCd>
    <Ctry>USA</Ctry>
  </address>
</ns1:account>
<ns3:details>
  <ns4:accnum>
    <ns5:info>
      <nd6:accnum>1234567</nd6:accnum>
    </ns5:info>
  </ns4:accnum>
</ns3:details>
I was able to do this using a combination of grep and sed commands, as below:
grep -oz '<.*details>\s*<.*accnum>\s*<.*info>\s*<.*accnum>[0-9]*</.*accnum>' test.xml |sed -n 's:.*<.*accnum>\(.*\)</.*accnum>.*:\1:p'
However, I read that grep -oz is not good for performance since it treats the entire file as a single line. So I tried two sed commands instead, but that only works if the file is properly formatted like the one shown above. It doesn't work if the XML comes as a single line without pretty-printing. This is what I tried:
sed -n '/.*details>/,/<\/.*accnum>/p' test.xml |sed -n 's:.*<.*accnum>\(.*\)<.*accnum>:\1:p'
Challenges:
- The file can come with or without namespace prefixes in the elements.
- The file is pretty large, about 100 MB or more.
- The file contents can come as properly formatted XML or as the entire XML on a single line.
I haven't tried the awk command yet since there are existing scripts in our application which use the commands listed above, and I was hoping to get the same working.
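One direction that might satisfy the constraints above while staying with sed is to split the input on every `<` with tr first, so that each tag starts its own short line whether or not the file was pretty-printed. The sketch below is only an illustration: the function name extract_accnum is made up, it assumes a sed that supports -E (GNU and BSD sed do), and it assumes the file contains no CDATA sections, comments, or attribute values with a literal `<`, since this is pattern matching and not real XML parsing.

```shell
# Hypothetical helper: print the digits inside an <accnum> element that sits
# somewhere inside a <details> element, with or without namespace prefixes.
extract_accnum() {
  # tr turns every '<' into a newline, so each tag begins a new line,
  # e.g. "nd6:accnum>1234567" or "/ns3:details>". Both tr and sed stream,
  # so no single huge line has to be held in memory.
  tr '<' '\n' < "$1" |
    sed -n -E '/^([^:>[:space:]]*:)?details[[:space:]>]/,/^\/([^:>[:space:]]*:)?details>/ s/^([^:>[:space:]]*:)?accnum>[[:space:]]*([0-9]+)[[:space:]]*$/\2/p'
}

# Usage (test.xml is the file from the question):
#   extract_accnum test.xml
```

The sed address range limits matching to the lines between the opening and closing details tags, and the optional `([^:>[:space:]]*:)?` group is what allows any prefix or none. With the sample XML above, in either layout, this should print 1234567.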
How can I read the value of the nd6:accnum element?