Finding specific string in XML file and storing in another file [closed]

Question

The text in the input file looks like this:

<title>
  <band height="21" isSplitAllowed="true" >
    <staticText>
      <reportElement x="1" y="1" width="313" height="20" key="staticText-1"/>
      <box></box>
      <textElement>
        <font fontName="Arial" pdfFontName="Helvetica-Bold" size="14" isBold="true" isUnderline="true"/>
      </textElement>
      <text><![CDATA[4) Computation of Tier I and Tier II Capital :]]></text>
    </staticText>
  </band>
</title>

Output file should have:

4) Computation of Tier I and Tier II Capital :

The file has many <title> tags and CDATA sections, but I only want to copy the text that sits in a CDATA section under a <title> tag and save it to another file.

Using what? Bash? Not likely to survive minor formatting changes in the file. SMOP in Python... — xenoid, Commented Jan 29, 2019 at 9:31
grep '4) Computation of Tier I and Tier II Capital :' input.txt > output.txt :/ You'll have to give us more specific details about what strings are allowed, and what are not. Perhaps give us an example of what is not allowed, and a few that are allowed. — Sparhawk, Commented Jan 29, 2019 at 9:32
If the required string is always within **, then try cat file | grep -rin \* | cut -d \* -f 3 — rajaganesh87, Commented Jan 29, 2019 at 9:57

Chris Davies · Accepted Answer · 2019-01-29 14:05:12Z

It looks like you may have tried to put a pair of ** sequences into your CDATA section to highlight it here. Unfortunately that has turned it into invalid XML. Assuming you meant this instead,

<text><![CDATA[4) Computation of Tier I and Tier II Capital :]]></text>

you can use an XML parser to parse your XML:

xmlstarlet sel -T -t -v '//text' -n x.xml
4) Computation of Tier I and Tier II Capital :

If you have a tighter constraint than just "the contents of the <text/> element" you can adjust the XPath filter appropriately. For example:

xmlstarlet sel -T -t -v '/title/band/staticText/text' -n x.xml
4) Computation of Tier I and Tier II Capital :
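Since the question asks for the result to be saved in another file, and the real file has many <title> elements, the commands above could be combined roughly as follows. This is only a sketch: the scratch directory, the file names, the <report> wrapper and the <pageHeader> element are made up for illustration, and it assumes xmlstarlet is installed.

```shell
# Sketch: save the text of every <text> element under a <title> to a file.
# Assumption: xmlstarlet is installed. File names and the sample XML are hypothetical.
workdir=$(mktemp -d)

cat > "$workdir/x.xml" <<'EOF'
<report>
  <pageHeader><text><![CDATA[Not wanted]]></text></pageHeader>
  <title><band><staticText>
    <text><![CDATA[4) Computation of Tier I and Tier II Capital :]]></text>
  </staticText></band></title>
</report>
EOF

# -m loops over each matching node, -v . prints its text value, -n adds a newline.
# The redirect stores the result instead of printing it to the terminal.
xmlstarlet sel -T -t -m '//title//text' -v . -n "$workdir/x.xml" > "$workdir/titles.txt"

cat "$workdir/titles.txt"
```

The XPath '//title//text' restricts the match to <text> elements somewhere below a <title>, so the CDATA in the made-up <pageHeader> is skipped.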

@AnkitaJain Good! If this solves your issue, please consider accepting the answer. — Kusalananda, Commented Jan 30, 2019 at 12:11

finswimmer · Accepted Answer · 2019-01-30 09:42:32Z

Like this?

$ sed -n '/<title>/,/<\/title>/p' input.txt | grep -oP '(?<=\[CDATA\[).*?(?=\]\]>)'

sed prints every line between <title> and </title>, including the lines that hold these tags. If your CDATA sections only ever occur in this area, you can omit this step.
grep prints everything that is preceded by [CDATA[ and followed by ]]>. The match is non-greedy, so the closing brackets stay out of the output.
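To store the result in another file, as the question asks, the pipeline can be redirected. The following is a minimal sketch, assuming GNU grep (for -P) and a file in which <title> and </title> sit on their own lines. The scratch directory, the <report> wrapper and the <pageHeader> element are invented for the example. It uses a non-greedy match up to ]]> so that neither closing bracket ends up in the output.

```shell
# Sketch: extract CDATA text found between <title> and </title> and save it.
# Assumptions: GNU grep (for -P); the tags are on separate lines;
# the sample data and paths below are hypothetical.
workdir=$(mktemp -d)

cat > "$workdir/input.txt" <<'EOF'
<report>
<pageHeader>
<text><![CDATA[Not wanted]]></text>
</pageHeader>
<title>
<text><![CDATA[4) Computation of Tier I and Tier II Capital :]]></text>
</title>
</report>
EOF

# sed keeps only the lines from <title> through </title>;
# grep -o prints just the text between "[CDATA[" and the closing "]]>".
sed -n '/<title>/,/<\/title>/p' "$workdir/input.txt" \
  | grep -oP '(?<=\[CDATA\[).*?(?=\]\]>)' > "$workdir/output.txt"

cat "$workdir/output.txt"
```

Because sed works on whole lines, this approach depends on the line layout of the file: if the opening and closing tags share a line, the range can extend further than intended.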

The ** in the CDATA was put there by mistake. The correct line is: [CDATA[4) Computation of Tier I and Tier II Capital :]] — Ankita Jain, Commented Jan 30, 2019 at 5:29
