
The text in the input file looks like this:

<title>
  <band height="21" isSplitAllowed="true" >
    <staticText>
      <reportElement x="1" y="1" width="313" height="20" key="staticText-1"/>
      <box></box>
      <textElement>
        <font fontName="Arial" pdfFontName="Helvetica-Bold" size="14" isBold="true" isUnderline="true"/>
      </textElement>
      <text><![CDATA[4) Computation of Tier I and Tier II Capital :]]></text>
    </staticText>
  </band>
</title>

The output file should contain:

4) Computation of Tier I and Tier II Capital : 

The file has many <title> elements and CDATA sections, but I only want to copy the text inside the CDATA sections that sit under <title> tags, and save that output to another file.
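As suggested in the comments, an XML parser is more robust here than line-based tools. The following is a minimal sketch using Python's standard-library xml.etree.ElementTree, assuming the file is well-formed XML and that these elements carry no XML namespace (if the real file declares a default namespace, the tag names would need a {namespace} prefix). The script name extract_titles.py and the function names title_texts and main are hypothetical:

```python
import sys
import xml.etree.ElementTree as ET


def title_texts(root):
    """Yield the content of every <text> element nested inside a <title>.

    ElementTree unwraps CDATA sections while parsing, so .text already
    holds the plain string without the <![CDATA[ ... ]]> markers.
    """
    for title in root.iter("title"):  # iter() also checks root itself
        for text in title.iter("text"):
            if text.text:  # skip empty <text/> elements
                yield text.text


def main(in_path, out_path):
    root = ET.parse(in_path).getroot()
    with open(out_path, "w", encoding="utf-8") as out:
        for line in title_texts(root):
            out.write(line + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 extract_titles.py input.xml output.txt
    main(sys.argv[1], sys.argv[2])
```

Run it as python3 extract_titles.py input.xml output.txt; each title string is written on its own line, and <text> elements outside <title> are ignored.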

  • Using what? Bash? Not likely to survive minor formatting changes in the file. SMOP in Python...
    – xenoid
    Commented Jan 29, 2019 at 9:31
  • grep '4) Computation of Tier I and Tier II Capital :' input.txt > output.txt :/ You'll have to give us more specific details about what strings are allowed, and what are not. Perhaps give us an example of what is not allowed, and a few that are allowed.
    – Sparhawk
    Commented Jan 29, 2019 at 9:32
  • If the required string is always within **, then try grep '\*' file | cut -d '*' -f 3
    Commented Jan 29, 2019 at 9:57

2 Answers


It looks like you may have tried to put a pair of ** sequences into your CDATA section to highlight it here. Unfortunately that has turned it into invalid XML. Assuming you meant this instead,

<text><![CDATA[4) Computation of Tier I and Tier II Capital :]]></text> 

you can use an XML parser to parse your XML:

xmlstarlet sel -T -t -v '//text' -n x.xml
4) Computation of Tier I and Tier II Capital :

If you have a tighter constraint than just "the contents of the <text/> element", you can adjust the XPath filter appropriately. For example:

xmlstarlet sel -T -t -v '/title/band/staticText/text' -n x.xml
4) Computation of Tier I and Tier II Capital :
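To see what the tighter XPath buys you, here is a small comparison in Python using ElementTree's limited XPath support, which accepts paths such as .//text and .//title/band/staticText/text. The sample document, including the jasperReport root element and the second label under <detail>, is made up for illustration. Note also that an absolute path such as /title/... only matches when <title> is the root element; if it is nested inside a larger report element, the //title/... form is needed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one <text> under <title>, another under <detail>.
REPORT = """<jasperReport>
  <title><band><staticText>
    <text><![CDATA[4) Computation of Tier I and Tier II Capital :]]></text>
  </staticText></band></title>
  <detail><band><staticText>
    <text><![CDATA[Some other label]]></text>
  </staticText></band></detail>
</jasperReport>"""

root = ET.fromstring(REPORT)

# Like '//text': every <text> element anywhere below the root.
broad = [t.text for t in root.findall(".//text")]

# Like '//title/band/staticText/text': only <text> elements under <title>.
narrow = [t.text for t in root.findall(".//title/band/staticText/text")]

print(broad)   # ['4) Computation of Tier I and Tier II Capital :', 'Some other label']
print(narrow)  # ['4) Computation of Tier I and Tier II Capital :']
```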
  • xmlstarlet is now working on my Unix machine.
    Commented Jan 30, 2019 at 5:28
  • Great stuff. So you're sorted then?
    Commented Jan 30, 2019 at 7:38
  • @AnkitaJain Good! If this solves your issue, please consider accepting the answer.
    – Kusalananda
    Commented Jan 30, 2019 at 12:11

Like this?

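$ sed -n '/<title>/,/<\/title>/p' input.txt | grep -oP '(?<=\[CDATA\[).*?(?=\]\])'
  • sed prints everything between <title> and </title>, including those tags. If your CDATA sections only ever appear in this area, you can omit this step.
  • grep -oP (GNU grep with Perl-compatible regular expressions) prints only the matched text: everything preceded by [CDATA[ and followed by ]]. The non-greedy .*? stops at the first ]], so two CDATA sections on one line are printed separately.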
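The lookaround pattern can be tried out in Python's re module, which supports the same (?<=...) lookbehind and (?=...) lookahead syntax that grep -P uses. This sketch, using made-up sample lines, shows why the match should end at ]] and use a non-greedy .*?: with a single-] lookahead and a greedy .*, the match runs on to the last ] on the line, so one stray ] is kept, and two CDATA sections on the same line are merged into one match.

```python
import re

# Hypothetical input lines: one CDATA section, then two on the same line.
one = '<text><![CDATA[4) Computation of Tier I and Tier II Capital :]]></text>'
two = '<text><![CDATA[A]]></text><text><![CDATA[B]]></text>'

greedy = r'(?<=\[CDATA\[).*(?=\])'    # greedy: stops before the LAST "]"
lazy = r'(?<=\[CDATA\[).*?(?=\]\])'   # non-greedy: stops before the FIRST "]]"

print(re.findall(greedy, one))  # ['4) Computation of Tier I and Tier II Capital :]']
print(re.findall(lazy, one))    # ['4) Computation of Tier I and Tier II Capital :']
print(re.findall(greedy, two))  # ['A]]></text><text><![CDATA[B]']
print(re.findall(lazy, two))    # ['A', 'B']
```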
  • The ** around the CDATA content was added by mistake. The correct line is: [CDATA[4) Computation of Tier I and Tier II Capital :]]
    Commented Jan 30, 2019 at 5:29
  • Then just remove the **. I've edited my answer.
    Commented Jan 30, 2019 at 9:43
