I want to extract the link http://www.rediff.com/news from the <a> tag below

<a href="http://www.rediff.com/news" onclick="trackURL('http://track.rediff.com/click?url=___http://www.rediff.com/news___&cmp=news1_nav&lnk=news1_nav&nsrv1=ushome');return false;"><div class="n_tabnormal">News</div></a>

using a Unix command. Please, no hard-coding.

    1 Answer

    Using an XML/HTML parser is the right way to manipulate XML/HTML data.

    xmlstarlet solution:

    sed 's/&/&amp;/g' yourfile | xmlstarlet sel -t -v '//a[div/text() = "News"]/@href' -n 

    The output:

    http://www.rediff.com/news 

    • sed 's/&/&amp;/g' - escapes each bare ampersand (&) as the entity &amp;, so that the fragment can be parsed as well-formed XML

    • //a[div/text() = "News"]/@href - XPath expression that extracts the href attribute value of an a tag that has a child div node with the text News
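The two steps above can be tried separately. The following is a minimal sketch, assuming the tag from the question is the only content of the input; the temporary file and the variable names (tmpfile, escaped, href) are made up for illustration, and the xmlstarlet step is skipped when the tool is not installed.

```shell
# Save the <a> tag from the question to a temporary file (hypothetical sample input).
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
<a href="http://www.rediff.com/news" onclick="trackURL('http://track.rediff.com/click?url=___http://www.rediff.com/news___&cmp=news1_nav&lnk=news1_nav&nsrv1=ushome');return false;"><div class="n_tabnormal">News</div></a>
EOF

# Step 1: escape bare ampersands so the fragment becomes well-formed XML.
# In a sed replacement, & stands for the matched text, so '&amp;' produces "&amp;".
escaped=$(sed 's/&/&amp;/g' "$tmpfile")

# Step 2: run the XPath query on the escaped text, only if xmlstarlet is available.
href=""
if command -v xmlstarlet >/dev/null 2>&1; then
    href=$(printf '%s\n' "$escaped" | xmlstarlet sel -t -v '//a[div/text() = "News"]/@href' -n)
    printf '%s\n' "$href"
fi

# Clean up the temporary file.
rm -f "$tmpfile"
```

Note that the sed step escapes every ampersand, so input that already contains entities such as &amp; would be escaped a second time; this sketch assumes the input contains only bare ampersands, as in the question.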
