How to extract particular url from HTML tags using UNIX commands

Question

I want to extract the link http://www.rediff.com/news from the <a> tag below

<a href="http://www.rediff.com/news" onclick="trackURL('http://track.rediff.com/click?url=___http://www.rediff.com/news___&cmp=news1_nav&lnk=news1_nav&nsrv1=ushome');return false;"><div class="n_tabnormal">News</div></a>

using a Unix command. Please, no hard-coding.

RomanPerekhrest · Accepted Answer · 2017-06-02 10:37:27Z

Using an XML/HTML parser is the right way to manipulate XML/HTML data.

xmlstarlet solution:

sed 's/&/&amp;/g' yourfile | xmlstarlet sel -t -v '//a[div/text() = "News"]/@href' -n

The output:

http://www.rediff.com/news

sed 's/&/&amp;/g' - escapes each bare ampersand (&) as the entity &amp;, so that the fragment is well-formed XML that xmlstarlet can parse
//a[div/text() = "News"]/@href - XPath expression; selects the href attribute value of an <a> element that has a child div element with the text News
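
To try the two steps in isolation, here is a minimal sketch that assumes the tag from the question is saved in a temporary file. The variable names (tmpfile, escaped, url) are only illustrative, and the XPath step is skipped if xmlstarlet is not installed:

```shell
# Sample input: the <a> tag from the question, written to a temporary file.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
<a href="http://www.rediff.com/news" onclick="trackURL('http://track.rediff.com/click?url=___http://www.rediff.com/news___&cmp=news1_nav&lnk=news1_nav&nsrv1=ushome');return false;"><div class="n_tabnormal">News</div></a>
EOF

# Step 1: escape bare ampersands so the fragment is well-formed XML.
# In a sed replacement, & stands for the matched text, so "&amp;"
# expands to the matched "&" followed by "amp;".
escaped=$(sed 's/&/&amp;/g' "$tmpfile")

# Step 2: run the XPath query, but only if xmlstarlet is available.
if command -v xmlstarlet >/dev/null 2>&1; then
    url=$(printf '%s\n' "$escaped" | xmlstarlet sel -t -v '//a[div/text() = "News"]/@href' -n)
    printf '%s\n' "$url"
else
    echo "xmlstarlet not found; skipping the XPath step" >&2
fi

rm -f "$tmpfile"
```

Note that the sed step escapes every ampersand, so it is only safe on input that does not already contain entities such as &amp;.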
