I'm writing a shell script to generate a directory listing.
As input, I receive a long HTML string:
https://www.mycompany.com/posts/aureliaflore_china-seoul-startup-activity-6571925510337728512-acAw","$type":"com.traver.voyager.feed.actions.Action"}, link to post","url":"https://www.mycompany.com/posts/aureliaflore_reuters-top-news-on-twitter-activity-6571392661482233856-T3dO","$type": article","$type":"com.traver.voyager.feed.actions.Action"},{"actionType":"SHARE_VIA","text":"Copy link to post","url":"https://www.mycompany.com/posts/aureliaflore_are-you-thinking-to-the-benefits-of-digitalization-activity-6570119712154451968-927T","$type":"com.traver.voyager
To make the output easily customizable, the script just displays a URL table:
https://www.mycompany.com/posts/aureliaflore_china-seoul-startup-activity-6571925510337728512-acAw
https://www.mycompany.com/posts/aureliaflore_reuters-top-news-on-twitter-activity-6571392661482233856-T3dO
https://www.mycompany.com/posts/aureliaflore_are-you-thinking-to-the-benefits-of-digitalization-activity-6570119712154451968-927T
The pattern to search for: it begins with "https://www.", continues with a variable number of characters, and ends at the next " (the quote itself should not be extracted).
My current solution is based on cut -f, but the number of fields in the input varies, so I cannot rely on fixed field positions to find the pattern.
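The pattern described above can be written as a regular expression instead of a field position. The following is only a sketch: it assumes a grep that supports -o (GNU and BSD grep do, but it is not required by POSIX), and the function name extract_urls and the shortened sample string are made up for illustration.

```shell
#!/bin/sh
# Print every substring that starts with https://www. and runs up to,
# but not including, the next double quote. One match per output line.
extract_urls() {
    grep -oE 'https://www\.[^"]+'
}

# Hypothetical sample input, shortened from the string in the question.
sample='{"url":"https://www.mycompany.com/posts/a-1","$type":"x"},{"url":"https://www.mycompany.com/posts/b-2","$type":"y"}'

urls=$(printf '%s\n' "$sample" | extract_urls)
printf '%s\n' "$urls"
```

Because the match stops at the first quote, the length of the input and the number of URLs in it do not matter. If the same link appears several times, piping the result through sort -u removes the duplicates.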
If it's JSON, use jq or a JSON-parsing library for the language of your choice. If it's HTML, you can extract links from it easily with
lynx -dump -listonly -nonumbers "$URL"
(lynx can also read from a file or from stdin).
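For the jq route, a minimal sketch might look like the following. It assumes the input is complete, well-formed JSON; the fragment in the question is truncated, so the document below (including the "actions" key) and the function name list_urls are hypothetical.

```shell
#!/bin/sh
# Hypothetical, well-formed JSON shaped like the fragment in the question.
json='{"actions":[{"actionType":"SHARE_VIA","text":"Copy link to post","url":"https://www.mycompany.com/posts/a-1"},{"actionType":"OTHER","text":"no link here"},{"actionType":"SHARE_VIA","url":"https://www.mycompany.com/posts/b-2"}]}'

# Walk every value in the document, keep only the objects, and print the
# "url" field of each object that has one (-r prints raw strings, unquoted).
list_urls() {
    jq -r '.. | objects | .url // empty'
}

printf '%s\n' "$json" | list_urls
```

Unlike a text pattern, this does not depend on how the JSON is quoted or laid out, but it only works if the input parses as JSON in the first place.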