
I need to extract the values of two attributes, "estimated" and "fullSign", for every occurrence in this result set.

RESULT SET: <?xml version="1.0" encoding="UTF-8"?> <resultSet xmlns="urn:trimet:arrivals" queryTime="1469138325745"><location desc="Morrison/SW 3rd Ave MAX Station" dir="Westbound" lat="45.5181811277907" lng="-122.675385866199" locid="8381" /><arrival block="9007" departed="true" dir="1" status="estimated" estimated="1469138452000" fullSign="MAX Blue Line to Hillsboro" piece="1" route="100" scheduled="1469138250000" shortSign="Blue to Hillsboro" locid="8381" detour="false"><blockPosition feet="1901" at="1469138300978" heading="201" lat="45.5214364" lng="-122.6716177"><trip desc="Hatfield Government Center" dir="1" route="100" tripNum="6557314" destDist="77046" pattern="54" progress="75145" /></blockPosition></arrival><arrival block="9050" departed="true" dir="1" status="estimated" estimated="1469138664000" fullSign="MAX Red Line to City Center &amp; Beaverton" piece="1" route="90" scheduled="1469138670000" shortSign="Red Line to Beaverton" locid="8381" detour="false"><blockPosition feet="4552" at="1469138313683" heading="237" lat="45.5277621" lng="-122.6687878"><trip desc="Beaverton TC Pocket" dir="1" route="90" tripNum="6556307" destDist="66321" pattern="15" progress="61769" /></blockPosition></arrival><arrival block="9018" departed="true" dir="1" status="estimated" estimated="1469139140000" fullSign="MAX Blue Line to Hillsboro" piece="1" route="100" scheduled="1469139150000" shortSign="Blue to Hillsboro" locid="8381" detour="false"><blockPosition feet="13687" at="1469138320005" heading="239" lat="45.5309688" lng="-122.6350333"><trip desc="Hatfield Government Center" dir="1" route="100" tripNum="6557315" destDist="77046" pattern="54" progress="63359" /></blockPosition></arrival><arrival block="9043" departed="true" dir="1" status="estimated" estimated="1469139577000" fullSign="MAX Red Line to City Center &amp; Beaverton" piece="1" route="90" scheduled="1469139570000" shortSign="Red Line to Beaverton" locid="8381" detour="false"><blockPosition feet="31909" 
at="1469138310486" heading="285" lat="45.5320383" lng="-122.5738342"><trip desc="Beaverton TC Pocket" dir="1" route="90" tripNum="6556308" destDist="66321" pattern="15" progress="34412" /></blockPosition></arrival></resultSet> 

expected result:

1469138452000 MAX Blue Line to Hillsboro
1469138664000 MAX Red Line to City Center &amp; Beaverton
1469139140000 MAX Blue Line to Hillsboro
1469139577000 MAX Red Line to City Center &amp; Beaverton

What is a good way for me to extract this data?

  • Start by searching for xmlstarlet here on U&L
    Commented Jul 21, 2016 at 23:38
  • Thanks @Roaima. I tried xmlstarlet, but maybe my XPath expression is incorrect; I am still unable to fetch the estimated and fullSign values: '/usr/bin/xmlstarlet sel -t -v "/arrival/@estimated" -nl filename.xml'
    – Sunnx
    Commented Jul 22, 2016 at 18:22
  • When you post XML, please try to make it readable with xmltidy or xml_pp or xmlstarlet fo or one of many other similar tools.
    – cas
    Commented Jul 23, 2016 at 6:41
  • For a simple extraction, I'd use xml2 to convert to a line-oriented format so I could use awk or perl or other standard line-oriented text utilities.
    – cas
    Commented Jul 23, 2016 at 6:42
  • Sure, I will do that from next time.
    – Sunnx
    Commented Jul 25, 2016 at 20:56

2 Answers


This uses XMLStarlet with paste. It can probably be done in a single call to XMLStarlet, but I'm no wizard:

$ paste <(xml sel -T -t -v '//@estimated' data.xml) \
        <(xml sel -T -t -v '//@fullSign' data.xml)
1469138452000 MAX Blue Line to Hillsboro
1469138664000 MAX Red Line to City Center & Beaverton
1469139140000 MAX Blue Line to Hillsboro
1469139577000 MAX Red Line to City Center & Beaverton
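A single call looks feasible by looping over each arrival element and printing both attributes from it. The elements live in the default namespace urn:trimet:arrivals, so an element path needs a prefix bound to it; that, plus the fact that arrival sits under resultSet rather than at the root, is a likely reason an expression such as /arrival/@estimated matches nothing. What follows is a minimal sketch, not a tested recipe for the full feed: it assumes the binary is installed as xmlstarlet (some packages name it xml), the prefix t and the function name extract_arrivals are arbitrary choices, and the sample file is a cut-down, hypothetical stand-in for data.xml.

```shell
# Cut-down sample with the same namespace and attribute names as the question's data.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<resultSet xmlns="urn:trimet:arrivals">
  <arrival estimated="1469138452000" fullSign="MAX Blue Line to Hillsboro"/>
  <arrival estimated="1469138664000" fullSign="MAX Red Line to City Center &amp; Beaverton"/>
</resultSet>
EOF

# -T  : text output, so &amp; is printed as a plain &
# -N  : bind the (arbitrary) prefix "t" to the document's default namespace
# -m  : loop over every arrival element
# -v  : print an attribute value; -o prints a literal string; -n prints a newline
extract_arrivals() {
  xmlstarlet sel -T -N t=urn:trimet:arrivals -t \
    -m '//t:arrival' -v '@estimated' -o ' ' -v '@fullSign' -n "$1"
}

if command -v xmlstarlet >/dev/null 2>&1; then
  extract_arrivals "$sample"
else
  echo "xmlstarlet not found (some packages install it as 'xml')" >&2
fi
```

Because both values come from the same arrival node, the pairs stay aligned even if one arrival were missing an attribute, which the two-stream paste approach cannot guarantee.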
    $ xml2 < sunnx.xml | awk -F= '
        $1 ~ /@fullSign/  { fs = $2; sub(/&/, "&amp;", fs) }
        $1 ~ /@estimated/ { est = $2 }
        fs && est         { printf "%s %s\n", est, fs; fs = est = "" }'
    1469138452000 MAX Blue Line to Hillsboro
    1469138664000 MAX Red Line to City Center &amp; Beaverton
    1469139140000 MAX Blue Line to Hillsboro
    1469139577000 MAX Red Line to City Center &amp; Beaverton

    If you want a literal & rather than &amp;, then get rid of the sub() function call. xml2 decodes the encoded entities for you, so I added the sub() to change it back to conform to your requested output.

    Without the sub(), the output looks like this:

    1469138452000 MAX Blue Line to Hillsboro
    1469138664000 MAX Red Line to City Center & Beaverton
    1469139140000 MAX Blue Line to Hillsboro
    1469139577000 MAX Red Line to City Center & Beaverton
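One detail about the sub() call is worth spelling out: in an awk replacement string an unescaped & stands for the matched text, so "&amp;" gives the right result here only because the match is itself a literal &. A small sketch of the more explicit spelling with the ampersand escaped; the variable names and the sample string are illustrative, not taken from the feed-processing command:

```shell
# Sample value as xml2 would emit it, with the entity already decoded.
value='MAX Red Line to City Center & Beaverton'

# "\\&" in the awk string yields a literal "&" in the replacement,
# instead of relying on "&" meaning "the text that was matched".
encoded=$(printf '%s\n' "$value" | awk '{ sub(/&/, "\\&amp;"); print }')

printf '%s\n' "$encoded"
```

sub() replaces only the first match on the line; gsub() with the same arguments would re-encode every ampersand if a sign ever contained more than one.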