0

How do I parse an XML file with the contents below?

<?xml version="1.0"?> <saw:ibot xmlns:saw="com.siebel.analytics.web/report/v1" version="1" priority="normal" jobID="36 "> <saw:schedule timeZoneId="(GMT-05:00) Eastern Time (US &amp; Canada)" disabled="false"> <saw:start repeatMinuteInterval="60" endTime="23:59:00" startImmediately="true"/> <saw:recurrence runOnce="false"> <saw:weekly weekInterval="1" mon="true" tue="true" wed="true" thu="true" fri="true"/> </saw:recurrence> </saw:schedule> <saw:dataVisibility type="recipient" runAs="cgm"/> <saw:choose> <saw:when condition="true"> <saw:deliveryContent> <saw:headline> <saw:caption> <saw:text>Availability Parity Alert for Next 14 Days (@{NQ_SESSION.LBL_Next_14_Arriv al_Days})</saw:text> </saw:caption> </saw:headline> <saw:conditionalReport/> </saw:deliveryContent> <saw:postActions/> </saw:when> ...skipping... al_Days})</saw:text> </saw:caption> </saw:headline> <saw:conditionalReport/> </saw:deliveryContent> <saw:postActions/> </saw:when> <saw:otherwise/> </saw:choose> <saw:deliveryDestinations> <saw:destination category="dashboard"/> <saw:destination category="activeDeliveryProfile"/> </saw:deliveryDestinations> <saw:recipients subscribers="true" customize="false" specificRecipients="false"> <saw:subscribers> <saw:user name="[email protected]"/> <saw:user name="[email protected]"/> <saw:user name="[email protected]"/> </saw:subscribers> </saw:recipients> <saw:conditionQuery> <saw:reportRefNode path="/shared/Quote/Product/Alerts/Daily Availability Parity Alert - Next 14 Days - Content"/> </saw:conditionQuery> </saw:ibot> 

and retrieve the output below?

[email protected]
[email protected]
[email protected]

I also have 5 .xml files, each with a different set of names and values to parse. Is there any way to parse and merge them on the command line and write the output to one file?

I have tried sed and awk, but they have not helped me much in getting the desired output.

  • 1. Don't parse XML with sed or awk. 2. We can't provide you examples of code to run without seeing the XML that contains the data you want to retrieve. 3. Don't parse XML with sed or awk. 4. Please update your question to provide a minimal example XML file. 5. Don't parse XML with sed or awk. Commented Jul 17, 2015 at 22:38
  • I've formatted your question and the XML is now visible. Unfortunately your example is not a valid XML document. Commented Jul 17, 2015 at 22:40
  • You need to format the content. In this case that means using the {} marker to indent the content by four spaces. I'll do it for you once again... Commented Jul 17, 2015 at 23:02
  • That's still not a valid XML document: /tmp/xml:33.18: Opening and ending tag mismatch: subscribers line 29 and recipients, and other errors. Commented Jul 17, 2015 at 23:04
  • @G-Man I don't think it is a duplicate, as this one is all about parsing a well-formed XML document, whereas your suggested duplicate needs different solutions due to the potential lack of well-formedness of HTML. I don't think it's off topic either, FWIW. Commented Jul 18, 2015 at 6:57

2 Answers

4

This command parses your XML document and uses XPath to extract the name attribute value of every element at /saw:ibot/saw:recipients/saw:subscribers/saw:user:

xmlstarlet sel -t -v '/saw:ibot/saw:recipients/saw:subscribers/saw:user/@name' </tmp/xml 

Output

[email protected]
[email protected]
[email protected]
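
Where xmlstarlet is not available, the same lookup can be sketched with Python's standard library. This is only an illustration, not part of the original answer: the sample document is trimmed to the relevant elements, the addresses (`alice@example.com`, `bob@example.com`) are placeholders because the real ones are not visible here, and the helper name `subscriber_names` is made up for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the question's document; the addresses are placeholders.
SAMPLE = """<?xml version="1.0"?>
<saw:ibot xmlns:saw="com.siebel.analytics.web/report/v1" version="1">
  <saw:recipients subscribers="true">
    <saw:subscribers>
      <saw:user name="alice@example.com"/>
      <saw:user name="bob@example.com"/>
    </saw:subscribers>
  </saw:recipients>
</saw:ibot>"""

# Map the prefix used in the path to the namespace URI declared in the document.
NS = {"saw": "com.siebel.analytics.web/report/v1"}


def subscriber_names(xml_text):
    """Return the name attribute of every saw:user under saw:subscribers."""
    root = ET.fromstring(xml_text)
    # root is the <saw:ibot> element, so the path is relative to it.
    users = root.findall("saw:recipients/saw:subscribers/saw:user", NS)
    return [user.get("name") for user in users]


if __name__ == "__main__":
    print("\n".join(subscriber_names(SAMPLE)))
```

The prefix in the path only has to match the key in `NS`; what the parser actually compares is the namespace URI.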
  • On a side note: people also seem to like xidel (site down for the moment, along with the rest of SourceForge).
    – lcd047
    Commented Jul 18, 2015 at 5:51
  • OK, if you say so. For me it doesn't, and it is hard to understand how it can work.
    – mzjn
    Commented Aug 10, 2015 at 9:47
  • @mzjn Ah. The XML has changed shape. Again. When I answered the question my answer worked, but now it doesn't. If you follow the history through you'll see that, as it was, it took several attempts to get the OP to provide a sample that was even valid XML, and none of those were well formatted for easy viewing. I'll update my answer once more. Commented Aug 10, 2015 at 17:09
1

Use an XML parser. Personally, I like XML::Twig and Perl.

#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new();
$twig->parsefile('your_file.xml');

foreach my $saw_user ( $twig->get_xpath('//saw:user') ) {
    print $saw_user->att('name'), "\n";
}

This prints:

[email protected]
[email protected]
[email protected]

If you want a one-liner instead:

perl -MXML::Twig -0777 -e 'print map { $_->att("name")."\n" } ( XML::Twig->parse( <> )->get_xpath("//saw:user") )' your_xml_file
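
The question also asks about merging results from several files, which the commands above do not cover. A minimal sketch in Python (standard library only) follows. It assumes every file uses the same `saw` namespace and that the wanted value is the `name` attribute of `saw:user`; the script name `merge_users.py` and the function `names_from_files` are hypothetical, and files that need different elements or attributes would need their own path.

```python
import sys
import xml.etree.ElementTree as ET

# Assumed to be the same namespace in every input file.
NS = {"saw": "com.siebel.analytics.web/report/v1"}


def names_from_files(paths):
    """Collect the name attribute of every saw:user, in file order."""
    names = []
    for path in paths:
        root = ET.parse(path).getroot()
        for user in root.iterfind(".//saw:user", NS):
            name = user.get("name")
            if name is not None:  # skip saw:user elements without a name
                names.append(name)
    return names


if __name__ == "__main__":
    # Hypothetical usage: python merge_users.py a.xml b.xml c.xml > merged.txt
    for name in names_from_files(sys.argv[1:]):
        print(name)
```

Redirecting standard output, as in the usage comment, leaves the choice of output file to the shell.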

Please, for the sake of future maintenance programmers and sysadmins, DO NOT use regular expressions to parse XML. Why, you may ask? Because your XML, for example, can look like any of these and still be semantically identical:

Besides your example, it can look like this:

<?xml version="1.0" encoding="utf-8"?>
<saw:ibot jobID="36" priority="normal" version="1" xmlns:saw="com.siebel.analytics.web/report/v1">
  <saw:schedule disabled="false" timeZoneId="(GMT-05:00) Eastern Time (US &amp; Canada)">
    <saw:start endTime="23:59:00" repeatMinuteInterval="60" startImmediately="true" />
    <saw:recurrence runOnce="false">
      <saw:weekly fri="true" mon="true" thu="true" tue="true" wed="true" weekInterval="1" />
    </saw:recurrence>
  </saw:schedule>
  <saw:dataVisibility runAs="cgm" type="recipient" />
  <saw:choose>
    <saw:when condition="true">
      <saw:deliveryContent>
        <saw:headline>
          <saw:caption>
            <saw:text>Availability Parity Alert for Next 14 Days (@{NQ_SESSION.LBL_Next_14_Arrival_Days})</saw:text>
          </saw:caption>
        </saw:headline>
        <saw:conditionalReport/>
      </saw:deliveryContent>
      <saw:postActions/>
    </saw:when>
    <saw:otherwise/>
  </saw:choose>
  <saw:deliveryDestinations>
    <saw:destination category="dashboard" />
    <saw:destination category="activeDeliveryProfile" />
  </saw:deliveryDestinations>
  <saw:recipients customize="false" specificRecipients="false" subscribers="true">
    <saw:subscribers>
      <saw:user name="[email protected]" />
      <saw:user name="[email protected]" />
      <saw:user name="[email protected]" />
    </saw:subscribers>
  </saw:recipients>
  <saw:conditionQuery>
    <saw:reportRefNode path="/shared/Quote/Product/Alerts/Daily Availability Parity Alert - Next 14 Days - Content" />
  </saw:conditionQuery>
</saw:ibot>

Or like this (note tag wrapping of elements)

<?xml version="1.0" encoding="utf-8"?> <saw:ibot jobID="36" priority="normal" version="1" xmlns:saw="com.siebel.analytics.web/report/v1"> <saw:schedule disabled="false" timeZoneId="(GMT-05:00) Eastern Time (US &amp; Canada)"> <saw:start endTime="23:59:00" repeatMinuteInterval="60" startImmediately="true"/> <saw:recurrence runOnce="false"> <saw:weekly fri="true" mon="true" thu="true" tue="true" wed="true" weekInterval="1"/> </saw:recurrence> </saw:schedule> <saw:dataVisibility runAs="cgm" type="recipient"/> <saw:choose> <saw:when condition="true"> <saw:deliveryContent> <saw:headline> <saw:caption> <saw:text>Availability Parity Alert for Next 14 Days (@{NQ_SESSION.LBL_Next_14_Arrival_Days})</saw:text> </saw:caption> </saw:headline> <saw:conditionalReport/> </saw:deliveryContent> <saw:postActions/> </saw:when> <saw:otherwise/> </saw:choose> <saw:deliveryDestinations> <saw:destination category="dashboard"/> <saw:destination category="activeDeliveryProfile"/> </saw:deliveryDestinations> <saw:recipients customize="false" specificRecipients="false" subscribers="true"> <saw:subscribers> <saw:user name="[email protected]"/> <saw:user name="[email protected]"/> <saw:user name="[email protected]"/> </saw:subscribers> </saw:recipients> <saw:conditionQuery> <saw:reportRefNode path="/shared/Quote/Product/Alerts/Daily Availability Parity Alert - Next 14 Days - Content"/> </saw:conditionQuery> </saw:ibot> 

Or like this:

<?xml version="1.0" encoding="utf-8"?> <saw:ibot jobID="36" priority="normal" version="1" xmlns:saw="com.siebel.analytics.web/report/v1" ><saw:schedule disabled="false" timeZoneId="(GMT-05:00) Eastern Time (US &amp; Canada)" ><saw:start endTime="23:59:00" repeatMinuteInterval="60" startImmediately="true" /><saw:recurrence runOnce="false" ><saw:weekly fri="true" mon="true" thu="true" tue="true" wed="true" weekInterval="1" /></saw:recurrence></saw:schedule><saw:dataVisibility runAs="cgm" type="recipient" /><saw:choose ><saw:when condition="true" ><saw:deliveryContent ><saw:headline ><saw:caption ><saw:text >Availability Parity Alert for Next 14 Days (@{NQ_SESSION.LBL_Next_14_Arrival_Days})</saw:text></saw:caption></saw:headline><saw:conditionalReport /></saw:deliveryContent><saw:postActions /></saw:when><saw:otherwise /></saw:choose><saw:deliveryDestinations ><saw:destination category="dashboard" /><saw:destination category="activeDeliveryProfile" /></saw:deliveryDestinations><saw:recipients customize="false" specificRecipients="false" subscribers="true" ><saw:subscribers ><saw:user name="[email protected]" /><saw:user name="[email protected]" /><saw:user name="[email protected]" /></saw:subscribers></saw:recipients><saw:conditionQuery ><saw:reportRefNode path="/shared/Quote/Product/Alerts/Daily Availability Parity Alert - Next 14 Days - Content" /></saw:conditionQuery></saw:ibot> 

Hopefully these samples show that reformatting your XML in a PERFECTLY VALID fashion might one day mysteriously break your regex.
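
To make the point concrete, here is a small Python sketch that runs a naive regex and a real parser over two layouts of the same element. The fragments, the placeholder address and the pattern `NAIVE` are invented for illustration; they are not taken from the question.

```python
import re
import xml.etree.ElementTree as ET

NS = {"saw": "com.siebel.analytics.web/report/v1"}

# The same data twice: once compact, once with legal line breaks,
# spaces around '=' and single-quoted attribute values.
COMPACT = (
    '<saw:subscribers xmlns:saw="com.siebel.analytics.web/report/v1">'
    '<saw:user name="alice@example.com"/>'
    "</saw:subscribers>"
)
WRAPPED = """<saw:subscribers
    xmlns:saw="com.siebel.analytics.web/report/v1"
  ><saw:user
      name = 'alice@example.com'
  /></saw:subscribers>"""

# A typical first-attempt pattern, tied to one exact layout.
NAIVE = re.compile(r'<saw:user name="([^"]*)"/>')


def with_regex(text):
    return NAIVE.findall(text)


def with_parser(text):
    root = ET.fromstring(text)
    return [user.get("name") for user in root.iterfind("saw:user", NS)]


if __name__ == "__main__":
    for label, text in (("compact", COMPACT), ("wrapped", WRAPPED)):
        print(label, "regex:", with_regex(text), "parser:", with_parser(text))
```

The parser returns the address for both layouts, while the regex finds it only in the compact one and silently returns nothing for the other.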
