My question is not related to "Parse XML to get node value in bash script?". Also, I cannot install or use any new XML parser, as per company policy. This needs to be achieved using shell, Perl, awk, or sed.

I will try to rephrase my question:

1) We have a process.log file that contains a lot of text data, with some XML data published in between.
2) There are thousands of different XML documents published in the logs along with the other text data.
3) I need to select only the XML documents that are published after the "Outgoing XML:" marker.
4) The XML document that is selected and copied to a new file must be the one whose ALERTID tag matches a given value.
5) The ALERTID value will be provided as input to the script. In our case mGMjhgHgffHhhFdH1u4 will be provided, and we need to select the full XML document published for this alert ID. The starting tag is <xml version > and the ending tag is </Alert>.
6) So I need to copy the relevant outgoing XML document, selected by a particular ALERTID, to a new file so it can be replayed in different environments.

Format of the log:

Info Jan 11 17:30:26.12122 The process is not responding to heartbeats
Debug Jan 11 17:30:26.12123 Incoming XML :<xml version "1.0" encoding ="UTF-8"?>
<Alert trigger = "true" >
<Alerttype>orderReject</Alerttype>
<AlertID>ghghfsjUtYuu78T1</AlertID>
<Order>uusingas</Order>
<Quantity>1254</Quanity>
</Alert> (CreateInitEventHandler. C:356)
Debug Jan 11 17:30:26.12199 The process is going down with warnings
Debug Jan 11 17:30:26.148199 Outgoing XML: <xml version "1.0" encoding ="UTF-8"?>
<Alert trigger = "true" >
<Alerttype>orderheld</Alerttype>
<AlertID>mGMjhgHgffHhhFdH1u4</AlertID>
<Order>uwiofhdf</Order>
<Quantity>7651</Quanity>
</Alert>(CreateEventHandler. C:723)
Debug Jan 11 17:30:26.13214 The process has restarted and thread opened
Debug Jan 11 17:30:26.13215 The heartbeat is recieved from alertlistener process

The requirement is to take an AlertID as input, scan the process log, and extract the matching outgoing XML into a separate file.

Using awk I am able to extract all the outgoing XML documents, but I am not sure how to extract only the one related to a particular AlertID.

Eg:

 awk '/Outgoing/{p=1; s=$0} p && /<\/Alert>/ {print $0 FS s; s=""; p=0} p' 1.log > 2.log
  • Use a real XML parser. Commented Jan 12, 2018 at 19:47
  • Any other way to do it? Commented Jan 12, 2018 at 19:49
  • No, really, use a real XML parser. Using awk (which matches regular expressions) is not going to be useful for extracting data from XML (which is context-free). If you don't know those terms: it means that you cannot correctly parse XML with regular expressions.
    – Fox, Commented Jan 13, 2018 at 5:31
  • I've hesitantly voted to re-open. Please would you provide a decent example consisting of several of the log file entries from which you want the XML data to be extracted. The relevant parts are any preamble, the first two or three elements of the XML document, and the corresponding end. Please make it absolutely clear whether each log file entry is on a single line or on multiple lines. If a log entry covers multiple lines, please explain (in words) what starts and ends a log entry. Commented Jan 14, 2018 at 20:25
  • Thanks for reopening the question. I have now given more details on what the log looks like. The logs and data are on multiple lines. The data that must be extracted starts at <xml version > and ends at the </Alert> tag. Commented Jan 15, 2018 at 2:19

1 Answer

One method, which is not particularly well suited to the task but should work, is this:

  • Remove the line feeds so that everything appears on a single line
  • But place a line feed before each XML declaration and after each </Alert>, so that every XML document is on a line of its own
  • grep for the desired AlertID
  • Output the matching line and clean it up

This translates to:

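 tr -d "\r\n" < log_file \
   | sed -e 's/<?*xml /\n&/g' -e 's/<\/Alert>/&\n/g' \
   | grep -F '<AlertID>mGMjhgHgffHhhFdH1u4</AlertID>'

Note that < must not be escaped here: in GNU sed, \< is a word-boundary anchor rather than a literal angle bracket. The pattern <?*xml matches both <?xml and the <xml form shown in the log sample, and \n in the replacement relies on GNU sed.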

You can even pipe the result to xmllint --format - to pretty-print it.
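
The pipeline above matches any XML document with the given AlertID, whether it was logged as incoming or outgoing. Below is a minimal sketch of one way to restrict the match to documents that follow the "Outgoing XML:" marker and to take the AlertID as an argument, using only sh and awk. The function name extract_outgoing_xml is hypothetical, and the sketch assumes that the marker is spelled exactly "Outgoing XML:", that every outgoing document ends at the first following </Alert>, and that the log fits comfortably in memory. Lines are joined with a single space, so whitespace inside the XML is not preserved exactly.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original answer):
#   extract_outgoing_xml ALERTID LOGFILE
# Prints each "Outgoing XML:" document whose <AlertID> equals ALERTID,
# one document per output line.
extract_outgoing_xml() {
    alert_id=$1
    log_file=$2
    awk -v id="$alert_id" '
        # Join all log lines into one string, dropping any trailing CR.
        { sub(/\r$/, ""); text = text $0 " " }
        END {
            marker = "Outgoing XML:"
            closing = "</Alert>"
            needle = "<AlertID>" id "</AlertID>"
            rest = text
            # Walk through the text from one outgoing marker to the next.
            while ((start = index(rest, marker)) > 0) {
                rest = substr(rest, start + length(marker))
                stop = index(rest, closing)
                if (stop == 0)
                    break                      # unterminated document: give up
                doc = substr(rest, 1, stop + length(closing) - 1)
                rest = substr(rest, stop + length(closing))
                sub(/^[ \t]+/, "", doc)        # trim space after the marker
                if (index(doc, needle) > 0)    # literal match, not a regex
                    print doc
            }
        }
    ' "$log_file"
}
```

With the function defined in a script or the current shell, a call such as extract_outgoing_xml mGMjhgHgffHhhFdH1u4 process.log > replay.xml would write the matching outgoing document to replay.xml. Using index() instead of a regular expression keeps the AlertID comparison literal, so an ID containing regex metacharacters would not cause false matches.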
