I have a log file in which XML documents are logged. I need to search for and extract every XML document that contains a specific string in any one of its nodes.

For example, the log file will contain multiple XML documents that include the search parameter:

    randomlogentry1
    randomlogentry2
    Printing XML:<CreateDataABC>
    <Tag1>searchparam</Tag1>
    </CreateDataABC>
    randomlogentry3
    randomlogentry4
    randomlogentry5
    Printing XML: <DataCreatedABC>
    <TagA>otherparam</TagA>
    <TagB>searchparam</TagB>
    <TagC>otherparam</TagC>
    </DataCreatedABC>
    randomlogentry6
    randomlogentry7

The expected output is the two XML documents, printed to the console or written to separate files.

XML1:

    <CreateDataABC>
    <Tag1>searchparam</Tag1>
    </CreateDataABC>

XML2:

    <DataCreatedABC>
    <TagA>otherparam</TagA>
    <TagB>searchparam</TagB>
    <TagC>otherparam</TagC>
    </DataCreatedABC>

The position of 'searchparam' within an XML document is never fixed; the only constants are the 'ABC' string and 'searchparam'.

I thought of using sed to extract the text between two line numbers, and tried the following:

  1. Search for the searchparam and identify its line number.
  2. Find the next occurrence of ABC and get its line number.

However, I can't seem to find the previous occurrence of ABC from a specific line.
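For illustration, the missing step (the previous occurrence of ABC before a given line) could be sketched roughly as follows. The helper names `prev_match` and `next_match` are hypothetical, and the sketch assumes the pattern is a fixed string and that the opening and closing tags sit on separate lines:

```shell
#!/bin/bash
# Hypothetical helpers; names and argument order are assumptions for illustration.

# prev_match FILE PATTERN LINE
# Print the number of the last line at or before LINE that contains PATTERN.
# Prints nothing if there is no such line.
prev_match() {
    head -n "$3" "$1" | grep -nF -- "$2" | tail -n 1 | cut -d: -f1
}

# next_match FILE PATTERN LINE
# Print the number of the first line at or after LINE that contains PATTERN.
# Prints nothing if there is no such line.
next_match() {
    local rel
    # grep -n numbers lines relative to LINE, so convert back to absolute.
    rel=$(tail -n +"$3" "$1" | grep -nF -- "$2" | head -n 1 | cut -d: -f1)
    if [ -n "$rel" ]; then
        echo $(( $3 + rel - 1 ))
    fi
}
```

With the line number of a 'searchparam' hit in `n`, the range could then be printed with something like `sed -n "$(prev_match test.log ABC "$n"),$(next_match test.log ABC "$n")p" test.log`, provided both helpers return a line number.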

Has anyone done this before?

EDIT: Updated the example log format and expected output.

  • Extend your content to show the surrounding parts of the XML fragment being searched for.
    Commented May 25, 2018 at 8:31
  • Is the log file a well-formed XML file?
    – Kusalananda
    Commented May 25, 2018 at 9:29
  • The log file is not XML; it is plain text.
    Commented May 25, 2018 at 9:35

2 Answers

Try this:

    Max=$(grep -c "^Printing" file.xml)
    for count in $(seq 1 "$Max")
    do
        sed -nr '/Printing/H;//,/ABC/G;s/\n(\n[^\n]*){'"$count"'}$//p' file.xml |
            sed 's/Printing XML://' > "$count.xml"
    done
  • Thanks. I have updated the question with more details; what I need to do is extract the entire XML out of a text log file...
    Commented May 25, 2018 at 9:35
  • Please share the expected output, as I did in my answer.
    – Siva
    Commented May 25, 2018 at 10:03
  • I have updated the question with the exact expected output.
    Commented May 25, 2018 at 10:09
  • Try my updated answer.
    – Siva
    Commented May 25, 2018 at 10:58
  • Thanks Siva. As I mentioned, the only constants are 'ABC' and 'searchparam', so I cannot depend on the presence of 'Printing'.
    Commented May 25, 2018 at 11:28

Here is what I wrote, but I am sure there is a shorter and more elegant way of doing this.

    searchstring=searchparam
    filename=test.log
    pattern1=ABC

    # Line numbers of every line containing the search string
    linenums=($(grep -n "${searchstring}" "${filename}" | awk -F":" '{print $1}'))
    len=${#linenums[@]}

    for (( i=0; i<len; i++ )); do
        currentline=${linenums[$i]}

        # First line at or after the current line that contains the pattern
        relativeendlinearray=($(tail -n +"${currentline}" "${filename}" | grep -n "${pattern1}" | awk -F":" '{print $1}'))
        actualendline=$((currentline + relativeendlinearray[0] - 1))

        # Walk backwards from the current line to the nearest line containing the pattern
        index=$currentline
        while [ "$index" -ne 0 ]
        do
            found=$(sed "${index}q;d" "${filename}" | grep "${pattern1}")
            if [ -n "$found" ]; then
                actualstartline=$index
                break
            fi
            index=$((index - 1))
        done

        if [ -n "$found" ]; then
            echo ""
        else
            echo "Log break detected, content across multiple files"
        fi

        echo "Start Line" "${actualstartline}"
        echo "Current Line" "${currentline}"
        echo "End Line" "${actualendline}"
        sed -n "${actualstartline},${actualendline}p" "${filename}"
    done
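A shorter single-pass variant might look like the awk sketch below. It is not tested against real logs and rests on a few assumptions: the marker (ABC) appears only on the opening and closing lines of each XML block, those two lines are distinct, and blocks do not nest or span files. The function name `extract_xml` is hypothetical.

```shell
#!/bin/bash
# extract_xml FILE SEARCH MARKER  (hypothetical helper, for illustration only)
# Buffers every block delimited by two lines containing MARKER and prints
# the block only if it also contains SEARCH somewhere.
extract_xml() {
    awk -v search="$2" -v marker="$3" '
        # A marker line seen outside a block is treated as the opening tag.
        !inblock && index($0, marker) {
            p = index($0, "<")                 # drop any log prefix before "<"
            buf = (p ? substr($0, p) : $0)
            inblock = 1
            next
        }
        inblock {
            buf = buf "\n" $0
            # The next marker line is treated as the closing tag.
            if (index($0, marker)) {
                if (index(buf, search))
                    printf "XML%d:\n%s\n\n", ++n, buf
                inblock = 0
            }
        }
    ' "$1"
}

# Example call, using the file name from the script above:
# extract_xml test.log searchparam ABC
```

Because the file is read only once, this avoids re-running sed for every line while walking backwards. Both strings are matched literally with `index()`, so no regex escaping is needed, although `awk -v` still interprets backslash escapes in the values passed in.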
