grep or awk to extract XML from a log based on a search string

Question

I have a log file in which XML documents are logged. I need to search for and extract all XMLs that contain a specific string in any one of their nodes.

E.g. the log file will contain multiple XMLs that include the search parameter.

randomlogentry1
randomlogentry2
Printing XML:<CreateDataABC>
<Tag1>searchparam</Tag1>
</CreateDataABC>
randomlogentry3
randomlogentry4
randomlogentry5
Printing XML: <DataCreatedABC>
<TagA>otherparam</TagA>
<TagB>searchparam</TagB>
<TagC>otherparam</TagC>
</DataCreatedABC>
randomlogentry6
randomlogentry7

The expected output is the two XMLs printed on the console or written to separate files.

XML1:

<CreateDataABC>
<Tag1>searchparam</Tag1>
</CreateDataABC>

XML2:

<DataCreatedABC>
<TagA>otherparam</TagA>
<TagB>searchparam</TagB>
<TagC>otherparam</TagC>
</DataCreatedABC>

The position of 'searchparam' in an XML is never fixed; the only constants are the 'ABC' string and 'searchparam'.

I thought of using sed to extract the text between two line numbers, for which I tried the following:

Search for the searchparam and identify its line number.
Find the next occurrence of ABC and get its line number.

However, I can't seem to find the previous occurrence of ABC from a specific line!

Has anyone done this before?

EDIT: Updated the example log format and expected output.

Extend your content to show the surrounding parts of the searched XML fragment.
– RomanPerekhrest
Commented May 25, 2018 at 8:31

Siva · Accepted Answer · 2018-05-25 10:57:19Z


Try this:

Max=$(grep -c "^Printing" file.xml)
for count in $(seq 1 "$Max")
do
    sed -nr '/Printing/H;//,/ABC/G;s/\n(\n[^\n]*){'$count'}$//p' file.xml |
        sed 's/Printing XML://' > "$count.xml"
done

edited May 25, 2018 at 10:57

answered May 25, 2018 at 8:35


Thanks, I have updated the question with more details. What I need to do is extract the entire XML out of a text log file...
– Saravanakumar Mohan
Commented May 25, 2018 at 9:35
Please share the expected output, as I did in my answer.
– Siva
Commented May 25, 2018 at 10:03
I have updated the question with the exact output expected.
– Saravanakumar Mohan
Commented May 25, 2018 at 10:09
Try my updated answer.
– Siva
Commented May 25, 2018 at 10:58
Thanks Siva. As I mentioned, the only constants are 'ABC' and 'searchparam', so I cannot depend on the presence of 'Printing'.
– Saravanakumar Mohan
Commented May 25, 2018 at 11:28


Saravanakumar Mohan · Accepted Answer · 2018-05-29 11:44:13Z

Here is what I wrote, but I am sure there is a shorter and more elegant way of doing this.

searchstring=searchparam
filename=test.log
pattern1=ABC

# Line numbers of every line that contains the search string
linenums=($(grep -n "${searchstring}" "${filename}" | awk -F":" '{print $1}'))
len=${#linenums[@]}

for (( i=0; i<len; i++ )); do
    currentline=${linenums[$i]}

    # Next occurrence of the pattern at or after the current line (closing tag)
    relativeendlinearray=($(tail -n +"${currentline}" "${filename}" | grep -n "${pattern1}" | awk -F":" '{print $1}'))
    actualendline=$((currentline + relativeendlinearray[0] - 1))

    # Walk backwards line by line to the previous occurrence (opening tag)
    index=$currentline
    while [ "$index" -ne 0 ]
    do
        found=$(sed "${index}q;d" "${filename}" | grep "${pattern1}")
        if [ -n "$found" ]; then
            actualstartline=$index
            break
        fi
        index=$((index - 1))
    done

    if [ -n "$found" ]; then
        echo ""
    else
        echo "Log break detected, content across multiple files"
    fi

    echo "Start Line" "${actualstartline}"
    echo "Current Line" "${currentline}"
    echo "End Line" "${actualendline}"
    sed -n "${actualstartline},${actualendline}p" "${filename}"
done
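One possibly shorter route is a single awk pass that buffers each XML block and prints it only if it contains the search string. The sketch below is not from the original post and rests on some assumptions: every root element name ends in 'ABC' and carries no attributes, child tags never end in 'ABC', and blocks do not nest. The function name extract_xml and its argument order are hypothetical.

```shell
#!/usr/bin/env bash
# extract_xml SEARCH SUFFIX FILE   (hypothetical helper, not from the post)
# Prints every XML block whose root element name ends in SUFFIX and which
# contains SEARCH somewhere between its opening and closing root tag.
extract_xml() {
    awk -v search="$1" -v suffix="$2" '
        # Opening root tag such as <CreateDataABC>: start a new buffer and
        # drop any log prefix (e.g. "Printing XML:") in front of the "<".
        !inblock && match($0, "<[^/>]*" suffix ">") {
            inblock = 1
            buf = ""
            $0 = substr($0, RSTART)
        }
        inblock {
            buf = buf $0 "\n"
            # Closing root tag such as </CreateDataABC>: the block is complete.
            if ($0 ~ ("</[^>]*" suffix ">")) {
                # Print the block, followed by a blank line, only on a hit.
                if (index(buf, search)) printf "%s\n", buf
                inblock = 0
            }
        }
    ' "$3"
}
```

With the variables from the script above, this would be called as extract_xml "$searchstring" "$pattern1" "$filename". Because the file is read only once, it avoids re-running sed for every line while walking backwards. Any text that follows a closing root tag on the same line is kept in the output.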
