-3

I have the two lines shown below in my input file input.txt, and I need to extract claimStartDate from the first line and claimEndDate from the second line.

<ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180409120000102" claimEndDate="2018-04-02" claimStartDate="2018-04-02" sourceSystemId="abcd" claimActionCode="00">
<ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180430120000281" claimEndDate="2018-04-17" claimStartDate="2018-04-17" sourceSystemId="abcd" claimActionCode="00">

rm input.txt
awk '/<ProfessionalClaim/' test.xml | head -1 > input.txt
awk '/<ProfessionalClaim/' test.xml | tail -1 >> input.txt
awk '{match($0, "claimStartDate=\"([^\"]+)\"", start); print start[1]} \
     {match($0, "claimEndDate=\"([^\"]+)\"", end); print end[1]}' input.txt
  • Question needs to be completed.
    – cagdas
    Commented Jan 24, 2019 at 7:22
  • F_LINE=<ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180409120000102" claimEndDate="2018-04-02" claimStartDate="2018-04-02" sourceSystemId="abcd" claimActionCode="00">
    L_LINE=<ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180430120000281" claimEndDate="2018-04-17" claimStartDate="2018-04-17" sourceSystemId="abcd" claimActionCode="00">
    Commented Jan 24, 2019 at 7:23
  • Are these lines in a text file that you want to use as input? Are there multiple F_LINE and L_LINE lines? What should your output look like? Please edit your question to add this information, and use the code button to present file contents and commands more clearly. Thanks!
    Commented Jan 24, 2019 at 7:35
  • I pulled these two lines from an XML file and use them as input to get the claimStartDate from F_LINE and the claimEndDate from L_LINE. I have now edited the question. Please let me know if you need any more details. Thanks!
    Commented Jan 24, 2019 at 7:38
  • It would be appropriate and more efficient to use an XML parser (like XMLStarlet or a Perl/Python XML parser module) on the original XML document. You have not shown how these lines are part of the original document or how you parse them out.
    – Kusalananda
    Commented Jan 24, 2019 at 7:41

2 Answers

0
$ awk '/F_LINE/ {match($0, "claimStartDate=\"([^\"]+)\"", start); print start[1]} \
       /L_LINE/ {match($0, "claimEndDate=\"([^\"]+)\"", end); print end[1]}' input.txt
2018-04-02
2018-04-17

EDIT due to your new information:

$ awk 'NR==1 {match($0, "claimStartDate=\"([^\"]+)\"", start); print start[1]} \
       NR==2 {match($0, "claimEndDate=\"([^\"]+)\"", end); print end[1]}' input.txt
2018-04-02
2018-04-17

You can also do this all in one run:

$ grep "<ProfessionalClaim" test.xml \
    | sed -n '1p;$p' \
    | awk 'NR==1 {match($0, "claimStartDate=\"([^\"]+)\"", start); print start[1]} \
           NR==2 {match($0, "claimEndDate=\"([^\"]+)\"", end); print end[1]}'
  • grep finds all lines containing <ProfessionalClaim in test.xml
  • sed keeps only the first and the last of those lines
  • awk prints the claimStartDate from the first line and the claimEndDate from the second line
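
The three-argument form of match() used above is a GNU awk extension, so it may not work with other awk implementations. As a rough sketch only (not part of the answer above), the same result can be had from a single POSIX awk process that remembers the first and last matching lines. The names first_start_last_end and attr are hypothetical, and the code assumes double-quoted attribute values with each attribute name preceded by a space.

```shell
# first_start_last_end [FILE...]
# Hypothetical helper: print claimStartDate of the first <ProfessionalClaim
# line and claimEndDate of the last one. Reads stdin when no FILE is given.
first_start_last_end() {
    awk '
        # Return the value of attribute "name" in "line", or "" if absent.
        function attr(line, name,    pat) {
            pat = " " name "=\"[^\"]*\""
            if (!match(line, pat)) return ""
            # Skip the leading space, the name and the two characters =",
            # and leave out the closing quote.
            return substr(line, RSTART + length(name) + 3, RLENGTH - length(name) - 4)
        }
        /<ProfessionalClaim/ {
            if (!seen++) first = $0   # remember the first matching line
            last = $0                 # overwritten each time, ends as the last
        }
        END {
            if (seen) {
                print attr(first, "claimStartDate")
                print attr(last, "claimEndDate")
            }
        }
    ' "$@"
}

# Demo on the two lines from the question, fed on standard input.
first_start_last_end <<'EOF'
<ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180409120000102" claimEndDate="2018-04-02" claimStartDate="2018-04-02" sourceSystemId="abcd" claimActionCode="00">
<ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180430120000281" claimEndDate="2018-04-17" claimStartDate="2018-04-17" sourceSystemId="abcd" claimActionCode="00">
EOF
```

For the two lines from the question this prints 2018-04-02 and 2018-04-17 on separate lines. On a real file it would be called as first_start_last_end test.xml, which avoids the intermediate grep and sed steps.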
  • My inputs are in two string variables, F_LINE and L_LINE. What is this input.txt here?
    Commented Jan 24, 2019 at 8:34
  • As you haven't specified how you pulled out the two lines, I assumed in my example that they are in a new file called input.txt. If this is not the case, please add more information to your original post about how you extracted them and where you are starting from now (show some code, say what language you are using, ...).
    Commented Jan 24, 2019 at 8:45
  • Earlier I was writing those two lines into two separate variables, F_LINE and L_LINE:
    {<ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180409120000102" claimEndDate="2018-04-02" claimStartDate="2018-04-02" sourceSystemId="abcd" claimActionCode="00">}
    {<ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180430120000281" claimEndDate="2018-04-17" claimStartDate="2018-04-17" sourceSystemId="abcd" claimActionCode="00">}
    Commented Jan 24, 2019 at 16:25
  • I need only the claimStartDate from the first line and the claimEndDate from the second line.
    Commented Jan 24, 2019 at 16:34
  • Thanks a lot, it is working fine! I also need to take one other field (ClaimProcessedDateTime) from the first and last lines. I am using the command below for that, but for some reason paid_stop is not getting populated.
    grep "<ProfessionalClaim" test.xml \
      | sed -n '1p;$p' \
      | awk 'NR==1 {match($0, "claimProcessedDateTime=\"([^\"]+)\"", start); print "paid_start " start[1]} \
             NR==2 {match($0, "ClaimProcessedDateTime=\"([^\"]+)\"", end); print "paid_stop " end[1]}'
    Commented Jan 24, 2019 at 19:05
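
Two points from the comments above can be illustrated together. If the lines are held in the shell variables F_LINE and L_LINE rather than in a file, plain POSIX parameter expansion is one way to pull out an attribute without awk or a temporary file. And the empty paid_stop is most likely a spelling issue: matching is case-sensitive, and the data contains claimProcessedDateTime, not ClaimProcessedDateTime. The following is only a sketch; get_attr is a hypothetical helper name, and it assumes double-quoted values with each attribute name preceded by a space.

```shell
# get_attr NAME LINE
# Hypothetical helper: print the value of attribute NAME found in LINE.
# Returns non-zero if LINE has no such attribute (names are case-sensitive).
get_attr() {
    case $2 in
        *" $1=\""*) ;;              # attribute present, carry on
        *) return 1 ;;              # attribute absent
    esac
    rest=${2#*" $1=\""}             # drop everything up to and including NAME="
    printf '%s\n' "${rest%%\"*}"    # drop the closing quote and what follows
}

# The two lines from the question, held in variables as described above.
F_LINE='<ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180409120000102" claimEndDate="2018-04-02" claimStartDate="2018-04-02" sourceSystemId="abcd" claimActionCode="00">'
L_LINE='<ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180430120000281" claimEndDate="2018-04-17" claimStartDate="2018-04-17" sourceSystemId="abcd" claimActionCode="00">'

echo "claim_start $(get_attr claimStartDate "$F_LINE")"
echo "claim_stop $(get_attr claimEndDate "$L_LINE")"
echo "paid_start $(get_attr claimProcessedDateTime "$F_LINE")"
echo "paid_stop $(get_attr claimProcessedDateTime "$L_LINE")"

# The capitalised spelling from the comment matches nothing in the data.
get_attr ClaimProcessedDateTime "$L_LINE" || echo "no ClaimProcessedDateTime attribute"
```

In the awk command from the last comment, the equivalent change would be to spell the second pattern claimProcessedDateTime, with a lowercase c, just like the first.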
0

Assuming some XML input document like the following:

<?xml version="1.0"?>
<root>
  <ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180409120000102" claimEndDate="2018-04-02" claimStartDate="2018-04-02" sourceSystemId="abcd" claimActionCode="00"/>
  <ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180430120000281" claimEndDate="2018-04-17" claimStartDate="2018-04-17" sourceSystemId="abcd" claimActionCode="00"/>
  <ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180430120000281" claimEndDate="2018-04-18" claimStartDate="2018-04-18" sourceSystemId="abcd" claimActionCode="00"/>
  <ProfessionalClaim paymentIndicator="P" claimProcessedDateTime="20180430120000281" claimEndDate="2018-04-19" claimStartDate="2018-04-19" sourceSystemId="abcd" claimActionCode="00"/>
</root>

... we may use xmlstarlet to extract the claimStartDate attribute's value from each ProfessionalClaim node that has another ProfessionalClaim node following it, together with that next ProfessionalClaim node's claimEndDate attribute's value:

xmlstarlet select --template \
    --match '//ProfessionalClaim[following-sibling::ProfessionalClaim/@claimEndDate]' \
    --value-of 'concat(@claimStartDate, " ", following-sibling::ProfessionalClaim/@claimEndDate)' \
    -nl input.txt

This first matches each ProfessionalClaim node that is followed by another ProfessionalClaim node.

For each such node, the value of the claimStartDate attribute is concatenated with the value of the claimEndDate attribute of the following ProfessionalClaim node, with a single space character as delimiter.

Given my example document above, this would generate

2018-04-02 2018-04-17
2018-04-17 2018-04-18
2018-04-18 2018-04-19
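
If only the first claim's claimStartDate and the last claim's claimEndDate are wanted, as the question describes, a variation along the following lines might do. This is a sketch that assumes xmlstarlet is installed and that the input is a well-formed XML document like the example above; first_and_last is a hypothetical helper name, not something xmlstarlet provides.

```shell
# first_and_last FILE
# Hypothetical helper: print the claimStartDate of the first ProfessionalClaim
# node and the claimEndDate of the last one, each on its own line.
first_and_last() {
    xmlstarlet select --template \
        --value-of '(//ProfessionalClaim)[1]/@claimStartDate' --nl \
        --value-of '(//ProfessionalClaim)[last()]/@claimEndDate' --nl \
        "$1"
}
```

The parentheses make the positional predicates apply to the whole list of ProfessionalClaim nodes in document order, so [1] and [last()] select the first and last claim in the file rather than the first and last child of each parent. With the example document saved as, say, claims.xml, first_and_last claims.xml would be expected to print 2018-04-02 followed by 2018-04-19.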
