I need to extract attributes from an XML file using a pure bash script.

I have the following XML file with a root element Group and many Person elements, each of which has id and username attributes. The id value is unique for each element:

    <?xml version="1.0" encoding="UTF-8"?>
    <Group id="D_8" main="false">
      <Person id="P_0001"
              email="[email protected]"
              username="person_0001"
              password="pass_0001"
              active="true"/>
      <Person id="P_0002"
              email="[email protected]"
              username="person_0002"
              password="pass_0002"
              active="true"/>
      <!-- ...and hundreds of other Person elements ... -->
    </Group>

I need a bash script that extracts the id and username attributes into a key-value structure:

    P_0001=person_0001
    P_0002=person_0002

I checked other related answers, but most of them suggest using an XML parser such as xmllint. Unfortunately, none of these are available on the target machine.

Please suggest how I can achieve this.

  • Perl/PHP are out? awk?
    Commented Apr 5, 2020 at 12:10
  • Choose the right tool first. I suggest using an XML/HTML parser (xmlstarlet, e.g.).
    – Cyrus
    Commented Apr 5, 2020 at 12:22
  • Isn't it possible to use pure bash without any additional libs?
    – Claude
    Commented Apr 5, 2020 at 12:48
  • Yes, but an XML file with the same content can also be formatted completely differently, which then causes problems. This problem does not exist if you use tools that interpret XML files correctly.
    – Cyrus
    Commented Apr 5, 2020 at 13:09
  • Does this answer your question? How to parse XML in Bash?
    Commented Feb 28, 2024 at 14:41

2 Answers

As long as the username attribute does not come before the id attribute, this bash script gives the result:

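    #!/usr/bin/env bash

    # Patterns capturing the quoted value of the id and username attributes
    id='\bid="([^"]+)"'
    username='\busername="([^"]+)"'

    while IFS= read -r line; do
        # Remember the most recent id seen
        [[ $line =~ $id ]] && idv="${BASH_REMATCH[1]}"
        # Print id=username once the username is found
        [[ $line =~ $username ]] && echo "$idv=${BASH_REMATCH[1]}"
    done < data.xml

    exit 0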

It works even when the username and id attributes are on the same line.
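A note on portability: `\b` is not part of POSIX extended regular expressions (it is a GNU extension), so on systems whose regex library is not glibc the patterns may not match at all. Below is a minimal sketch of a variant that avoids `\b` and also tolerates spaces around `=`. It assumes attribute values are double-quoted and that id comes before username within each element. The function name `extract_pairs` and the variable names `id_re` and `username_re` are illustrative, not part of the script in this answer.

```shell
#!/usr/bin/env bash
# Sketch only: extract_pairs, id_re and username_re are hypothetical names.

# (^|[[:space:]]) stands in for \b, which POSIX ERE does not define.
# [[:space:]]*=[[:space:]]* allows optional whitespace around the equals sign.
# Capture group 2 holds the attribute value.
id_re='(^|[[:space:]])id[[:space:]]*=[[:space:]]*"([^"]+)"'
username_re='(^|[[:space:]])username[[:space:]]*=[[:space:]]*"([^"]+)"'

extract_pairs() {
    local line idv=
    # "|| [[ -n $line ]]" also handles a last line without a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        # Remember the most recent id (assumes id precedes username).
        [[ $line =~ $id_re ]] && idv=${BASH_REMATCH[2]}
        # Emit the pair as soon as a username is seen.
        [[ $line =~ $username_re ]] && printf '%s=%s\n' "$idv" "${BASH_REMATCH[2]}"
    done < "$1"
    return 0
}

# Usage: extract_pairs data.xml
```

Like the script it is derived from, this matches at most one id and one username per line, so it still depends on each Person element starting on its own line.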

  • Thank you very much Philippe, this is a very elegant solution, given that I cannot use xmlstarlet or any other lib, as @Cyrus mentioned.
    – Claude
    Commented Apr 5, 2020 at 17:38
  • Apologies, I might have missed something, but when I run this script against my XML file data.xml I do not get any output.
    – Claude
    Commented Apr 5, 2020 at 18:05
  • How did you run the script? What's your bash version (echo $BASH_VERSION)?
    – Philippe
    Commented Apr 5, 2020 at 18:10
  • I created a script.sh file, pasted the content there, and ran it as usual. The data.xml file is in the same location. I am currently running it on macOS, and the bash version here is 3.2.57(1)-release. I will try it on the RHEL machine once I have access to it.
    – Claude
    Commented Apr 5, 2020 at 19:35
  • Note that it's legal XML for there to be spaces around the = in an attribute assignment.
    Commented Jan 3 at 14:20

Assumptions:

  • the XML file is 'nicely' formatted like the presented example (hence no need for an XML parser)
  • id is the first attribute in a Person block (i.e., we'll always be able to match on the literal string Person id); otherwise additional parsing will be necessary
  • Person id and username only show up in the Group block

One awk solution:

    awk -F'"' '
    /<Person id=/           { pid=$2 ; next }
    /[[:space:]]username=/  { printf "%s=%s\n", pid, $2 ; next }
    ' test.xml

Running this awk solution against the sample data file generates:

    P_0001=person_0001
    P_0002=person_0002

All bets are off if my assumptions are invalid.
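If id is not guaranteed to be the first attribute, or if a whole Person element sits on one line, one option is to scan every attribute of the element instead of relying on `$2`. The following is a rough sketch under its own assumptions: values are double-quoted, there are no spaces around `=`, and each Person element is self-closing (ends in `/>`). The wrapper name `extract_pairs_awk` and the awk variables `in_person`, `pid` and `user` are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: extract_pairs_awk is a hypothetical wrapper name.

extract_pairs_awk() {
    awk -F'"' '
        # A Person element starts: reset the collected values.
        /<Person[[:space:]]/ { in_person = 1; pid = ""; user = "" }

        in_person {
            # Splitting on the double quote puts "name=" text in odd fields
            # and the matching values in the even fields that follow.
            for (i = 1; i < NF; i += 2) {
                if ($i ~ /(^|[[:space:]])id=$/)       pid  = $(i + 1)
                if ($i ~ /(^|[[:space:]])username=$/) user = $(i + 1)
            }
        }

        # The element closes: print the pair and stop collecting.
        in_person && /\/>/ {
            printf "%s=%s\n", pid, user
            in_person = 0
        }
    ' "$1"
}

# Usage: extract_pairs_awk test.xml
```

Picking up another attribute such as email would mean adding one more `if` line with its own variable. A value that itself contains `/>` would confuse this sketch, which is one more reason a real XML parser is the safer choice when one is available.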

  • Writing /Person id/ is probably not a good representation, as it makes you believe that you could also write /Person email/, which would fail. Can you separate out the id parts and add another field, such as email?
    – not2qubit
    Commented Jan 10 at 11:19
