2

I have already extracted the tag from the source document using grep, but now I can't figure out how to easily extract the properties from the string. I also want to avoid using any programs that would not usually be present on a standard installation.

tag='<img src="http://imgs.xkcd.com/comics/barrel_cropped_(1).jpg" title="Don'"'"'t we all." alt="Barrel - Part 1" />'

I need to end up with the following variables:

src="http://imgs.xkcd.com/comics/barrel_cropped_(1).jpg"
title="Don't we all."
alt="Barrel - Part 1"

    4 Answers

    4

    You can use xmlstarlet. Then, you don't even have to extract the element yourself:

    $ echo "$tag" | xmlstarlet sel -t --value-of '//img/@src'
    http://imgs.xkcd.com/comics/barrel_cropped_(1).jpg

    You can even turn this into a function

    $ get_attribute() {
          echo "$1" | xmlstarlet sel -t -o "&quot;" -v "$2" -o "&quot;"
      }
    $ src=$(get_attribute "$tag" '//img/@src')

    If you don't want to reparse the document several times, you can also do:

    $ get_values() {
          eval file=\${$#}
          eval $#=
          cmd="xmlstarlet sel "
          for arg in $@
          do
              if [ -n $arg ]
              then
                  var=${arg%%\=*}
                  expr=${arg#*=}
                  cmd+=" -t -o \"$var=&quot;\" -v $expr -o \"&quot;\" -n"
              fi
          done
          eval $cmd $file
      }
    $ eval $(get_values src='//img/@src' title='//img/@title' your_file.xml)
    $ echo $src
    http://imgs.xkcd.com/comics/barrel_cropped_(1).jpg
    $ echo $title
    Don't we all.

    I'm sure there's a better way to remove the last argument to a shell function, but I don't know it.
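One possible way around the last-argument problem is to not remove it at all: count the arguments and treat the final one as the file name while looping. The sketch below is only an illustration of that idea. `parse_args` is a hypothetical stand-in for `get_values` that prints what it parsed and does not call xmlstarlet.

```shell
#!/bin/sh
# parse_args is a hypothetical stand-in for get_values: it only prints what it
# parsed and never invokes xmlstarlet.
parse_args() {
    last=$#          # position of the final argument (the file name)
    i=0
    for arg in "$@"; do
        i=$((i + 1))
        if [ "$i" -eq "$last" ]; then
            printf 'file: %s\n' "$arg"
        else
            var=${arg%%=*}     # text before the first '='
            expr=${arg#*=}     # text after the first '='
            printf 'var: %s expr: %s\n' "$var" "$expr"
        fi
    done
}

output=$(parse_args src='//img/@src' title='//img/@title' your_file.xml)
printf '%s\n' "$output"
```

In bash specifically, `set -- "${@:1:$#-1}"` should also drop the last positional parameter, after saving it with `file=${!#}`; the counting loop is used here because it does not depend on bash-only syntax.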

    • Oh, then xmlstarlet might not be available on a standard installation. Sorry, I think it was a little too late when I wrote the answer...
      Commented Oct 10, 2008 at 10:58
    1

    I went with dacracot's suggestion of using sed, although I would have preferred it if he had given me some sample code:

    src=`echo $tag | sed 's/.*src=["]\(.*\)["] title=["]\(.*\)["] alt=["]\(.*\)["].*/\1/'`
    title=`echo $tag | sed 's/.*src=["]\(.*\)["] title=["]\(.*\)["] alt=["]\(.*\)["].*/\2/'`
    alt=`echo $tag | sed 's/.*src=["]\(.*\)["] title=["]\(.*\)["] alt=["]\(.*\)["].*/\3/'`
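The sed expressions above only match when the attributes appear in exactly the order src, title, alt. A minimal sketch of a variant that pulls out each attribute on its own, so the order no longer matters, is shown here. `get_attr` is a hypothetical helper name, and it assumes double-quoted attribute values with a space before the attribute name. It is still plain text matching rather than XML parsing, so entities such as `&amp;` are not decoded.

```shell
#!/bin/sh
# Sample tag from the question; '"'"' embeds a literal apostrophe.
tag='<img src="http://imgs.xkcd.com/comics/barrel_cropped_(1).jpg" title="Don'"'"'t we all." alt="Barrel - Part 1" />'

# get_attr (hypothetical helper): print the value of attribute $2 found in tag $1.
# Assumes the value is double-quoted and the name is preceded by a space.
get_attr() {
    printf '%s\n' "$1" | sed -n 's/.* '"$2"'="\([^"]*\)".*/\1/p'
}

src=$(get_attr "$tag" src)
title=$(get_attr "$tag" title)
alt=$(get_attr "$tag" alt)

printf '%s\n' "src: $src" "title: $title" "alt: $alt"
```

Using `[^"]*` instead of `.*` keeps each capture from running past the closing quote of its own attribute.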
    • Using sed is a really, really bad approach -- it's brittle and doesn't know anything at all about the XML standard, and so will give you bad results when encountering things like &amp;. See Torsten Marek's suggestion.
      Commented Oct 10, 2008 at 2:34
    • Sorry I didn't work out the sed script for you; I didn't have time right then.
      – dacracot
      Commented Oct 10, 2008 at 4:19
    • If you don't have time to write a good answer, then don't write one. Even if you do, be sure to come back and edit it later.
      – ephemient
      Commented Oct 10, 2008 at 15:21
    • What is your definition of good? I find it very amusing that my answer was selected with -1 votes. No, I didn't code it for him, but I sent him in the right direction to find the answer. Give a man a fish and you feed him for a day. Teach a man to fish and you feed him for a lifetime.
      – dacracot
      Commented Oct 10, 2008 at 18:38
    0

    If xmlstarlet is available on a standard installation and the sequence of src-title-alt does not change, you can use the following code as well:

    tag='<img src="http://imgs.xkcd.com/comics/barrel_cropped_(1).jpg" title="Don'"'"'t we all." alt="Barrel - Part 1" />'

    xmlstarlet sel -T -t -m "/img" -m "@*" -v '.' -n <<< "$tag"

    IFS=$'\n'
    array=( $(xmlstarlet sel -T -t -m "/img" -m "@*" -v '.' -n <<< "$tag") )
    src="${array[0]}"
    title="${array[1]}"
    alt="${array[2]}"

    printf "%s\n" "src: $src" "title: $title" "alt: $alt"
      0

      Since this bubbled up again, there is now my Xidel that has 2 features which make this task trivial:

      • pattern matching on the xml

      • exporting all matched variables to the shell

      So it becomes a single line:

      eval $(xidel "$tag" -e '<img src="{$src}" title="{$title}" alt="{$alt}"/>' --output-format bash) 
