2

Using Git Bash, I'm trying to conditionally replace what's in the yrot tag in hundreds of files, but only if it belongs with a part name tag that pertains to wheel.

    // YES, change
    <part name="D_wheel1" seqNumber="1" >
      <yrot min="0.000000" max="0.000000" cur="0.000000" />
    </part>

    // YES, change
    <part name="D_wheel2" seqNumber="1" >
      <yrot min="0.000000" max="0.000000" cur="0.000000" />
    </part>

    // NO, don't change
    <part name="door" seqNumber="1" >
      <yrot min="0.000000" max="0.000000" cur="0.000000" />
    </part>

    // Example Line Change
    // From: <yrot min="0.000000" max="0.000000" cur="0.000000" />
    // To:   <yrot min="INF" max="INF"/>

Is this even possible using the likes of awk? Or do I need to use some sort of special XML parser?

EDIT: To be clear, there are about a dozen tags that belong to <part>, one of them being a <yrot>. <yrot> only appears within a <part> tag. I only want to replace the <yrot> line if the <part> name contains "wheel". <part> itself is nested.

To those claiming I need an XML parser, why wouldn't just a simple text find/replace work if the condition is met (yrot tag is in wheels)? Is checking that so difficult?

2
  • You don't need a special XML parser, just an XML parser. The problem is that some XML files can be changed with line-oriented tools like sed and awk, while another XML file that is syntactically the same but has a different layout cannot. Unless you have complete control over current and future input, you are better off with a real XML parser.
    – Anthon
    Commented Jun 15, 2015 at 18:57
  • It is possible for the specific case you show. Are all your XML files as simple as that? Are the only nested tags <part><yrot></yrot></part>? Will the <yrot></yrot> tags always be on the same line? If so, please edit your question and clarify and also show us your desired output, what changes would you like to make?
    – terdon
    Commented Jun 15, 2015 at 19:16

4 Answers 4

3

Provided your XML in data.xml as:

    $ cat data.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <root>
      <part name="D_wheel1" seqNumber="1">
        <yrot min="0.000000" max="0.000000" cur="0.000000" />
      </part>
      <part name="D_wheel2" seqNumber="1">
        <yrot min="0.000000" max="0.000000" cur="0.000000" />
      </part>
      <part name="door" seqNumber="1">
        <yrot min="0.000000" max="0.000000" cur="0.000000" />
      </part>
    </root>

Using xmlstarlet with XPath:

    $ xmlstarlet ed \
        --var target '//part[contains(@name, "wheel")]/yrot' \
        -u '$target/@*[name()="min" or name()="max"]' -v 'INF' \
        -d '$target/@cur' data.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <root>
      <part name="D_wheel1" seqNumber="1">
        <yrot min="INF" max="INF"/>
      </part>
      <part name="D_wheel2" seqNumber="1">
        <yrot min="INF" max="INF"/>
      </part>
      <part name="door" seqNumber="1">
        <yrot min="0.000000" max="0.000000" cur="0.000000"/>
      </part>
    </root>

Or the classical approach, using XSLT with xsltproc or xmlstarlet:

    $ cat data.xsl
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>

      <xsl:template match="*[contains(@name, 'wheel')]/yrot">
        <xsl:copy>
          <xsl:attribute name="min">INF</xsl:attribute>
          <xsl:attribute name="max">INF</xsl:attribute>
        </xsl:copy>
      </xsl:template>
    </xsl:stylesheet>

    $ xsltproc data.xsl data.xml   # or: xmlstarlet tr data.xsl data.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <root>
      <part name="D_wheel1" seqNumber="1">
        <yrot min="INF" max="INF"/>
      </part>
      <part name="D_wheel2" seqNumber="1">
        <yrot min="INF" max="INF"/>
      </part>
      <part name="door" seqNumber="1">
        <yrot min="0.000000" max="0.000000" cur="0.000000"/>
      </part>
    </root>
1
  • 1
    very nice solution
    Commented Jun 16, 2015 at 1:01
2

Using python's ElementTree standard library:

    #!/usr/bin/env python
    import sys
    import xml.etree.ElementTree as ET


    def do_one(file_name):
        tree = ET.parse(file_name)
        # iter() also finds <part> elements nested deeper in the tree;
        # findall("part") would only see direct children of the root.
        for part in tree.iter('part'):
            if 'wheel' not in part.get('name', ''):
                continue
            for yrot in part.findall('yrot'):
                # Drop all existing attributes (min, max, cur), then set the new ones.
                yrot.attrib.clear()
                yrot.set('min', 'INF')
                yrot.set('max', 'INF')
        tree.write(file_name)


    for file_name in sys.argv[1:]:
        do_one(file_name)

This processes every file passed to the script on the command line, rewriting each one in place:

python convert_xml.py *.xml 
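Since the script overwrites hundreds of files in place, it may be worth trying a variant that writes to a separate output file first. The following is only a sketch under a few assumptions: `convert_file`, `src` and `dst` are hypothetical names, and it needs Python 3.8 or later for `insert_comments`. It also asks ElementTree to keep XML comments inside the root element and to write the XML declaration, both of which the default `parse`/`write` calls leave out.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def convert_file(src, dst):
    """Rewrite <yrot> of wheel parts from src into dst (hypothetical helper)."""
    # insert_comments=True (Python 3.8+) keeps XML comments in the tree;
    # the default parser silently drops them.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    tree = ET.parse(src, parser)
    for part in tree.iter("part"):
        if "wheel" not in part.get("name", ""):
            continue
        for yrot in part.findall("yrot"):
            yrot.attrib.clear()
            yrot.set("min", "INF")
            yrot.set("max", "INF")
    # write() omits the XML declaration by default, so request it explicitly.
    tree.write(dst, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    # Made-up sample data, modelled on the question.
    sample = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<root>\n"
        "  <!-- front axle -->\n"
        '  <part name="D_wheel1" seqNumber="1">\n'
        '    <yrot min="0.000000" max="0.000000" cur="0.000000" />\n'
        "  </part>\n"
        "</root>\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.xml")
        dst = os.path.join(tmp, "out.xml")
        with open(src, "w", encoding="utf-8") as handle:
            handle.write(sample)
        convert_file(src, dst)
        with open(dst, encoding="utf-8") as handle:
            print(handle.read())
```

Once the output looks right on a few files, passing the same path as `src` and `dst` gives the in-place behaviour of the original script.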
1
  • I like this approach the most, as it's the safest, and even comes recommended by another answer in this thread. I'm accepting the answer, but I'm having trouble getting my python script to run with permissions from Git Bash in Windows 7.
    – mdeforge
    Commented Jun 16, 2015 at 13:11
2

There is a massive problem with trying to parse XML with 'standard' Unix tools. XML is a data structure, and it supports a variety of layouts that are semantically identical but don't have the same line breaks and indentation.

This means it's really a bad idea to parse as line/regex based, because you'll be creating some fundamentally brittle code. Someone may restructure their XML at some point, and your code will break for no obvious reason. That's the kind of thing that gives maintenance programmers and future sysadmins some real pain.

So yes, please use an XML parser. There are a variety of options - someone's given you a python option, so I'm including perl in here too.

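    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Twig;

    # Called for every <part>; rewrites its <yrot> child if the name mentions "wheel".
    sub process_part {
        my ( $twig, $part ) = @_;
        my $name = $part->att('name') // '';
        return unless $name =~ m/wheel/;
        my $yrot = $part->first_child('yrot') or return;    # skip parts without <yrot>
        $yrot->set_att( 'min', 'INF' );
        $yrot->set_att( 'max', 'INF' );
        $yrot->del_att('cur');    # the desired output in the question has no cur attribute
    }

    my $twig = XML::Twig->new(
        'pretty_print'  => 'indented_a',
        'twig_handlers' => { 'part' => \&process_part }
    );
    $twig->parsefile('your_file.xml');
    $twig->print;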

Now, as for the reason 'checking' your text is difficult - these are all the same:

    <root>
      <part name="D_wheel1" seqNumber="1">
        <yrot cur="0.000000" max="0.000000" min="0.000000" />
      </part>
      <part name="D_wheel2" seqNumber="1">
        <yrot cur="0.000000" max="0.000000" min="0.000000" />
      </part>
      <part name="door" seqNumber="1">
        <yrot cur="0.000000" max="0.000000" min="0.000000" />
      </part>
    </root>

And:

<root><part name="D_wheel1" seqNumber="1"><yrot cur="0.000000" max="0.000000" min="0.000000"/></part><part name="D_wheel2" seqNumber="1"><yrot cur="0.000000" max="0.000000" min="0.000000"/></part><part name="door" seqNumber="1"><yrot cur="0.000000" max="0.000000" min="0.000000"/></part></root> 

And:

<root ><part name="D_wheel1" seqNumber="1" ><yrot cur="0.000000" max="0.000000" min="0.000000" /></part><part name="D_wheel2" seqNumber="1" ><yrot cur="0.000000" max="0.000000" min="0.000000" /></part><part name="door" seqNumber="1" ><yrot cur="0.000000" max="0.000000" min="0.000000" /></part></root> 

They are all semantically identical but, as you can hopefully see, a line-based tool won't handle them the same way. The same goes for self-closing tags, like:

 <yrot cur="0.000000" max="0.000000" min="0.000000" /> 

Vs:

 <yrot cur="0.000000" max="0.000000" min="0.000000" ></yrot> 

Also - semantically identical. So you can get away with line-and-regex but it's taking a gamble and building brittle code.
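To make that concrete, here is a small Python sketch (not part of the Perl solution above; the function names `rewrite_parsed`, `rewrite_by_line` and `summary` are made up for illustration). It applies the same edit to two layouts of the same data, once through a parser and once with a naive line-by-line state flag of the kind a sed or awk one-liner would use.

```python
import re
import xml.etree.ElementTree as ET

# The same data in two layouts (shortened to two parts).
PRETTY = """<root>
  <part name="D_wheel1" seqNumber="1">
    <yrot cur="0.000000" max="0.000000" min="0.000000" />
  </part>
  <part name="door" seqNumber="1">
    <yrot cur="0.000000" max="0.000000" min="0.000000" />
  </part>
</root>"""

ONE_LINE = (
    '<root><part name="D_wheel1" seqNumber="1">'
    '<yrot cur="0.000000" max="0.000000" min="0.000000"></yrot></part>'
    '<part name="door" seqNumber="1">'
    '<yrot cur="0.000000" max="0.000000" min="0.000000"/></part></root>'
)


def rewrite_parsed(xml_text):
    """Parser-based edit: layout does not matter."""
    root = ET.fromstring(xml_text)
    for part in root.iter("part"):
        if "wheel" in part.get("name", ""):
            for yrot in part.findall("yrot"):
                yrot.attrib.clear()
                yrot.set("min", "INF")
                yrot.set("max", "INF")
    return ET.tostring(root, encoding="unicode")


def rewrite_by_line(xml_text):
    """Naive line-oriented edit: replace any <yrot line seen after a wheel <part line."""
    in_wheel = False
    out = []
    for line in xml_text.splitlines():
        if "</part>" in line:
            in_wheel = False
        if re.search(r"<part\b.*wheel", line):
            in_wheel = True
        if in_wheel and "<yrot" in line:
            out.append('<yrot min="INF" max="INF"/>')
        else:
            out.append(line)
    return "\n".join(out)


def summary(xml_text):
    """Return (part name, yrot attributes) pairs so results can be compared."""
    root = ET.fromstring(xml_text)
    return [(p.get("name"), dict(p.find("yrot").attrib)) for p in root.iter("part")]


if __name__ == "__main__":
    for label, text in (("pretty", PRETTY), ("one line", ONE_LINE)):
        print(label, "parsed: ", summary(rewrite_parsed(text)))
        print(label, "by line:", summary(rewrite_by_line(text)))
```

On the indented layout both approaches agree. On the single-line layout the line-based version replaces the entire document with one `<yrot>` element, so every part is lost, while the parser-based version gives the same result as before.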

    0

    Using awk. Note that this assumes a very simple file structure like the one you show. I cannot guarantee that it will work on arbitrary XML files. In fact, I can flat out guarantee it won't.

    awk '{if(/<\/part>/){p=0}if($1~/<part/ && $2~/wheel/){p=1} if(p==1 && /<yrot/){ print "<yrot min=\"INF\" max=\"INF\"/>" } else{print}}' file 

    Seriously though, this is as fragile as can be. It assumes that the name= is always the 2nd space delimited field on the line, it breaks on nested tags and all sorts of other possible complications. It gives your desired output on the example you gave but it will break on the tiniest change you make to the files. Anthon's approach using a proper parser is much safer.
