There is a massive problem with trying to parse XML with 'standard' Unix tools. XML is a data structure, and it supports a variety of layouts that are semantically identical but don't share the same line breaks and indentation.
This means it's really a bad idea to parse it with line- or regex-based tools, because you'll be creating fundamentally brittle code. Someone may reformat their XML at some point, and your code will break for no obvious reason. That's the kind of thing that gives maintenance programmers and future sysadmins some real pain.
So yes, please use an XML parser. There are a variety of options. Someone's already given you a Python one, so here's a Perl one too, using XML::Twig:
    #!/usr/bin/perl
    use strict;
    use warnings;

    use XML::Twig;

    # Called for every <part> element as the parser encounters it.
    sub process_part {
        my ( $twig, $part ) = @_;

        # Only touch parts whose name contains "wheel".
        if ( $part->att('name') =~ m/wheel/ ) {
            $part->first_child('yrot')->set_att( 'min', 'INF' );
            $part->first_child('yrot')->set_att( 'max', 'INF' );
        }
    }

    my $twig = XML::Twig->new(
        'pretty_print'  => 'indented_a',
        'twig_handlers' => { 'part' => \&process_part }
    );
    $twig->parsefile('your_file.xml');
    $twig->print;
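For comparison, roughly the same edit can be sketched in Python with the standard library's `xml.etree.ElementTree`. This is only a minimal sketch and not necessarily the Python answer mentioned above: the function name `set_wheel_limits` and the inline `SAMPLE` document are made up for illustration, and it skips any part that has no `<yrot>` child.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input, trimmed down from the layout discussed in this answer.
SAMPLE = """<root>
  <part name="D_wheel1" seqNumber="1">
    <yrot cur="0.000000" max="0.000000" min="0.000000" />
  </part>
  <part name="door" seqNumber="1">
    <yrot cur="0.000000" max="0.000000" min="0.000000" />
  </part>
</root>"""


def set_wheel_limits(root):
    """Set min/max to INF on the <yrot> of every <part> whose name contains 'wheel'."""
    for part in root.iter("part"):
        if "wheel" not in part.get("name", ""):
            continue
        yrot = part.find("yrot")
        if yrot is not None:  # skip parts without a <yrot> child
            yrot.set("min", "INF")
            yrot.set("max", "INF")
    return root


root = set_wheel_limits(ET.fromstring(SAMPLE))
print(ET.tostring(root, encoding="unicode"))
```

To work on a file instead of a string, `ET.parse('your_file.xml')` returns a tree whose `getroot()` can be passed to the same function, and the tree's `write()` method saves the result.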
Now, as for the reason 'checking' your text is difficult: these are all the same:
    <root>
      <part name="D_wheel1" seqNumber="1">
        <yrot cur="0.000000" max="0.000000" min="0.000000" />
      </part>
      <part name="D_wheel2" seqNumber="1">
        <yrot cur="0.000000" max="0.000000" min="0.000000" />
      </part>
      <part name="door" seqNumber="1">
        <yrot cur="0.000000" max="0.000000" min="0.000000" />
      </part>
    </root>
And:
<root><part name="D_wheel1" seqNumber="1"><yrot cur="0.000000" max="0.000000" min="0.000000"/></part><part name="D_wheel2" seqNumber="1"><yrot cur="0.000000" max="0.000000" min="0.000000"/></part><part name="door" seqNumber="1"><yrot cur="0.000000" max="0.000000" min="0.000000"/></part></root>
And:
<root ><part name="D_wheel1" seqNumber="1" ><yrot cur="0.000000" max="0.000000" min="0.000000" /></part><part name="D_wheel2" seqNumber="1" ><yrot cur="0.000000" max="0.000000" min="0.000000" /></part><part name="door" seqNumber="1" ><yrot cur="0.000000" max="0.000000" min="0.000000" /></part></root>
All three are semantically identical but, as you can hopefully see, a line- or regex-based approach won't handle them the same way. The same goes for self-closing (empty-element) tags, like:
<yrot cur="0.000000" max="0.000000" min="0.000000" />
Vs:
<yrot cur="0.000000" max="0.000000" min="0.000000" ></yrot>
These are also semantically identical. So you may get away with a line-and-regex approach, but you're taking a gamble and building brittle code.
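To make the difference concrete, here is a small Python sketch that compares a pretty-printed and a single-line version of the same document. The helper names `summarize` and `lines_matching` and the regex are illustrative assumptions, not anything from the question. The comparison looks only at element names and attributes and deliberately ignores the whitespace between tags.

```python
import re
import xml.etree.ElementTree as ET

# Two layouts of the same data: indented with a self-closing <yrot />,
# and single-line with an explicit </yrot>.
PRETTY = """<root>
  <part name="D_wheel1" seqNumber="1">
    <yrot cur="0.000000" max="0.000000" min="0.000000" />
  </part>
</root>"""

COMPACT = ('<root><part name="D_wheel1" seqNumber="1">'
           '<yrot cur="0.000000" max="0.000000" min="0.000000"></yrot>'
           '</part></root>')


def summarize(xml_text):
    """Reduce a document to (tag, attributes) pairs in document order, ignoring layout."""
    return [(el.tag, dict(el.attrib)) for el in ET.fromstring(xml_text).iter()]


def lines_matching(xml_text, pattern):
    """Line-based view: the lines a grep- or sed-style regex would match."""
    return [line for line in xml_text.splitlines() if re.search(pattern, line)]


# The parser sees the same elements and attributes in both layouts.
print(summarize(PRETTY) == summarize(COMPACT))

# A regex tied to the layout does not: it expects <yrot ... /> alone on an indented line.
pattern = r'^\s+<yrot .*/>$'
print(len(lines_matching(PRETTY, pattern)), len(lines_matching(COMPACT, pattern)))
```

The regex finds the `<yrot>` line only in the indented layout, while the parser-based summary is the same for both.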
sed and awk, but another XML file that is syntactically the same but has a different layout is not. Unless you have complete control over current and future input, you're better off with a real XML parser.
<part><yrot></yrot></part>? Will the <yrot></yrot> tags always be on the same line? If so, please edit your question to clarify, and also show us your desired output: what changes would you like to make?