Delete XML node containing certain element

Question

I want to remove all Placemarks from a KML file that contain the element <tessellate>. The following block should be wholly removed:

<Placemark>
  <styleUrl>#m_ylw-pushpin330</styleUrl>
  <LineString>
    <tessellate>1</tessellate>
    <coordinates>
      0.0000000000000,0.0000000000000,0 0.0000000000000,0.0000000000000,0
    </coordinates>
  </LineString>
</Placemark>

I have tried a non-greedy Perl regex with no luck (a lot of other content is removed along with the first <Placemark>):

sed -r ':a; N; $!ba; s/\n\t*//g' myplaces.kml | perl -pe 's|<Placemark>.*?<tessellate>.*?</Placemark>||g'

I believe an XML parser is the way to go, but I read the documentation for xmlstarlet and got nowhere. So any solutions in xmlstarlet, Python, etc. are also welcome!

Any good reason for not using an XML parser?
– michas, Apr 12, 2013 at 6:32
Definitely use an XML parser.
– Gilles 'SO- stop being evil', Apr 12, 2013 at 21:58

Stéphane Chazelas · Accepted Answer · 2013-04-12 07:21:23Z

With xmlstarlet:

xmlstarlet ed -d '//Placemark[.//tessellate]' < myplaces.kml

And as KML uses a namespace, you have to declare it first (see the xmlstarlet documentation):

xmlstarlet ed -N 'ns=http://www.opengis.net/kml/2.2' -d '//ns:Placemark[.//ns:tessellate]' < myplaces.kml
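For comparison, here is a rough sketch of the same namespace-aware deletion using only Python's standard `xml.etree.ElementTree` (Python 3 assumed). The `kml` prefix and the function name `remove_tessellated_placemarks` are illustrative choices, not something KML or ElementTree requires. ElementTree's limited XPath only allows immediate-child tests inside predicates, so the `[.//ns:tessellate]` condition is checked in Python instead.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
NS = {"kml": KML_NS}  # prefix -> URI map, similar in spirit to xmlstarlet's -N option


def remove_tessellated_placemarks(root):
    """Remove every kml:Placemark with a kml:tessellate descendant.

    Returns the number of Placemarks removed.
    """
    # ElementTree elements do not know their parent, so map child -> parent.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    removed = 0
    # list() snapshots the matches before the tree is modified.
    for placemark in list(root.iterfind(".//kml:Placemark", NS)):
        if placemark.find(".//kml:tessellate", NS) is not None:
            parent_of[placemark].remove(placemark)
            removed += 1
    return removed


sample = """<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
<Placemark><name>keep</name></Placemark>
<Placemark><LineString><tessellate>1</tessellate></LineString></Placemark>
</Document></kml>"""

ET.register_namespace("", KML_NS)  # serialize with xmlns="..." instead of ns0: prefixes
root = ET.fromstring(sample)
print(remove_tessellated_placemarks(root))  # 1
print(ET.tostring(root, encoding="unicode"))
```

For a real file, `tree = ET.parse("myplaces.kml")`, then `remove_tessellated_placemarks(tree.getroot())` and `tree.write(...)` should do the same job; the file name is only a placeholder.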

With Perl, you'd need to process the file as a whole (not line by line) and add the s flag to s/// so that . also matches newlines. Even then, with a non-greedy match, it would still match from the first <Placemark> up to the next </Placemark> that occurs after the next <tessellate>. So you'd need to write it something like:

perl -0777 -pe 's|(<Placemark>.*?</Placemark>)| $1 =~ /<tessellate>/?"":$1|gse'

Using xmlstarlet is the best answer; it works like a charm on complex XML, as well as in cases where the selection needs to be based on an attribute value. Also, if you are not able to install xmlstarlet using yum etc., see this link: pkgs.org/download/xmlstarlet. I was able to download the Linux package and run it as a standalone utility, without needing sudo/root access to install new packages. – Ccy, Oct 31, 2019 at 17:57

michas · Accepted Answer · 2013-04-12 06:49:48Z

Given this test file:

start <Placemark> <tessellate>1</tessellate> </Placemark> middle1 <Placemark> </Placemark> middle2 <Placemark> <tessellate>1</tessellate> </Placemark> end

If you run perl -0 -pe 's|<Placemark>.*?<tessellate>.*?</Placemark>||gs' as you suggested, it will remove too much:

start middle1 end

This happens because the regex only moves forward: it finds a start tag, takes everything up to the first <tessellate> tag and then up to the next end tag. Unfortunately, it does not care whether it consumes other start tags along the way...

If you want to do it with regexes, you have to examine each block on its own:

perl -0 -pe 's|<Placemark>.*?</Placemark>|$&=~/<tessellate>/?"":$&|gse'
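The same per-block idea can be sketched in Python with the standard `re` module; `remove_blocks_containing` is a hypothetical helper name. Passing a function as the replacement plays the role of Perl's /e flag, and `re.DOTALL` corresponds to the s flag. The sample string is the test file given earlier, and the sketch also reproduces the over-matching single-regex attempt for contrast.

```python
import re


def remove_blocks_containing(text, marker="<tessellate>"):
    """Drop each <Placemark>...</Placemark> block whose own text contains marker."""
    # Non-greedy: each match is one block; DOTALL lets . span newlines.
    block = re.compile(r"<Placemark>.*?</Placemark>", re.DOTALL)
    # The callback decides per block, so no match can swallow a neighbour.
    return block.sub(lambda m: "" if marker in m.group(0) else m.group(0), text)


sample = ("start <Placemark> <tessellate>1</tessellate> </Placemark> middle1 "
          "<Placemark> </Placemark> middle2 "
          "<Placemark> <tessellate>1</tessellate> </Placemark> end")

# The single-regex attempt over-matches from the second <Placemark> onwards.
naive = re.sub(r"<Placemark>.*?<tessellate>.*?</Placemark>", "", sample, flags=re.DOTALL)

# Whitespace is collapsed only to make the results easy to compare.
print(" ".join(naive.split()))  # start middle1 end
print(" ".join(remove_blocks_containing(sample).split()))  # start middle1 <Placemark> </Placemark> middle2 end
```

Like the Perl one-liners, this remains a text hack: it assumes <Placemark> start tags without attributes and no nested Placemarks, which is why the comments recommend a real XML parser.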

This should give the desired result.

Just adding the resulting output: start middle1 <Placemark> </Placemark> middle2 end – Nasri Najib, Jan 27, 2016 at 6:36

Anthon · Accepted Answer · 2013-04-12 07:18:50Z

Using Python (2.7) with standard modules:

file test.xml:

<Container>
  <Placemark>
    <KeepMe/>
  </Placemark>
  <Placemark>
    <styleUrl>#m_ylw-pushpin330</styleUrl>
    <LineString>
      <tessellate>1</tessellate>
      <coordinates>
        0.0000000000000,0.0000000000000,0 0.0000000000000,0.0000000000000,0
      </coordinates>
    </LineString>
  </Placemark>
</Container>

And the program:

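#! /usr/bin/env python
from __future__ import print_function  # works on 2.x and 3.x
from lxml import etree

file_name = 'test.xml'
root = etree.parse(file_name)
# findall() returns a list, so removing elements cannot disturb the loop
for element in root.findall('.//Placemark'):
    # drop any Placemark that has a <tessellate> anywhere below it
    if element.find('.//tessellate') is not None:
        element.getparent().remove(element)
# encoding='unicode' returns text instead of bytes on Python 3
print(etree.tostring(root, encoding='unicode'))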

gives as output:

<Container> <Placemark> <KeepMe/> </Placemark> </Container>

You mentioned standard modules, but lxml is not standard. Did you mean ElementTree? – iruvar, Dec 12, 2014 at 20:46
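Following up on that comment, a sketch of the same program using only the standard library's `xml.etree.ElementTree` might look like this (Python 3 assumed; `strip_tessellated` is a hypothetical name). ElementTree elements have no `getparent()`, so instead of climbing up from each Placemark, the sketch walks every element and inspects its direct children.

```python
import xml.etree.ElementTree as ET

TEST_XML = """<Container>
  <Placemark>
    <KeepMe/>
  </Placemark>
  <Placemark>
    <styleUrl>#m_ylw-pushpin330</styleUrl>
    <LineString>
      <tessellate>1</tessellate>
      <coordinates>
        0.0000000000000,0.0000000000000,0 0.0000000000000,0.0000000000000,0
      </coordinates>
    </LineString>
  </Placemark>
</Container>"""


def strip_tessellated(root):
    """Remove each Placemark that has a <tessellate> somewhere below it."""
    # Snapshot all elements first: iter() is undefined if the tree changes.
    for parent in list(root.iter()):
        # Copy the child list so removing from parent is safe mid-loop.
        for child in list(parent):
            if child.tag == "Placemark" and child.find(".//tessellate") is not None:
                parent.remove(child)


root = ET.fromstring(TEST_XML)
strip_tessellated(root)
print(ET.tostring(root, encoding="unicode"))
```

Like the lxml version, this matches the un-namespaced test.xml; a real KML file would need namespace-qualified tags such as `{http://www.opengis.net/kml/2.2}Placemark`.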
