python xml parsing

Question

I have to delete particular tags from an xml file. Sample xml below.

 <data> <tag:action/> </data>

I want to delete all contents between data and /data. The XML tags are not displayed in the question after posting.

I am able to do this by using the remove() method of Python's ElementTree XML parser. After deleting the element, I write the modified contents to a new file.

tree.write('new.xml');

The problem is that all the tag names in the original xml file are renamed to ns0, ns1 and so on in new.xml.

Is there any way to modify the XML file while keeping all other contents intact?

That looks like an incomplete XML file to me. How would lxml know what namespace to associate with tag? — Anthon, Commented May 8, 2014 at 5:25

jfgiraud · Accepted Answer · 2014-05-09 06:50:04Z

You can use Beautiful Soup to do the job:

    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    from bs4 import BeautifulSoup

    content = '''
    <people>
      <person born="1975">
        <name>
          <first_name>John</first_name>
          <last_name>Doe</last_name>
        </name>
        <profession>computer scientist</profession>
        <homepage href="http://www.example.com/johndoe"/>
      </person>
      <person born="1977">
        <name>
          <first_name>Jane</first_name>
          <last_name>Doe</last_name>
        </name>
        <profession>computer scientist</profession>
        <homepage href="http://www.example.com/janedoe"/>
      </person>
    </people>
    '''

    # Name the parser explicitly. 'lxml' is lxml's HTML parser, which is
    # what wraps the document in <html><body> in the result.
    soup = BeautifulSoup(content, 'lxml')

    # Detach every <name> element from the tree.
    for s in soup('name'):
        s.extract()

    print(soup)

It produces the following result:

    <html><body><people>
      <person born="1975">
        <profession>computer scientist</profession>
        <homepage href="http://www.example.com/johndoe"></homepage>
      </person>
      <person born="1977">
        <profession>computer scientist</profession>
        <homepage href="http://www.example.com/janedoe"></homepage>
      </person>
    </people>
    </body></html>

With namespaces:

    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    from bs4 import BeautifulSoup

    content = '''<people xmlns:h="http://www.example.com/to/">
      <h:person born="1975">
        <h:name>
          <h:first_name>John</h:first_name>
          <h:last_name>Doe</h:last_name>
        </h:name>
        <h:profession>computer scientist</h:profession>
        <h:homepage href="http://www.example.com/johndoe"/>
      </h:person>
      <h:person born="1977">
        <h:name>
          <h:first_name>Jane</h:first_name>
          <h:last_name>Doe</h:last_name>
        </h:name>
        <h:profession>computer scientist</h:profession>
        <h:homepage href="http://www.example.com/janedoe"/>
      </h:person>
    </people>
    '''

    # Parse with lxml's HTML parser, then keep only the <people> subtree.
    soup = BeautifulSoup(content, 'lxml').people

    # The HTML parser treats "h:name" as a plain tag name, so the prefixed
    # name can be searched for directly.
    for s in soup('h:name'):
        s.extract()

    print(soup)

I added .people to prevent <html><body></body></html> in the result.

    <people xmlns:h="http://www.example.com/to/">
      <h:person born="1975">
        <h:profession>computer scientist</h:profession>
        <h:homepage href="http://www.example.com/johndoe"></h:homepage>
      </h:person>
      <h:person born="1977">
        <h:profession>computer scientist</h:profession>
        <h:homepage href="http://www.example.com/janedoe"></h:homepage>
      </h:person>
    </people>
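As a side note to the Beautiful Soup approach, the ns0/ns1 renaming described in the question can usually be avoided in ElementTree itself by registering the original prefix before writing. The following is only a minimal sketch: the sample document, the prefix `tag`, its URI, and the `tag:keep` element are all assumptions, because the question's snippet does not show the namespace declaration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the question does not show the real one.
NS_URI = "http://www.example.com/tag"

# Hypothetical sample document modeled on the question's <data> snippet.
SOURCE = (
    '<data xmlns:tag="http://www.example.com/tag">'
    '<tag:action/>'
    '<tag:keep>value</tag:keep>'
    '</data>'
)

# Register the prefix so the serializer writes "tag:" instead of "ns0:".
# Note that this registration is global to the ElementTree module.
ET.register_namespace("tag", NS_URI)

root = ET.fromstring(SOURCE)

# Namespaced elements are addressed as "{uri}localname".
# findall() returns a list, so removing while looping is safe here.
for action in root.findall("{%s}action" % NS_URI):
    root.remove(action)

result = ET.tostring(root, encoding="unicode")
print(result)
```

When working with a file, the same idea applies with `ET.parse(...)` and `tree.write('new.xml')`, as long as `register_namespace` is called before writing. Here `findall` only looks at direct children of the root; a deeper search would need a path such as `.//{uri}action` and removal from the actual parent element.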

Thank you for the answer. I got it working with Beautiful Soup. — Akhitha, Commented May 8, 2014 at 11:30
Thank you for the answer. I got it working with Beautiful Soup. But there are namespaces in the XML tags. How can I search for a particular tag if a namespace is present? I used find and find_all, but they are not returning the values. — Akhitha, Commented May 8, 2014 at 13:23
