
I need to delete particular tags from an XML file. Sample XML below:

```xml
<data>
  <tag:action/>
</data>
```

I want to delete everything between `<data>` and `</data>`.

I can do this with the `remove()` method of Python's ElementTree XML parser. After deleting the element, I write the modified contents to a new file:

```python
tree.write('new.xml')
```
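The parse, `remove()`, `write()` approach described above can be sketched as follows to reproduce the renaming. This is a minimal sketch under assumptions: the sample does not declare the `tag` prefix, so the namespace URI `http://example.com/tag` and the extra `<tag:header/>` element (kept so that a namespaced tag survives the deletion) are hypothetical, and the output goes to an in-memory buffer instead of `new.xml`.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input: the URI and <tag:header/> are assumptions, since the
# original sample never binds the "tag" prefix to a namespace.
SOURCE = """<root xmlns:tag="http://example.com/tag">
  <tag:header/>
  <data><tag:action/></data>
</root>"""

tree = ET.ElementTree(ET.fromstring(SOURCE))
data = tree.getroot().find("data")

# Remove every child of <data>. Iterate over a copy, because remove()
# mutates the element's child list.
for child in list(data):
    data.remove(child)

# Serialize to a buffer (equivalent to tree.write('new.xml') for a file).
buf = io.BytesIO()
tree.write(buf)
print(buf.getvalue().decode())
```

The printed document declares `xmlns:ns0` on the root and writes `<ns0:header />`: ElementTree keeps only the namespace URI of each tag and generates its own prefixes when serializing.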

The problem is that the namespace prefixes of all the tags in the original XML file are replaced by `ns0`, `ns1`, and so on in new.xml.

Is there any way to modify the XML file while keeping all other contents intact?
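One standard-library option is `xml.etree.ElementTree.register_namespace(prefix, uri)`, which tells the serializer which prefix to write for a given URI. The sketch below assumes a hypothetical namespace URI `http://example.com/tag` bound to the `tag` prefix and a hypothetical `<tag:header/>` element, since the posted sample declares neither. Note that this only preserves the prefixes: ElementTree's default parser still drops comments, and namespace declarations are rewritten on the root element, so the file is not guaranteed to round-trip byte for byte.

```python
import io
import xml.etree.ElementTree as ET

# Register the prefix so ElementTree writes "tag:" instead of an
# auto-generated "ns0:". The URI must match the one in the document.
ET.register_namespace("tag", "http://example.com/tag")

# Hypothetical input (URI and <tag:header/> are assumptions).
SOURCE = """<root xmlns:tag="http://example.com/tag">
  <tag:header/>
  <data><tag:action/></data>
</root>"""

tree = ET.ElementTree(ET.fromstring(SOURCE))
data = tree.getroot().find("data")

# Delete all children of <data>, iterating over a copy of the child list.
for child in list(data):
    data.remove(child)

buf = io.BytesIO()
tree.write(buf)  # with a real file: tree.write("new.xml")
print(buf.getvalue().decode())
```

A document with several prefixes needs one `register_namespace()` call per prefix. If exact preservation matters, lxml keeps the prefixes from the source document without any registration step.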

  • That looks like an incomplete XML file to me. How would lxml know what namespace to associate with tag?
    – Anthon
    Commented May 8, 2014 at 5:25
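The point in the comment above can be checked with the standard library: a namespace-aware parser rejects the sample as posted, because the `tag` prefix is never bound to a URI. A minimal sketch; `parses()` is a hypothetical helper, the URI `http://example.com/tag` is an assumption, and the exact error text comes from expat and may differ between versions.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Hypothetical helper: True if ElementTree accepts text, False on ParseError."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        # expat reports the undeclared prefix as an "unbound prefix" error.
        print("ParseError:", exc)
        return False
    return True


# The sample as posted: the "tag" prefix is undeclared, so parsing fails.
print(parses("<data> <tag:action/> </data>"))

# The same element with a (hypothetical) namespace declaration parses fine.
print(parses('<data xmlns:tag="http://example.com/tag"> <tag:action/> </data>'))
```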

1 Answer


You can use Beautiful Soup to do the job:

```python
#!/usr/bin/python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

content = '''
<people>
  <person born="1975">
    <name>
      <first_name>John</first_name>
      <last_name>Doe</last_name>
    </name>
    <profession>computer scientist</profession>
    <homepage href="http://www.example.com/johndoe"/>
  </person>
  <person born="1977">
    <name>
      <first_name>Jane</first_name>
      <last_name>Doe</last_name>
    </name>
    <profession>computer scientist</profession>
    <homepage href="http://www.example.com/janedoe"/>
  </person>
</people>
'''

# "lxml" selects lxml's HTML parser, which wraps the document in <html><body>.
soup = BeautifulSoup(content, "lxml")

# soup('name') is shorthand for soup.find_all('name'); extract() removes
# each matching tag, including everything inside it, from the tree.
for s in soup('name'):
    s.extract()

print(soup)
```

It produces the following result:

```
<html><body><people>
  <person born="1975">
    <profession>computer scientist</profession>
    <homepage href="http://www.example.com/johndoe"></homepage>
  </person>
  <person born="1977">
    <profession>computer scientist</profession>
    <homepage href="http://www.example.com/janedoe"></homepage>
  </person>
</people>
</body></html>
```

With namespaces:

```python
#!/usr/bin/python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

content = '''<people xmlns:h="http://www.example.com/to/">
  <h:person born="1975">
    <h:name>
      <h:first_name>John</h:first_name>
      <h:last_name>Doe</h:last_name>
    </h:name>
    <h:profession>computer scientist</h:profession>
    <h:homepage href="http://www.example.com/johndoe"/>
  </h:person>
  <h:person born="1977">
    <h:name>
      <h:first_name>Jane</h:first_name>
      <h:last_name>Doe</h:last_name>
    </h:name>
    <h:profession>computer scientist</h:profession>
    <h:homepage href="http://www.example.com/janedoe"/>
  </h:person>
</people>
'''

# Parse with lxml's HTML parser, then keep only the <people> element
# so the <html><body> wrapper does not appear in the output.
soup = BeautifulSoup(content, "lxml").people

# Match the tag by its full prefixed name, as it appears in the source.
for s in soup('h:name'):
    s.extract()

print(soup)
```

I added `.people` to keep the `<html><body></body></html>` wrapper out of the result:

```
<people xmlns:h="http://www.example.com/to/">
  <h:person born="1975">
    <h:profession>computer scientist</h:profession>
    <h:homepage href="http://www.example.com/johndoe"></h:homepage>
  </h:person>
  <h:person born="1977">
    <h:profession>computer scientist</h:profession>
    <h:homepage href="http://www.example.com/janedoe"></h:homepage>
  </h:person>
</people>
```
  • Thank you for the answer. I got it working with Beautiful Soup.
    – Akhitha
    Commented May 8, 2014 at 11:30
  • Thank you for the answer. I got it working with Beautiful Soup. But there are namespaces in the XML tags. How can I search for a particular tag if a namespace is present? I used find and find_all, but they are not returning the values.
    – Akhitha
    Commented May 8, 2014 at 13:23
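On the follow-up question about searching when namespaces are present: for comparison, here is how the standard library's ElementTree (rather than Beautiful Soup) handles it. ElementTree stores a namespaced tag as `{uri}localname`, and `find()`/`findall()` accept a prefix-to-URI mapping for prefixed search paths. A minimal sketch using a trimmed copy of the namespaced document from the answer; the dictionary name `NS` is arbitrary.

```python
import xml.etree.ElementTree as ET

CONTENT = """<people xmlns:h="http://www.example.com/to/">
  <h:person born="1975">
    <h:name><h:first_name>John</h:first_name></h:name>
  </h:person>
  <h:person born="1977">
    <h:name><h:first_name>Jane</h:first_name></h:name>
  </h:person>
</people>"""

# Map the prefix used in search paths to the URI declared in the document.
NS = {"h": "http://www.example.com/to/"}

root = ET.fromstring(CONTENT)

# Prefixed path plus a namespaces mapping...
first_names = [e.text for e in root.findall(".//h:first_name", NS)]
print(first_names)  # ['John', 'Jane']

# ...or the fully qualified "{uri}localname" form, which needs no mapping.
persons = root.findall("{http://www.example.com/to/}person")
print([p.get("born") for p in persons])  # ['1975', '1977']
```

The prefix in the search path only has to agree with the mapping passed in, not with the prefix used in the file; matching is done on the URI.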
