
Using Minidom To Parse Xml

Hi, I'm having trouble understanding Python's minidom module. I have XML that looks like this: Dexter 7

Solution 1:

Each episode element has child elements, including a title element. Your code, however, is looking for attributes instead.

To get text out of a minidom element, you need a helper function:

def getText(nodelist):
    # Collect the data of every text node in the list and join it.
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc.append(node.data)
    return ''.join(rc)

And then you can more easily print all the titles:

for episode in xml.getElementsByTagName('episode'):
    for title in episode.getElementsByTagName('title'):
        print(getText(title.childNodes))
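
For reference, here is a self-contained sketch of this approach in Python 3. The sample document is an assumption, since the feed from the question is not shown in full; only the episode and title element names and the three titles appear in this thread. The names SAMPLE, get_text and episode_titles are made up for the illustration.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document; the real feed from the question is not shown.
SAMPLE = """<show>
  <episode><title>Dexter</title></episode>
  <episode><title>Crocodile</title></episode>
  <episode><title>Popping Cherry</title></episode>
</show>"""


def get_text(nodelist):
    """Join the data of all text nodes found directly in nodelist."""
    return ''.join(node.data for node in nodelist
                   if node.nodeType == node.TEXT_NODE)


def episode_titles(xml_string):
    """Return the text of every <title> nested inside an <episode>."""
    dom = parseString(xml_string)
    return [get_text(title.childNodes)
            for episode in dom.getElementsByTagName('episode')
            for title in episode.getElementsByTagName('title')]


print(episode_titles(SAMPLE))
```

Note that the helper receives title.childNodes, not the title element itself: the text lives in text nodes underneath the element.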

Solution 2:

title is not an attribute, it's a tag (an element). An attribute is something like src in <img src="foo.jpg" />.
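
To make the difference concrete, here is a small sketch in Python 3. The one-line document and its number attribute are assumptions made up for the illustration; they are not taken from the question's feed.

```python
from xml.dom.minidom import parseString

# Hypothetical element that carries both an attribute and a child element.
dom = parseString('<episode number="1"><title>Dexter</title></episode>')
episode = dom.documentElement

# An attribute is read with getAttribute().
number = episode.getAttribute('number')

# title is a child element, so asking for it as an attribute
# gives back an empty string in minidom.
missing = episode.getAttribute('title')

# A child element is found by tag name; its text is a text node below it.
title = episode.getElementsByTagName('title')[0].firstChild.data

print(number, repr(missing), title)
```

An empty result from getAttribute is therefore a hint that the value you want is stored in a child element and not in an attribute.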

>>> from xml.dom.minidom import parseString
>>> parsed = parseString(s)
>>> titles = [n.firstChild.data for n in parsed.getElementsByTagName('title')]
>>> titles
[u'Dexter', u'Crocodile', u'Popping Cherry']

You can extend the above to fetch other details. lxml is better suited to this, though; as the snippet above shows, minidom is not that friendly.

Solution 3:

Thanks to Martijn Pieters, who pointed me to the ElementTree API, I solved this problem.

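import xml.etree.ElementTree as ET
from urllib2 import urlopen  # Python 2; on Python 3 use: from urllib.request import urlopen

xml = ET.parse(urlopen(""))
print('xml fetched..')
for episode in xml.iter('episode'):
    # find() returns the first matching child element; .text is its text content
    print(episode.find('title').text)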
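
If you want to try the ElementTree version without fetching anything, here is a minimal sketch in Python 3 that parses an in-memory string. The sample document is an assumption (the feed URL and its full contents are not shown in this thread), and SAMPLE is a made-up name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document standing in for the fetched feed.
SAMPLE = """<show>
  <episode><title>Dexter</title></episode>
  <episode><title>Crocodile</title></episode>
  <episode><title>Popping Cherry</title></episode>
</show>"""

# fromstring() parses a string and returns the root element directly.
root = ET.fromstring(SAMPLE)

# findtext() returns the text of the first matching child,
# or None when the episode has no <title> child.
titles = [episode.findtext('title') for episode in root.iter('episode')]

print(titles)
```

Using findtext('title') here instead of find('title').text is a design choice for the sketch: an episode without a title yields None where find('title').text would raise an AttributeError.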

