
lxml Incremental XML Serialisation Repeats Namespaces

I am currently serializing some largish XML files in Python with lxml. I want to use the incremental writer for that. My XML format heavily relies on namespaces and attributes. Whe

Solution 1:

It is possible to produce something close to what you are looking for:

from io import BytesIO

from lxml import etree

sink = BytesIO()

nsmap = {
    'test': 'http://test.org',
    'foo': 'http://foo.org',
    'bar': 'http://bar.org',
}

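with etree.xmlfile(sink) as xf:
    # nsmap is given only on the outermost element, so the xmlns
    # declarations are written once, on <test:testElement>.
    with xf.element("test:testElement", nsmap=nsmap):
        # No nsmap on the child, so no xmlns attributes are written for it.
        with xf.element("foo:fooElement"):
            pass

print(sink.getvalue().decode('utf-8'))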

This produces the XML:

<test:testElement xmlns:bar="http://bar.org" xmlns:foo="http://foo.org" xmlns:test="http://test.org"><foo:fooElement></foo:fooElement></test:testElement>

The extra namespace declaration is gone, but instead of a self-closing element (<foo:fooElement/>) you get a separate pair of opening and closing tags for foo:fooElement.
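Note that "test:testElement" and "foo:fooElement" above are passed as literal names: lxml does not resolve those prefixes itself, and the output is only namespace-correct because the matching xmlns declarations happen to be written on the root. A sketch of the more conventional form uses Clark notation ({namespace-uri}localname) and lets the writer pick the prefixes. It assumes an lxml version whose xf.element() accepts nsmap and, for a nested xf.element() without its own nsmap, reuses a prefix bound by an enclosing xf.element(); that is the behaviour I'm aware of in recent releases, but verify it against your installed version.

```python
from io import BytesIO

from lxml import etree

nsmap = {
    'test': 'http://test.org',
    'foo': 'http://foo.org',
    'bar': 'http://bar.org',
}

sink = BytesIO()
with etree.xmlfile(sink) as xf:
    # Clark notation: "{namespace-uri}localname"; lxml chooses the prefix.
    with xf.element("{http://test.org}testElement", nsmap=nsmap):
        # No nsmap here: the writer looks up the prefix bound by the parent
        # (assumed behaviour, see above), so xmlns:foo is not repeated.
        with xf.element("{http://foo.org}fooElement"):
            pass

result = sink.getvalue().decode('utf-8')
print(result)
```

With this form, renaming a prefix only requires changing nsmap; the element names stay tied to namespace URIs rather than to prefix strings.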

I looked at the source code of lxml.etree.xmlfile and did not see any code that keeps track of which namespaces have already been declared so that it could avoid declaring them again. It is possible I just missed something, but I really don't think I did.

The point of an incremental XML serializer is to operate without using gobs of memory. When memory is not an issue, you can just build a tree of objects representing the XML document and serialize that. You pay a significant memory cost, because the whole tree has to stay in memory until it is serialized. An incremental serializer lets you dodge that cost.

To maximize the memory savings, the serializer must minimize the amount of state it maintains. If, when producing an element, it took the element's ancestors into account, it would have to "remember" those ancestors and maintain state. In the worst case, it would maintain so much state that it would provide no benefit over just building a tree of XML objects and serializing it.
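To make the state question concrete, here is an illustration only: a hypothetical, stdlib-only writer (NsTrackingWriter is an invented name; it is not lxml's implementation and is not from either answer) that avoids redeclaring namespaces by keeping a stack with one prefix-to-URI map per open element. The state it needs scales with nesting depth and the number of declared namespaces, not with the total number of elements written. It deliberately ignores default namespaces, attributes, text content, and escaping of element names.

```python
from contextlib import contextmanager
from io import StringIO
from xml.sax.saxutils import quoteattr


class NsTrackingWriter:
    """Hypothetical incremental writer (not lxml's). Its only extra state is
    one prefix -> namespace-URI map per currently open element."""

    def __init__(self, out):
        self.out = out
        self._scopes = [{}]  # one entry per open element, plus an empty base

    @contextmanager
    def element(self, uri, local, nsmap=None):
        # Start from the parent's bindings and declare only what is new here.
        in_scope = dict(self._scopes[-1])
        decls = [(p, ns) for p, ns in (nsmap or {}).items() if in_scope.get(p) != ns]
        in_scope.update(decls)
        prefix = next((p for p, ns in in_scope.items() if ns == uri), None)
        if prefix is None:
            raise ValueError(f"no prefix in scope for {uri!r}")
        tag = f"{prefix}:{local}"
        xmlns = "".join(f" xmlns:{p}={quoteattr(ns)}" for p, ns in decls)
        self.out.write(f"<{tag}{xmlns}>")
        self._scopes.append(in_scope)
        try:
            yield self
        finally:
            self._scopes.pop()  # forget this element's bindings when it closes
        self.out.write(f"</{tag}>")


nsmap = {'test': 'http://test.org', 'foo': 'http://foo.org', 'bar': 'http://bar.org'}
out = StringIO()
writer = NsTrackingWriter(out)
with writer.element('http://test.org', 'testElement', nsmap=nsmap):
    # Passing the same nsmap again declares nothing new on the child.
    with writer.element('http://foo.org', 'fooElement', nsmap=nsmap):
        pass

result = out.getvalue()
print(result)
```

Copying the parent's map per element keeps the sketch simple; a leaner version could store only each element's new declarations and walk the stack when looking up a prefix.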

Solution 2:

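If the whole document fits in memory, build a tree instead and create the child with etree.SubElement, so that it shares the namespace declarations of the root: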

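from lxml import etree

_nsmap = {
    'test': 'http://test.org',
    'foo': 'http://foo.org',
    'bar': 'http://bar.org',
}

# Declare all namespaces once, on the root element.
root = etree.Element(
    "{http://bar.org}test",
    creator='SO',
    nsmap=_nsmap
)

doc = etree.ElementTree(root)
# A SubElement lives inside root's tree, so it reuses the "foo" prefix
# declared on the root instead of declaring the namespace again.
name = etree.QName(_nsmap["foo"], "fooElement")
elem = etree.SubElement(root, name)

doc.write('/tmp/foo.xml', xml_declaration=True, encoding='utf-8', pretty_print=True)
with open('/tmp/foo.xml', encoding='utf-8') as f:
    print(f.read())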

Output:

<?xml version='1.0' encoding='UTF-8'?>
<bar:test xmlns:bar="http://bar.org" xmlns:foo="http://foo.org" xmlns:test="http://test.org" creator="SO">
  <foo:fooElement/>
</bar:test>
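To see why creating the child as a SubElement matters, the sketch below (variable names are illustrative) serializes the same foo element twice: once as a standalone etree.Element serialized on its own, and once as a SubElement of a root that carries the nsmap. A standalone element has no ancestors to inherit prefixes from, so its serialization has to carry its own namespace declaration.

```python
from lxml import etree

nsmap = {'test': 'http://test.org', 'foo': 'http://foo.org', 'bar': 'http://bar.org'}
name = etree.QName(nsmap['foo'], 'fooElement')

# Standalone element: no parent and no nsmap, so lxml generates a prefix
# and declares the namespace on the element itself.
standalone = etree.Element(name)
standalone_xml = etree.tostring(standalone).decode()

# SubElement: created inside root's tree, so it reuses root's "foo" binding.
root = etree.Element('{http://bar.org}test', nsmap=nsmap)
etree.SubElement(root, name)
tree_xml = etree.tostring(root).decode()

print(standalone_xml)
print(tree_xml)
```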
