An easy way to write XML in Python
David Mertz's Gnosis Utilities include the module gnosis.xml.objectify, which makes parsing XML in Python as simple as could be.
from gnosis.xml import objectify
xmltext = """<?xml version="1.0" encoding="UTF-8"?>
<root><first canned="true" yummy="false">spam</first><second>egg</second></root>"""
inst = objectify.make_instance(xmltext)
print "first:", inst.first.PCDATA
print " canned:", inst.first.canned
print " yummy:", inst.first.yummy
print "second:", inst.second.PCDATA
Output:
first: spam
 canned: true
 yummy: false
second: egg
I wanted a similarly simple way to write XML, so I wrote a little module named xmlwriter (zip file) to do it.
Usage:
from StringIO import StringIO
from xmlwriter import XmlNode, xmlify
root = XmlNode(u"root")
first = root.first
first.val = u"spam"
first["yummy"] = u"false"
first["canned"] = u"true"
root.second.val = u"egg"
out = StringIO()
xmlify(root, out)
print out.getvalue()
Output:
<root><first canned="true" yummy="false">spam</first><second>egg</second></root>
It's still pretty basic. For example, you can't have sibling nodes with the same tag name. But it's a very simple way to do some quick-and-dirty XML writing.
Here's the xmlwriter module code.
class XmlNode(object):
    """Pythonic representation of an XML node.

    Expects values and attributes to be in Unicode

    Example:
        root = XmlNode(u"root")
        root.node.val = u"value"
        root["attr"] = u"name"
    """

    def __init__(self, tag=None, value=None):
        self._tag = tag
        self.val = value
        self._attrs = {}

    def __getattr__(self, attr):
        """Add nodes on the fly"""
        self.__dict__[attr] = XmlNode(attr)
        return self.__dict__[attr]

    # dictionary access
    def __getitem__(self, key):
        return self._attrs[key]

    def __setitem__(self, key, val):
        self._attrs[key] = val


def write_open_tag(node, out):
    """
    Writes the opening tag for node, including attributes
    out is a file-like object
    """
    out.write("<%s" % node._tag)
    out.write("".join([' %s="%s"' % (k, v.encode("utf-8"))
                       for k, v in node._attrs.items()]))
    out.write(">")


def xmlify(root, out):
    """
    Takes the root, and recursively goes
    down printing out the tags, attributes, and values.
    out is a file-like object
    """
    # write XML header if this is the first node
    if not out.pos:
        print >> out, """<?xml version="1.0" encoding="UTF-8"?>"""
    # opening tag
    write_open_tag(root, out)
    # value
    if root.val:
        out.write(root.val.encode("utf-8"))
    # sub-nodes
    for item in dir(root):
        attr = getattr(root, item)
        if isinstance(attr, XmlNode):
            xmlify(attr, out)
    # closing tag
    out.write("</%s>" % root._tag)

This has the potential to be useful; however, you need to add some escaping for the text nodes.
From your example:
first.val = u”This contains characters”
Should escape to >
I wrote a small wrapper around libxml2 to write XML in a serial manner, source code is available at http://www.newmediascientist.com/read.html?item=quick-and-easy-xml-creation
I think you’ve kind of proven your point, because the comment monster seems to have eaten your angle brackets.
There are lots of options for escaping XML entities in Python. Here’s a page on it from the Python wiki. I’m not entirely convinced that the module should be escaping XML, but I agree that it would be simpler for the user.
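One of those options is the standard library's xml.sax.saxutils module. The following is only a minimal sketch of how escaping might be applied when writing an element; `render_element` is a hypothetical helper made up for this illustration and is not part of the xmlwriter module.

```python
from xml.sax.saxutils import escape, quoteattr


def render_element(tag, text, attrs=None):
    """Return one element as a string (hypothetical helper, not in xmlwriter)."""
    parts = ["<%s" % tag]
    for name, value in sorted((attrs or {}).items()):
        # quoteattr() escapes the value and adds the surrounding quotes itself
        parts.append(" %s=%s" % (name, quoteattr(value)))
    # escape() replaces &, < and > in character data
    parts.append(">%s</%s>" % (escape(text), tag))
    return "".join(parts)


print(render_element("first", "spam & <eggs>", {"canned": "true"}))
```

Whether this belongs inside the module or in the caller's code is the open question raised above; the sketch only shows that either choice is a couple of lines.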
I’m sure the solution you wrote is more correct, but pardon me for saying that it doesn’t look incredibly simple to use.
I just want to do:
prefs = XmlNode("prefs")
prefs.language.val = u"English"
prefs.user.name.val = u"Ryan"
prefs.user.email.val = u"ryan@example.com"
prefs.user["id"] = u"123"
prefs.onions.val = u"extra"

So it has, I didn’t notice the comment getting mangled.
I think one of the main problems is that the basics of the XML standard are easy to understand, but the full specification is pretty complicated. I used libxml2 as the main engine since, being the GNOME XML library, it’s pretty much watertight.
I think you may run into limitations quite quickly using instances as nodes. For example, how would you generate an XHTML list?
div = XmlNode("container")
div.ul.li = "First Item"
div.ul.li = "About to overwrite the first item?"
I tinkered with serializing Python objects to XML a while back and came to the conclusion that the limitations outweigh the usefulness.
You’re definitely right that things like lists are a big weakness in the module. I’ll probably try to add ways to handle that at some point.
But then again, for a complex document I’m more likely to use a templating language than to generate the XML tree in code. I guess that’s just me, though.
Strange, my comment didn’t go through I guess.
Anyway, I pointed out that there’s lxml and ElementTree (etree) that might be useful for you without having to roll your own solutions like this.
Python has included xml.etree since 2.5, but there’s still an external version being updated and maintained.
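For comparison, here is a rough sketch of the post's example document built with the standard library's xml.etree.ElementTree. It is written for Python 3 (the `encoding="unicode"` argument does not exist in Python 2), and the extra `item` elements are invented here just to show repeated siblings.

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")

# Attributes can be passed as keyword arguments; text goes in .text
first = ET.SubElement(root, "first", canned="true", yummy="false")
first.text = "spam"
second = ET.SubElement(root, "second")
second.text = "egg"

# Siblings with the same tag are just repeated SubElement calls
# ("item" is a made-up tag for this illustration).
for label in ("one", "two"):
    ET.SubElement(root, "item").text = label

# encoding="unicode" returns a str instead of bytes (Python 3 only)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

ElementTree escapes text and attribute values when serializing, so the escaping question discussed earlier does not come up with this approach.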
How about the following for lists:
div.ul.li = ["first item", "second item"]
Of course that only supports a homogeneous array of child tags.
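A minimal sketch of what that list syntax might look like follows. It assumes a hypothetical `ListNode` class, which is not the xmlwriter module's XmlNode, and relies on the fact that attribute assignment can be intercepted with `__setattr__`. Children are kept in an ordered list rather than in the instance dictionary, and attributes are left out to keep it short.

```python
from xml.sax.saxutils import escape


class ListNode(object):
    """Hypothetical node: assigning a list creates repeated child tags."""

    def __init__(self, tag, text=None):
        # Write through __dict__ so the __setattr__ below is not triggered.
        self.__dict__["tag"] = tag
        self.__dict__["text"] = text
        self.__dict__["children"] = []

    def __getattr__(self, name):
        # Only called when normal lookup fails: return the first child with
        # this tag, or create one on the fly.
        if name.startswith("__"):
            raise AttributeError(name)
        for child in self.children:
            if child.tag == name:
                return child
        child = ListNode(name)
        self.children.append(child)
        return child

    def __setattr__(self, name, value):
        # A list or tuple becomes several children; anything else becomes one.
        items = value if isinstance(value, (list, tuple)) else [value]
        kept = [c for c in self.children if c.tag != name]
        kept.extend(ListNode(name, item) for item in items)
        self.__dict__["children"] = kept

    def to_xml(self):
        inner = escape(self.text or "")
        inner += "".join(child.to_xml() for child in self.children)
        return "<%s>%s</%s>" % (self.tag, inner, self.tag)


div = ListNode("div")
div.ul.li = ["first item", "second item"]
print(div.to_xml())
```

As noted, this only covers the homogeneous case: each list entry becomes a text-only child, and reading `div.ul.li` afterwards returns only the first of the repeated nodes.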
@Kamil
The problem with your proposal is that you can’t override the assignment operator in Python. Thus you’d need some serious voodoo to stick children under those li nodes.
I’m thinking of syntax along these lines (borrowing from gnosis):
root.nodes.add_node(XmlNode("node", u"spam"))
root.nodes.add_node(XmlNode("node"))
root.nodes.node[1].val = u"egg"

@J. Ruigrok
I think we’re in agreement that this is the quick-and-easy way to create XML, not the enterprise-ready way.
To me, creating XML in Python is a big pain point. With any of the tools out there that I know of, it gets painful and messy really quickly. Sometimes I just want something simple and relatively painless. I could see using lxml or the like under the hood, but I explicitly don’t want to add a lot of complexity to the interface.
As already said on http://stackoverflow.com/questions/418497/how-do-i-convert-xml-to-nested-objects#419232, just have a look at http://codespeak.net/lxml/objectify.html. It is really great:
>>> xml = """<main>
... <object1 attr="name">content</object1>
... <object1 attr="foo">contenbar</object1>
... <test>me</test>
... </main>"""
>>> from lxml import objectify
>>> main = objectify.fromstring(xml)
>>> main.object1[0]
'content'
>>> main.object1[1]
'contenbar'
>>> main.object1[0].get("attr")
'name'
>>> main.test
'me'
@Peter
Thanks for the link. I didn’t know you could use lxml.objectify to create XML as well. But testing it out, lxml seems to be adding type annotations and schema info (I’m using 2.2 beta 1). Is there a way to suppress that?
"…<name py:pytype="str">Ryan</name>…"
You can strip annotations using objectify.deannotate():
>>> msg = objectify.Element('msg')
msg = None [ObjectifiedElement]
>>> msg.s = “somestring”
>>> print etree.tostring(msg, pretty_print=True)
somestring
>>> objectify.deannotate(msg) # strip type annotations
>>> print etree.tostring(msg, pretty_print=True)
somestring
>>> # maybe also strip unused namespaces
>>> etree.cleanup_namespaces(msg)
>>> print etree.tostring(msg, pretty_print=True)
somestring
>>>
@Holger
Thanks!