An easy way to write XML in Python

David Mertz's gnosis utilities include the module gnosis.xml.objectify, which makes parsing XML in python as simple as could be.

from gnosis.xml import objectify

xmltext = """<?xml version="1.0" encoding="UTF-8"?>
<root><first canned="
true" yummy="false">spam</first><second>egg</second></root>"""

inst = objectify.make_instance(xmltext)
print "first:", inst.first.PCDATA
print "  canned:", inst.first.canned
print "  yummy:", inst.first.yummy
print "second:", inst.second.PCDATA

Output:

first: spam
  canned: true
  yummy: false
second: egg

I wanted a similarly simple way to write XML, so I wrote a little module named xmlwriter (zip file) to do it.

Usage:

from xmlwriter import XmlNode, xmlify
from StringIO import StringIO

root = XmlNode(u"root")

first = root.first
first.val = u"spam"
first["yummy"] = u"false"
first["canned"] = u"true"

root.second.val = u"egg"

out = StringIO()
xmlify(root, out)
print  out.getvalue()

Output:

<?xml version="1.0" encoding="UTF-8"?>
<root><first canned="true" yummy="false">spam</first><second>egg</second></root>

It's still pretty basic. For example, you can't have sibling nodes with the same tag name. But it's a very simple way to do some down and dirty XML writing.

Here's the xmlwriter module code.

from StringIO import StringIO

class XmlNode(object):
    """Pythonic representation of an XML node.

    Expects values and attributes to be in Unicode

    Example:
    root = XmlNode(u"root")
    root.node.val = u"
value"
    root["
attr"] = u"name"
    "
""

    def __init__(self, tag=None, value=None):
        self._tag = tag
        self.val = value
        self._attrs = {}

    def __getattr__(self, attr):
        """Add nodes on the fly"""
        self.__dict__[attr] = XmlNode(attr)
        return self.__dict__[attr]

    # dictionary access
    def __getitem__(self, key):
        return self._attrs[key]
    def __setitem__(self, key, val):
        self._attrs[key] = val

def write_open_tag(node, out):
    """
    Writes the opening tag for node, including attributes
    out is a file-like object
    "
""
    out.write("<%s" % node._tag)
    out.write("".join([' %s="%s"' % (k, v.encode("utf-8"))
                              for k, v in node._attrs.items()]))
    out.write(">")

def xmlify(root, out):
    """
    Takes the root, and recursively goes
    down printing out the tags, attributes, and values.
    out is a file-like object
    "
""

    # write XML header if this is the first node
    if not out.pos:
        print >> out, """<?xml version="1.0" encoding="UTF-8"?>"""

    # opening tag
    write_open_tag(root, out)

    # value
    if root.val:
        out.write(root.val.encode("utf-8"))

    # sub-nodes
    for item in dir(root):
        attr = getattr(root, item)
        if isinstance(attr, XmlNode):
            xmlify(attr, out)

    # closing tag
    out.write("</%s>" % root._tag)

11 comments to An easy way to write XML in Python

  • Adam

    This has the potential to be useful, however you need to put some escaping in for adding the text nodes.

    From your example:

    first.val = u”This contains characters”

    Should escape to >

    I wrote a small wrapper around libxml2 to write XML in a serial manner, source code is available at http://www.newmediascientist.com/read.html?item=quick-and-easy-xml-creation

  • I think you’ve kind of proven your point, because the comment monster seems to have eaten your angle brackets.

    There are lots of options for escaping XML entities in Python. Here’s a page on it from the Python wiki. I’m not entirely convinced that the module should be escaping xml, but I agree that it would be simpler for the user.

    I’m sure the solution you wrote is more correct, but pardon me for saying — it doesn’t look incredibly simple to use.

    I just want to do:

    prefs = XmlNode("prefs")
    prefs.language.val = u"English"
    prefs.user.name.val = u"Ryan"
    prefs.user.email.val = u"ryan@example.com"
    prefs.user["id"] = u"123"
    prefs.onions.val = u"extra"
  • Adam

    So it has, I didn’t notice the comment getting mangled.

    I think one of the main problems is that the XML standard is easy to understand the basics but the full specification is pretty complicated. I used libxml2 as the main engine since it’s pretty much watertight being the Gnome XML library.

    I think you may run into limitations quite quickly using instances as nodes. For example, how would you generate an XHTML list?

    div = XmlNode(“container”)
    div.ul.li = “First Item”
    div.ul.li = “About to overwrite the first item?”

    I tinkered with serializing Python objects to XML a while back and came to the conclusion that the limitations outweigh the usefulness.

  • You’re definitely right that things like lists are a big weakness in the module. I’ll probably try to add ways to handle that at some point.

    But then again, for a complex document I’m more likely to use a templating language than to generate the XML tree in code. I guess that’s just me, though. 🙂

  • Strange, my comment didn’t go through I guess.

    Anyway, I pointed out that there’s lxml and ElementTree (etree) that might be useful for you without having to roll your own solutions like this.

    Since 2.5 Python has xml.etree, but there’s still an external version being updated and maintained.

  • Kamil Kisiel

    How about the following for lists:

    div.ul.li = [“first item”, “second item”]

    Of course that only supports a homogeneous array of child tags.

  • @Kamil

    The problem with your proposal is that you can’t override the assignment operator in Python. Thus you’d need some serious voodoo to stick children under those li nodes.

    I’m thinking of syntax along these lines (borrowing from gnosis):

    root.nodes.add_node(XmlNode("node", u"spam"))
    root.nodes.add_node(XmlNode("node"))
    root.nodes.node[1].val = u"egg"

    @J. Ruigrok

    I think we’re in agreement that this is the quick-and-easy way to create XML, not the enterprise-ready way.

    To me, creating XML in Python is a big pain point. With any of the tools out there that I know of, it gets painful and messy really quickly. Sometimes I just want something simple and relatively painless. I could see using lxml or the like under the hood, but I explicitly don’t want to add a lot of complexity to the interface.

  • Peter

    As already said on http://stackoverflow.com/questions/418497/how-do-i-convert-xml-to-nested-objects#419232 just have a look at http://codespeak.net/lxml/objectify.html it is really great:

    >>> xml = “””<main>
    … <object1 attr=”name”>content”</object1>
    … <object1 attr=”foo”>contenbar”</object1>
    … <test>me”</test>
    … </main>”””

    >>> from lxml import objectify

    >>> main = objectify.fromstring(xml)

    >>> main.object1[0]
    ‘content’

    >>> main.object1[1]
    ‘contenbar’

    >>> main.object1[0].get(“attr”)
    ‘name’

    >>> main.test
    ‘me’

  • @Peter

    Thanks for the link. I didn’t know you could use lxml.objectify to create xml as well. But testing it out, lxml seems to be adding type annotations and schema info (I’m using 2.2 beta 1). Is there a way to suppress that?

    “…<name py:pytype=”str”>Ryan</name>…”

  • Holger Joukl

    You can strip annotations using objectify.deannotate():

    >>> msg = objectify.Element(‘msg’)
    msg = None [ObjectifiedElement]
    >>> msg.s = “somestring”
    >>> print etree.tostring(msg, pretty_print=True)

    somestring

    >>> objectify.deannotate(msg) # strip type annotations
    >>> print etree.tostring(msg, pretty_print=True)

    somestring

    >>> # maybe also strip unused namespaces
    >>> etree.cleanup_namespaces(msg)
    >>> print etree.tostring(msg, pretty_print=True)

    somestring

    >>>

Leave a Reply

 

 

 

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>