Encoding an output stream in Python

Sometimes you need to encode an output stream. You may be writing to a file that must be in a particular encoding, or to a cStringIO.StringIO, which cannot accept Unicode strings that are not plain ASCII. More generally, this comes up whenever you run into the dreaded UnicodeEncodeError.

You can avoid the error by wrapping your output stream in a class (a decorator, in the design-pattern sense) that transparently encodes Unicode strings into the expected encoding.

import sys

class OutStreamEncoder(object):
    """Wraps a stream with an encoder"""

    def __init__(self, outstream, encoding=None):
        self.out = outstream
        if not encoding:
            self.encoding = sys.getfilesystemencoding()
        else:
            self.encoding = encoding

    def write(self, obj):
        """Wraps the output stream, encoding Unicode
        strings with the specified encoding"""

        if isinstance(obj, unicode):
            self.out.write(obj.encode(self.encoding))
        else:
            self.out.write(obj)

    def __getattr__(self, attr):
        """Delegate everything but write to the stream"""

        return getattr(self.out, attr)

An example use, in a Python 2 interactive session:

>>> from cStringIO import StringIO as si
>>> out = si()
>>> nihongo = unicode("日本語", "sjis")
>>> print >> out, nihongo

Traceback (most recent call last):
  File "<pyshell#40>", line 1, in <module>
    print >> out, nihongo
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2:
ordinal not in range(128)
>>> out = OutStreamEncoder(out, "utf-8")
>>> print >> out, nihongo
>>> val = out.getvalue()
>>> print val.decode("utf-8")
日本語

>>>
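If you want to try the idea without cStringIO or the print >> syntax, here is a minimal sketch of the same wrapper that should run on both Python 2 and Python 3. It is an illustration, not the code from the zip file. The names EncodingWriter and text_type are made up for this example. Using io.BytesIO as the byte-only stream and defaulting to UTF-8 (the class above falls back to sys.getfilesystemencoding()) are assumptions made to keep it short.

```python
import io

try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3 (assumption: str plays the role of unicode)


class EncodingWriter(object):
    """Hypothetical stand-in for OutStreamEncoder, same idea."""

    def __init__(self, outstream, encoding="utf-8"):
        # Assumption: default to UTF-8 instead of the filesystem encoding.
        self.out = outstream
        self.encoding = encoding

    def write(self, obj):
        # Encode text on the way in; pass byte strings through untouched.
        if isinstance(obj, text_type):
            obj = obj.encode(self.encoding)
        self.out.write(obj)

    def __getattr__(self, attr):
        # Only called for attributes not found on the wrapper itself,
        # so everything except write() is delegated to the stream.
        return getattr(self.out, attr)


raw = io.BytesIO()                   # a stream that only accepts bytes
out = EncodingWriter(raw, "utf-8")
out.write(u"\u65e5\u672c\u8a9e\n")   # text: encoded before being written
out.write(b"plain bytes\n")          # bytes: written unchanged
data = out.getvalue()                # getvalue() is delegated to the BytesIO
```

Note that getvalue() is never defined on the wrapper. It reaches the underlying stream through __getattr__, which is the same reason out.getvalue() works in the session above.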

The linked zip file (streamencode.zip) has a streamencode module with the OutStreamEncoder class, and a unit test module.
