Don’t overuse classes in Python

Unlike some mainstream languages like Java, you don't have to package everything into a class in Python. A class is a good tool when you want to package up state and behavior, but when all you've got is a bundle of related functionality, the module is the natural unit of packaging in Python.

In my opinion, this article is an egregious example of overuse of classes. I don't want to pick on the author in particular, but it illustrates my point so well that I want to examine the article's code here.

The article was about using Python for exploratory programming, but I think that the class-heavy style makes things more complicated than they need to be. The classes in the code essentially have no state. The one exception is TopRowsWBZipContent, where state is passed into the __init__ method, but is only used in one method and could just as easily have been passed in there. The author also uses extensive inheritance to get the various methods onto the class instances, where if vanilla functions were used, that could all be eliminated.

Here, I want to post the code from the article, and below that my rewrite using plain functions.

First, the article's code (I've made some of the interspersed text into comments):

# Let's look at the first class definition.
# It isn't very interesting, but it shows the design pattern.
class Operation( object ):
    def processList( self, files ):
        for fileName in files:
            self.process( fileName )
    def processFile( self, fileName ):
        pass

# Here's a subclass that provides that process.
class ZipContent( Operation ):
    def processFile( self, fileName ):
        zip= zipfile.ZipFile( fileName )
        for member in zip.infolist():
            print "%s: %s %s" % ( fileName,
                member.filename )
            self.examineMember( zip, member )

# Here's the next subclass.
# It opens each zip archive member as a workbook,
# using the xlrd module.
class WBZipContent( ZipContent ):
    def examineMember( self, zipFile, member ):
        contents= zipFile.read( member.filename )
        wb= xlrd.open_workbook( file_contents=contents,
            filename=member.filename )
        for sheet in wb.sheets():
            self.examineSheet( wb, sheet )
    def examineSheet( self, wb, sheet ):
        print ">  Sheet %s %d rows" % (sheet.name, sheet.nrows )

# Exploring the Workbook sheets
# Here's sprint three of the application.
# This is yet another subclass.
class TopRowsWBZipContent( WBZipContent ):
    def __init__( self, topnRows=5 ):
        super( TopRowsWBZipContent, self ).__init__()
        self.topnRows= topnRows
    def examineSheet( self, wb, sheet ):
        print ">  Sheet %s %d rows" % (sheet.name, sheet.nrows )
        if self.topnRows is None:
            limit= sheet.nrows
        else:
            limit= min( self.topnRows, sheet.nrows )
        for r in xrange(limit):
            row= sheet.row(r)
            print r, [ c.value for c in row ]

def manual():
    """Change the options manually."""
    #op= ZipContent() # What's in the ZIP files?
    # What does the data look like?
    #op= TopRowsWBZipContent( topnRows=5 )
    op= ExtractCSVWBZipContent("../data")
    files = glob.glob( "../data/*.zip" )
    op.processList( files )

Here's my rewrite using vanilla functions. The code is now a lot shorter, and I think easier to understand. (It's also easier to test. Yes, I believe in unit testing exploratory code, at least once it settles down a bit.) I've been a bit snooty and used more Pythonic coding conventions while I was at it.

import glob
import zipfile
import xlrd

def process_files(filenames):
    """Process a list of files."""
    for filename in filenames:
        process_file(filename)

def process_file(filename):
    """
    Examine a zipped file.
    Configure "
examine_member" below to
    customize behavior.
    "
""
    zipped = zipfile.ZipFile(filename)
    for member in zipped.infolist():
        print "%s : %s" % (filename,
            member.filename)
        examine_member(zipped, member)

def examine_workbook(zipped, member):
    """
    Examine a workbook. Open up and process each
    sheet in the workbook using the xlrd module.
    "
""
    contents= zipped.read(member.filename)
    try:
        wb= xlrd.open_workbook(file_contents=contents,
            filename=member.filename)
    except xlrd.biffh.XLRDError:
        print "Not an excel file"
    else:
        for sheet in wb.sheets():
            examine_sheet(wb, sheet)

def examine_sheet(wb, sheet, top_n_rows=5):
    """
    Examine a worksheet. Print top_n_rows, or
    all rows in the sheet if top_n_rows is 0/None.
    "
""
    print ">  Sheet %s %d rows" % (sheet.name,
                            sheet.nrows)
    limit = top_n_rows or sheet.nrows
    for r in xrange(limit):
        row = sheet.row(r)
        print r, [c.value for c in row]

# configure behavior like this
examine_member = examine_workbook

def manual():
    """
    Run when called as main. Gets all
    the zip files in an arbitrary folder
    and processes them.
    "
""
    filenames = glob.glob("stuff/*.zip")
    process_files(filenames)

if __name__ == "__main__":
    manual()

Note that this code is even more suited to exploratory programming than the class-based code, because we don't have to write all the class machinery, and we can mix around functions without the need for inheritance or other abuses.

7 comments to Don’t overuse classes in Python

  • my term for this is “object chauvinism”

  • I fully concur. Both classes and functions (methods) have their function. You need to pick the ones as the situation demands. Just using a class for class sakes is silly.

  • Russell

    Agree on your basic point. Using a class hierarchy seems complicated in a goofy way.

    BtW, I’d “configure” examine_member by having it be a parameter to process_file rather than being a global variable. But that’s probably a detail.

  • @Russell

    You’ve got a point about passing in the examine_member function as a parameter. I was assuming that being exploratory, the code would be evolving as it developed, so I made this as lightweight as possible. By the time I was ready to put this into a module, that would be one of the first refactorings I made.

  • Junius

    For exploratory programming, you might as well take advantage of the interpreter.

    Open your editor, and make like you were going to do everything with module level code, no functions or anything, and paste back and forth with the interpreter to see what you are dealing with.

    You should end up with a file full of notes and python snippets that you can arrange it a reasonable structure to get your final code.

    But you haven’t been nearly snooty enough. You should have gone all the way and written it using generators.

    Generators are a convenient way to avoid deeply nested call graphs like main -> process_files -> process_file -> examine_workbook -> examine_sheet.

    Rather, make each function a generator that yields the next stage’s arguments to its caller, and then and link the stages together in main.

    For example, in process_file, there’s no need to configure examine_member. Rather, make process_file a generator that yields the file’s members to its caller, and the caller decides what function to use to process them.

    This way, all the functions are independent of each other and can be understood and tested in isolation.

  • @Junius

    Thanks for the comment. You make some good points. I especially like the advice about using generators.

    Certainly, my code could have been made better, and I’d have written it differently if going about it from scratch. But that wasn’t my point; my point was to argue against overuse of classes. I prefer making a narrow point (don’t overuse classes; do this instead) than a broad one (here’s some bad code; this is better). But that’s probably because I’m a bit simple minded.

    I also think exploring code in a script is a fine alternative to the interpreter, especially if you know you want to end up with a module, but you’re just not sure yet what it should do. :) And especially if you’re writing unit tests in tandem.

  • Mounir Errami

    I agree with the main point. A lot of coders uses a Java way of coding with Python.
    However, in some circumstances, although a class may not be needed, I sometimes decide to use a class whenever one or more variables need be shared by different functions that are related. This avoids the trouble of defining global variables or passing numerous arguments.
    Besides, and although I have little to substantiate this claim, I find it easier to work with classes when building event driven (graphic) applications, it helps (at least for me) in respect of the Model-View-Controller. But this is not very related to the original post.

Leave a Reply

 

 

 

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>