Don’t overuse classes in Python
Unlike some mainstream languages like Java, you don't have to package everything into a class in Python. A class is a good tool when you want to package up state and behavior, but when all you've got is a bundle of related functionality, the module is the natural unit of packaging in Python.
In my opinion, this article is an egregious example of overuse of classes. I don't want to pick on the author in particular, but it illustrates my point so well that I want to examine the article's code here.
The article was about using Python for exploratory programming, but I think that the class-heavy style makes things more complicated than they need to be. The classes in the code essentially have no state. The one exception is TopRowsWBZipContent, where state is passed into the __init__ method, but is only used in one method and could just as easily have been passed in there. The author also uses extensive inheritance to get the various methods onto the class instances, where if vanilla functions were used, that could all be eliminated.
Here, I want to post the code from the article, and below that my rewrite using plain functions.
First, the article's code (I've made some of the interspersed text into comments):
# It isn't very interesting, but it shows the design pattern.
class Operation( object ):
def processList( self, files ):
for fileName in files:
self.process( fileName )
def processFile( self, fileName ):
pass
# Here's a subclass that provides that process.
class ZipContent( Operation ):
def processFile( self, fileName ):
zip= zipfile.ZipFile( fileName )
for member in zip.infolist():
print "%s: %s %s" % ( fileName,
member.filename )
self.examineMember( zip, member )
# Here's the next subclass.
# It opens each zip archive member as a workbook,
# using the xlrd module.
class WBZipContent( ZipContent ):
def examineMember( self, zipFile, member ):
contents= zipFile.read( member.filename )
wb= xlrd.open_workbook( file_contents=contents,
filename=member.filename )
for sheet in wb.sheets():
self.examineSheet( wb, sheet )
def examineSheet( self, wb, sheet ):
print "> Sheet %s %d rows" % (sheet.name, sheet.nrows )
# Exploring the Workbook sheets
# Here's sprint three of the application.
# This is yet another subclass.
class TopRowsWBZipContent( WBZipContent ):
def __init__( self, topnRows=5 ):
super( TopRowsWBZipContent, self ).__init__()
self.topnRows= topnRows
def examineSheet( self, wb, sheet ):
print "> Sheet %s %d rows" % (sheet.name, sheet.nrows )
if self.topnRows is None:
limit= sheet.nrows
else:
limit= min( self.topnRows, sheet.nrows )
for r in xrange(limit):
row= sheet.row(r)
print r, [ c.value for c in row ]
def manual():
"""Change the options manually."""
#op= ZipContent() # What's in the ZIP files?
# What does the data look like?
#op= TopRowsWBZipContent( topnRows=5 )
op= ExtractCSVWBZipContent("../data")
files = glob.glob( "../data/*.zip" )
op.processList( files )
Here's my rewrite using vanilla functions. The code is now a lot shorter, and I think easier to understand. (It's also easier to test. Yes, I believe in unit testing exploratory code, at least once it settles down a bit.) I've been a bit snooty and used more Pythonic coding conventions while I was at it.
import zipfile
import xlrd
def process_files(filenames):
"""Process a list of files."""
for filename in filenames:
process_file(filename)
def process_file(filename):
"""
Examine a zipped file.
Configure "examine_member" below to
customize behavior.
"""
zipped = zipfile.ZipFile(filename)
for member in zipped.infolist():
print "%s : %s" % (filename,
member.filename)
examine_member(zipped, member)
def examine_workbook(zipped, member):
"""
Examine a workbook. Open up and process each
sheet in the workbook using the xlrd module.
"""
contents= zipped.read(member.filename)
try:
wb= xlrd.open_workbook(file_contents=contents,
filename=member.filename)
except xlrd.biffh.XLRDError:
print "Not an excel file"
else:
for sheet in wb.sheets():
examine_sheet(wb, sheet)
def examine_sheet(wb, sheet, top_n_rows=5):
"""
Examine a worksheet. Print top_n_rows, or
all rows in the sheet if top_n_rows is 0/None.
"""
print "> Sheet %s %d rows" % (sheet.name,
sheet.nrows)
limit = top_n_rows or sheet.nrows
for r in xrange(limit):
row = sheet.row(r)
print r, [c.value for c in row]
# configure behavior like this
examine_member = examine_workbook
def manual():
"""
Run when called as main. Gets all
the zip files in an arbitrary folder
and processes them.
"""
filenames = glob.glob("stuff/*.zip")
process_files(filenames)
if __name__ == "__main__":
manual()
Note that this code is even more suited to exploratory programming than the class-based code, because we don't have to write all the class machinery, and we can mix around functions without the need for inheritance or other abuses.

my term for this is “object chauvinism”
I fully concur. Both classes and functions (methods) have their function. You need to pick the ones as the situation demands. Just using a class for class sakes is silly.
Agree on your basic point. Using a class hierarchy seems complicated in a goofy way.
BtW, I’d “configure” examine_member by having it be a parameter to process_file rather than being a global variable. But that’s probably a detail.
@Russell
You’ve got a point about passing in the
examine_memberfunction as a parameter. I was assuming that being exploratory, the code would be evolving as it developed, so I made this as lightweight as possible. By the time I was ready to put this into a module, that would be one of the first refactorings I made.For exploratory programming, you might as well take advantage of the interpreter.
Open your editor, and make like you were going to do everything with module level code, no functions or anything, and paste back and forth with the interpreter to see what you are dealing with.
You should end up with a file full of notes and python snippets that you can arrange it a reasonable structure to get your final code.
But you haven’t been nearly snooty enough. You should have gone all the way and written it using generators.
Generators are a convenient way to avoid deeply nested call graphs like main -> process_files -> process_file -> examine_workbook -> examine_sheet.
Rather, make each function a generator that yields the next stage’s arguments to its caller, and then and link the stages together in main.
For example, in process_file, there’s no need to configure examine_member. Rather, make process_file a generator that yields the file’s members to its caller, and the caller decides what function to use to process them.
This way, all the functions are independent of each other and can be understood and tested in isolation.
@Junius
Thanks for the comment. You make some good points. I especially like the advice about using generators.
Certainly, my code could have been made better, and I’d have written it differently if going about it from scratch. But that wasn’t my point; my point was to argue against overuse of classes. I prefer making a narrow point (don’t overuse classes; do this instead) than a broad one (here’s some bad code; this is better). But that’s probably because I’m a bit simple minded.
I also think exploring code in a script is a fine alternative to the interpreter, especially if you know you want to end up with a module, but you’re just not sure yet what it should do.
And especially if you’re writing unit tests in tandem.
I agree with the main point. A lot of coders uses a Java way of coding with Python.
However, in some circumstances, although a class may not be needed, I sometimes decide to use a class whenever one or more variables need be shared by different functions that are related. This avoids the trouble of defining global variables or passing numerous arguments.
Besides, and although I have little to substantiate this claim, I find it easier to work with classes when building event driven (graphic) applications, it helps (at least for me) in respect of the Model-View-Controller. But this is not very related to the original post.