Iterating over a window in Python

This weekend, I had a fairly common problem: given a sequence of elements and an element in that sequence, I needed to get the next element; if the element was the last in the sequence, then I needed the first element.

My first stab at a solution was pretty straightforward:

def item_after(elements, item):
    for i in range(len(elements)):
        if elements[i] == item:
            try:
                return elements[i+1]
            except IndexError:
                return elements[0]

This works for the intended purpose, although the C-style looping doesn't sit too well with me. But this code has a major flaw: it doesn't work for generators.

After a little more thought, I decided that what I wanted was to iterate over a window of two items in the sequence, and if the first item matched, return the second one.

from itertools import chain

def by_pairs(iterable):
    sequence = iter(iterable)
    previous = sequence.next()
    while True:
        current = sequence.next()
        yield previous, current
        previous = current

To illustrate how this works:

>>> for pair in by_pairs(range(6)):
    print pair

(0, 1)
(1, 2)
(2, 3)
(3, 4)
(4, 5)

The new item_after function:

def item_after2(elements, item):
    pairs = by_pairs(elements)
    first, second = pairs.next()
    for a, b in chain([(first, second)], pairs):
        if a == item:
            return b
    return first

In the new item_after function, I peel off the first pair so I can cache the first element, then glob it back on the front of the sequence using itertools.chain.

After a little more thought, I realized that the by_pairs function can be made generic to handle a window of any size.

from collections import deque

def window_iter(elements, n):
    sequence = iter(elements)
    window = deque()
    while True:
        element = sequence.next()
        window.append(element)
        if len(window) == n:
            yield window
            window.popleft()

The item_after function now looks like this:

def item_after3(elements, item):
    pairs = window_iter(elements, 2)
    first, second = pairs.next()
    for a, b in chain([(first, second)], pairs):
        if a == item:
            return b
    return first

And with a little data: one of my pets gets a treat every day. I know who got the treat yesterday; who gets it next?

pets = [pet for pet in "Bear Lady Mikey Jerry Kiri".split()]
print "Pet after Kiri is", item_after3(pets, "Kiri")
print "Pet after Lady is", item_after3(pets, "Lady")

The output:

Pet after Kiri is Bear
Pet after Lady is Mikey

I can think of a lot of potential uses for this window iterator. One is in natural language processing, when calculating the "stickiness" of words in a corpus. You could iterate over a window of three words, calculating the collocations of words before and after the middle word.

Here's a rather contrived example: calculating the average temperature on days when the previous two days were 80 degrees or higher.

def ave_temp_after(temp, temps):
    after_temps = [z for x, y, z in window_iter(temps, 3)
                   if x > temp < y]
    return sum(after_temps) / float(len(after_temps))

tempdata = "99 101 85 91 71 64 67 90 81 101".split()
temps = (float(x) for x in tempdata)
temp = ave_temp_after(80, temps)

print "Average temperature after two days of 80+:", temp

Output:

Average temperature after two days of 80+: 87.0

It strikes me that this would be great for those baseball statistics guys:

This is just the third time since 1963 that a left-handed pinch hitter has broken a bat on three consecutive Wednesdays.

I'm waiting for a call from The Show any day now.

3 comments to Iterating over a window in Python

  • Jon-Eric

    itertools.tee might be useful in this case.

    From http://docs.python.org/library/itertools.html#recipes :

    def pairwise(iterable):
        "s -> (s0,s1), (s1,s2), (s2, s3), ..."
        a, b = tee(iterable)
        for elem in b:
            break
        return izip(a, b)

    While the for loop (to advance b) is certainly concise, I don’t find it too readable. I think it demands a comment at least.

  • That’s a good one. Generalizing to create a window of arbitrary size:

    from itertools import tee, izip
    
    def window_iter(sequence, n):
        """
        s -> (s1, ... sn), (s2, ... sn+1), ...
        """
        sequences = tee(sequence, n)
        for it, num in zip(sequences, range(n)):
            for i in range(num):
                it.next()
    
        return izip(*list(sequences))

    Example:

    >>> s = (1, 2, 3, 4, 5, 6)
    >>> for sequence in window_iter(s, 3):
    	print sequence
    
    
    (1, 2, 3)
    (2, 3, 4)
    (3, 4, 5)
    (4, 5, 6)

    I give it plus points for not using an accumulator, and minus points for resorting to “star” magic.

  • goomish

    How about this?
    >>> a
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
    >>> map(lambda x,y:(x,y),a[:-1],a[1:])
    [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9), (9, 0)]
    >>>

    I’m new to python so forgive me if it’s silly example 😉

Leave a Reply

 

 

 

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>