Ben Chuanlong Du's Blog

And let it direct your passion with reason.

Working with Iterators in Python

Iterator vs Generator

Generator is a special case of Iterator. Generator is easy and convenient to use but at additional cost (memory and speed). If you need performance, use plain iterator (with the help of the itertools module). If you need convenience and concise code, use generator.

Please refer to Python Generator vs Iterator for more detailed discussions.

When Not to Use Generator

  1. You need to access the data multiple times (i.e. cache the results instead of recomputing them).

     for i in outer:           # used once, okay to be a generator or return a list
         for j in inner:       # used multiple times, reusing a list is better
             ...
  2. You need random access (or any access other than forward sequential order).

     for i in reversed(data): ...     # generators aren't reversible
    
     s[i], s[j] = s[j], s[i]          # generators aren't indexable
  3. You need to join strings (which requires two passes over the data).

     s = ''.join(data)                # lists are faster than generators in this use case
  4. You are using PyPy which sometimes can't optimize generator code as much as it can with normal function calls and list manipulations.

In [1]:
import itertools as it

builtins.enumerate

In [1]:
words = ["how", "are", "you"]
for idx, word in enumerate(words):
    print(f"{idx}: {word}")
0: how
1: are
2: you
In [2]:
words = ["how", "are", "you"]
for idx, word in enumerate(words, 10):
    print(f"{idx}: {word}")
10: how
11: are
12: you

builtins.all

In [4]:
all([True, True, True, True])
Out[4]:
True
In [5]:
all([True, False, True, True])
Out[5]:
False

builtins.any

In [6]:
any([True, True, True, True])
Out[6]:
True
In [7]:
any([True, False, True, True])
Out[7]:
True
In [8]:
any([False, False, False, False])
Out[8]:
False

builtins.max

In [1]:
max([1, 2, 3], default=0)
Out[1]:
3
In [2]:
max([])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-a48d8f8c12de> in <module>
----> 1 max([])

ValueError: max() arg is an empty sequence
In [4]:
max([], default=0)
Out[4]:
0

itertools.accumulate

In [86]:
import operator

list(it.accumulate([1, 2, 3, 4], operator.mul))
Out[86]:
[1, 2, 6, 24]
In [87]:
import operator

list(it.accumulate([1, 2, 3, 4], operator.add))
Out[87]:
[1, 3, 6, 10]

count (range)

Notice that count does NOT count the number of elements in an iterator, but instead generate an iterator of arithematic sequence of integers.

In [3]:
x = it.count(0, 1)
list(it.takewhile(lambda x: x < 10, x))
Out[3]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

cycle

In [9]:
x = it.cycle("abcd")
list(it.islice(x, 0, 10))
Out[9]:
['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'a', 'b']

itertools.chain

You can use itertools.chain to combine/join multiple iterable objects.

In [13]:
x = it.chain("abcd", "efgh", "ijkl")
list(x)
Out[13]:
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l']
In [ ]:
it.chain((1, 2), [3, 4])
In [18]:
it.chain(*[[1, 2], [3, 4]])
Out[18]:
<itertools.chain at 0x7f713aeeb358>
In [19]:
list(it.chain(*[[1, 2], [3, 4]]))
Out[19]:
[1, 2, 3, 4]
In [16]:
it.chain.from_iterable([[1, 2], [3, 4]])
Out[16]:
<itertools.chain at 0x7f713a4f8438>
In [17]:
list(it.chain.from_iterable([[1, 2], [3, 4]]))
Out[17]:
[1, 2, 3, 4]

compress

In [20]:
x = it.compress([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 1, 1, 0, 0, 1, 1, 0, 0, 1])
list(x)
Out[20]:
[0, 1, 2, 5, 6, 9]

dropwhile

In [22]:
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
list(it.dropwhile(lambda i: i < 5, x))
Out[22]:
[5, 6, 7, 8, 9]
In [24]:
x = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1]
list(it.dropwhile(lambda i: i < 5, x))
Out[24]:
[5, 4, 3, 2, 1]
In [23]:
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
list(it.dropwhile(lambda i: i % 2 == 0, x))
Out[23]:
[1, 2, 3, 4, 5, 6, 7, 8, 9]

combinations

In [60]:
x = [0, 1, 2, 3, 4]
list(it.combinations(x, 3))
Out[60]:
[(0, 1, 2),
 (0, 1, 3),
 (0, 1, 4),
 (0, 2, 3),
 (0, 2, 4),
 (0, 3, 4),
 (1, 2, 3),
 (1, 2, 4),
 (1, 3, 4),
 (2, 3, 4)]

combinations_with_replacement

In [64]:
x = [0, 1, 2, 3]
list(it.combinations_with_replacement(x, 3))
Out[64]:
[(0, 0, 0),
 (0, 0, 1),
 (0, 0, 2),
 (0, 0, 3),
 (0, 1, 1),
 (0, 1, 2),
 (0, 1, 3),
 (0, 2, 2),
 (0, 2, 3),
 (0, 3, 3),
 (1, 1, 1),
 (1, 1, 2),
 (1, 1, 3),
 (1, 2, 2),
 (1, 2, 3),
 (1, 3, 3),
 (2, 2, 2),
 (2, 2, 3),
 (2, 3, 3),
 (3, 3, 3)]

groupby

itertools.groupby is a convenient way to group elements into groups of elements. Its return type is Iterable[Tuple[KeyType, Iterable[ValueType]]]. Notice that itertools.groupby iterate through the original collection only once, so you must consume the result of itertools.groupby all at one time (convert all iterators at one time).

In [3]:
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
it.groupby(x, lambda e: e // 3)
Out[3]:
<itertools.groupby at 0x12621f2c0>

The following way of consuming the resutl of itertools.groupby is not right as it does not convert all iterators at one time.

In [4]:
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
groups = it.groupby(x, lambda e: e // 3)
groups = list(groups)
groups
Out[4]:
[(0, <itertools._grouper at 0x10d67f550>),
 (1, <itertools._grouper at 0x10d67fd90>),
 (2, <itertools._grouper at 0x10d67f700>),
 (3, <itertools._grouper at 0x10d67f250>)]
In [5]:
groups[0]
Out[5]:
(0, <itertools._grouper at 0x10d67f550>)

Got no values for the key 0 as the iterator as already been consumed the first time it.groupby(x, lambda e: e // 3) is converted to a list.

In [6]:
list(groups[0][1])
Out[6]:
[]

Below is the right way to convert the result of itertools.groupby to a list of tuples.

In [8]:
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
groups = it.groupby(x, lambda e: e // 3)
[(key, list(val)) for key, val in groups]
Out[8]:
[(0, [0, 1, 2]), (1, [3, 4, 5]), (2, [6, 7, 8]), (3, [9])]

Below is the right way to convert the result of itertools.groupby to a dict.

In [9]:
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
groups = it.groupby(x, lambda e: e // 3)
{key: list(val) for key, val in groups}
Out[9]:
{0: [0, 1, 2], 1: [3, 4, 5], 2: [6, 7, 8], 3: [9]}

Below is a the right way to extract groups of values (without keys).

In [10]:
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
groups = it.groupby(x, lambda e: e // 3)
[list(val) for _, val in groups]
Out[10]:
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

Notice that you can also use itertools.zip_longest to group values into equal size of groups according to the iteration order of elements.

In [4]:
x = "abcdefghij"
myit = it.zip_longest(*([iter(x)] * 3))
[list(val) for val in myit]
Out[4]:
[['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i'], ['j', None, None]]

filter

In [50]:
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
list(filter(lambda e: e % 2 == 0, x))
Out[50]:
[0, 2, 4, 6, 8]

filterfalse

In [82]:
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
list(it.filterfalse(lambda e: e % 2 == 0, x))
Out[82]:
[1, 3, 5, 7, 9]

from_iterable

In [55]:
list(it.chain.from_iterable(["abcd", "efgh", "ijkl"]))
Out[55]:
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l']

islice

In [78]:
list(it.islice("abcd", 0, 2))
Out[78]:
['a', 'b']
In [79]:
list(it.islice("abcd", 2, 2))
Out[79]:
[]
In [80]:
list(it.islice("abcd", 2, 10))
Out[80]:
['c', 'd']

next

next (when used with a conditon) is equivalent to Seq.find in Scala. A default value can be passed to next.

In [12]:
def fibonacci(x1, x2):
    while True:
        yield x1
        x1, x2 = x2, x1 + x2


fiter = fibonacci(1, 1)
next(fiter)
Out[12]:
1
In [13]:
fiter = fibonacci(1, 1)
next(f for f in fiter if f > 100)
Out[13]:
144
In [11]:
fiter = fibonacci(1, 1)
next((f for f in fiter if f > 100), 100)
Out[11]:
144

slice

In [3]:
slice(10)
Out[3]:
slice(None, 10, None)
In [1]:
x = [1, 2, 3, 4]
x[0:2]
Out[1]:
[1, 2]
In [2]:
x = [1, 2, 3, 4]
x[slice(2)]
Out[2]:
[1, 2]

map

In [74]:
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
list(map(lambda e: e * e, x))
Out[74]:
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

permutations

In [62]:
x = [0, 1, 2, 3, 4]
list(it.permutations(x, 2))
Out[62]:
[(0, 1),
 (0, 2),
 (0, 3),
 (0, 4),
 (1, 0),
 (1, 2),
 (1, 3),
 (1, 4),
 (2, 0),
 (2, 1),
 (2, 3),
 (2, 4),
 (3, 0),
 (3, 1),
 (3, 2),
 (3, 4),
 (4, 0),
 (4, 1),
 (4, 2),
 (4, 3)]

product

In [67]:
list(it.product("abcd", "1234"))
Out[67]:
[('a', '1'),
 ('a', '2'),
 ('a', '3'),
 ('a', '4'),
 ('b', '1'),
 ('b', '2'),
 ('b', '3'),
 ('b', '4'),
 ('c', '1'),
 ('c', '2'),
 ('c', '3'),
 ('c', '4'),
 ('d', '1'),
 ('d', '2'),
 ('d', '3'),
 ('d', '4')]

repeat

In [12]:
x = it.repeat("abcd")
list(it.islice(x, 0, 10))
Out[12]:
['abcd',
 'abcd',
 'abcd',
 'abcd',
 'abcd',
 'abcd',
 'abcd',
 'abcd',
 'abcd',
 'abcd']

starmap

In [76]:
x = [[0, 1], [2, 3], [4, 5]]
list(it.starmap(pow, x))
Out[76]:
[0, 8, 1024]

takewhile

In [68]:
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
list(it.takewhile(lambda i: i < 5, x))
Out[68]:
[0, 1, 2, 3, 4]
In [69]:
x = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1]
list(it.takewhile(lambda i: i < 5, x))
Out[69]:
[0, 1, 2, 3, 4]
In [70]:
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
list(it.takewhile(lambda i: i % 2 == 0, x))
Out[70]:
[0]

tee

In [72]:
it1, it2 = it.tee("abcd")
print(list(it1))
print(list(it2))
['a', 'b', 'c', 'd']
['a', 'b', 'c', 'd']

zip_longest

In [83]:
list(it.zip_longest("abc", [1, 2, 3, 4, 5]))
Out[83]:
[('a', 1), ('b', 2), ('c', 3), (None, 4), (None, 5)]

Empty Iterator or Not

In [2]:
it = [1, 2, 3]
not any(True for _ in it)
Out[2]:
False
In [3]:
it = []
not any(True for _ in it)
Out[3]:
True

Iterator Receipes

In [1]:
def take(n, iterable):
    "Return first n items of the iterable as a list"
    return list(islice(iterable, n))


def tabulate(function, start=0):
    "Return function(0), function(1), ..."
    return map(function, count(start))


def tail(n, iterable):
    "Return an iterator over the last n items"
    # tail(3, 'ABCDEFG') --> E F G
    return iter(collections.deque(iterable, maxlen=n))


def consume(iterator, n):
    "Advance the iterator n-steps ahead. If n is none, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)


def nth(iterable, n, default=None):
    "Returns the nth item or a default value"
    return next(islice(iterable, n, None), default)


def all_equal(iterable):
    "Returns True if all the elements are equal to each other"
    g = groupby(iterable)
    return next(g, True) and not next(g, False)


def quantify(iterable, pred=bool):
    "Count how many times the predicate is true"
    return sum(map(pred, iterable))


def padnone(iterable):
    """Returns the sequence elements and then returns None indefinitely.

    Useful for emulating the behavior of the built-in map() function.
    """
    return chain(iterable, repeat(None))


def ncycles(iterable, n):
    "Returns the sequence elements n times"
    return chain.from_iterable(repeat(tuple(iterable), n))


def dotproduct(vec1, vec2):
    return sum(map(operator.mul, vec1, vec2))


def flatten(listOfLists):
    "Flatten one level of nesting"
    return chain.from_iterable(listOfLists)


def repeatfunc(func, times=None, *args):
    """Repeat calls to func with specified arguments.

    Example:  repeatfunc(random.random)
    """
    if times is None:
        return starmap(func, repeat(args))
    return starmap(func, repeat(args, times))


def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)


def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # Recipe credited to George Sakkis
    pending = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while pending:
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            pending -= 1
            nexts = cycle(islice(nexts, pending))


def partition(pred, iterable):
    "Use a predicate to partition entries into false entries and true entries"
    # partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)


def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))


def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element


def unique_justseen(iterable, key=None):
    "List unique elements, preserving order. Remember only the element just seen."
    # unique_justseen('AAAABBBCCDAABBB') --> A B C D A B
    # unique_justseen('ABBCcAD', str.lower) --> A B C A D
    return map(next, map(itemgetter(1), groupby(iterable, key)))


def iter_except(func, exception, first=None):
    """Call a function repeatedly until an exception is raised.

    Converts a call-until-exception interface to an iterator interface.
    Like builtins.iter(func, sentinel) but uses an exception instead
    of a sentinel to end the loop.

    Examples:
        iter_except(functools.partial(heappop, h), IndexError)   # priority queue iterator
        iter_except(d.popitem, KeyError)                         # non-blocking dict iterator
        iter_except(d.popleft, IndexError)                       # non-blocking deque iterator
        iter_except(q.get_nowait, Queue.Empty)                   # loop over a producer Queue
        iter_except(s.pop, KeyError)                             # non-blocking set iterator

    """
    try:
        if first is not None:
            yield first()  # For database APIs needing an initial cast to db.first()
        while True:
            yield func()
    except exception:
        pass


def first_true(iterable, default=False, pred=None):
    """Returns the first true value in the iterable.

    If no true value is found, returns *default*

    If *pred* is not None, returns the first item
    for which pred(item) is true.

    """
    # first_true([a,b,c], x) --> a or b or c or x
    # first_true([a,b], x, f) --> a if f(a) else b if f(b) else x
    return next(filter(pred, iterable), default)


def random_product(*args, repeat=1):
    "Random selection from itertools.product(*args, **kwds)"
    pools = [tuple(pool) for pool in args] * repeat
    return tuple(random.choice(pool) for pool in pools)


def random_permutation(iterable, r=None):
    "Random selection from itertools.permutations(iterable, r)"
    pool = tuple(iterable)
    r = len(pool) if r is None else r
    return tuple(random.sample(pool, r))


def random_combination(iterable, r):
    "Random selection from itertools.combinations(iterable, r)"
    pool = tuple(iterable)
    n = len(pool)
    indices = sorted(random.sample(range(n), r))
    return tuple(pool[i] for i in indices)


def random_combination_with_replacement(iterable, r):
    "Random selection from itertools.combinations_with_replacement(iterable, r)"
    pool = tuple(iterable)
    n = len(pool)
    indices = sorted(random.randrange(n) for i in range(r))
    return tuple(pool[i] for i in indices)
In [ ]:
 

Comments