Ben Chuanlong Du's Blog

And let it direct your passion with reason.

The set Collection in Python

General Tips and Traps

  1. The set class is implemented as a hash table, which means that its elements must be hashable (they must implement the methods __hash__ and __eq__). The set class implements the mathematical concept of a set, which means that its elements are unordered and that insertion order is not preserved. Notice that this is different from the dict class, which is also implemented as a hash table but keeps the insertion order of its elements! The article Why don't Python sets preserve insertion order? has a good explanation of the rationale behind this.

  2. The third-party Python library sortedcontainers provides implementations of sorted containers. In particular, it has a class named SortedSet.
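
The hashability and ordering points in the first tip can be checked with a few lines of standard-library Python. This is only a small sketch; the variable names are made up for illustration, and the exact wording of the error message may differ between Python versions.

```python
# Hashable values (int, str, tuple, frozenset) can be set elements.
ok = {1, "a", (2, 3), frozenset([4, 5])}

# Unhashable values such as a list raise TypeError when used as an element.
try:
    bad = {[1, 2]}
except TypeError as err:
    error_message = str(err)  # mentions "unhashable type: 'list'"

# dict keeps insertion order (guaranteed since Python 3.7); set makes no such promise.
d = dict.fromkeys(["c", "a", "b"])
keys_in_order = list(d)  # ['c', 'a', 'b']

# To get a deterministic order out of a set, sort it explicitly.
sorted_elems = sorted({"c", "a", "b"})  # ['a', 'b', 'c']
```

If you need the elements kept sorted at all times rather than sorted on demand, that is the use case for SortedSet from sortedcontainers mentioned above.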

Construct a Set

  1. set accepts a single iterable argument instead of varargs!
    • set("abc") creates a set with the elements "a", "b" and "c" instead of a set with the single element "abc"!
    • set(1, 2, 3) won't create a set with the elements 1, 2 and 3 but will instead throw a TypeError, as set accepts at most one (iterable) argument.
In [14]:
set("abc")
Out[14]:
{'a', 'b', 'c'}
In [15]:
set(1, 2, 3)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-1dd7caaf970a> in <module>
----> 1 set(1, 2, 3)

TypeError: set expected at most 1 argument, got 3
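
To get the sets that the two calls above fail to produce, wrap the values in an iterable or use a set literal. A short sketch (the variable names are illustrative only):

```python
# Several values: wrap them in an iterable, or use a set literal.
from_list = set([1, 2, 3])
literal = {1, 2, 3}

# A single string element: the literal (or a one-item list) keeps "abc" whole.
single = {"abc"}
also_single = set(["abc"])

# An empty set must be written set(); {} creates an empty dict.
empty = set()
empty_braces = {}
```

Note that empty_braces here is a dict, not a set, which is an easy mistake to make when initializing an empty collection.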

Create a set containing numbers 0-9 using range.

In [17]:
set(range(0, 10))
Out[17]:
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

Or use a set comprehension.

In [19]:
s = {i for i in range(0, 10)}
s
Out[19]:
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

set.add

set.add adds an element in place.

In [1]:
s = set([1, 2, 3])
In [2]:
s.add(4)
In [3]:
s
Out[3]:
{1, 2, 3, 4}

set.union

s1.union(s2) returns a new set containing the union of s1 and s2 (any iterable); s1 is unchanged. Note: sets do not support the + operator even though they support the - operator. Use set.union instead, or the | operator when both operands are sets.

In [4]:
s = set([1, 2, 3])
In [5]:
s.union([2, 3, 4])
Out[5]:
{1, 2, 3, 4}
In [6]:
s
Out[6]:
{1, 2, 3}

You can union multiple iterables (of elements) at the same time.

In [4]:
s.union([2, 3, 4], [3, 4, 5])
Out[4]:
{1, 2, 3, 4, 5}
In [5]:
arrs = [[2, 3, 4], [3, 4, 5]]
arrs
Out[5]:
[[2, 3, 4], [3, 4, 5]]
In [7]:
s.union(*arrs)
Out[7]:
{1, 2, 3, 4, 5}
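
For comparison, here is a brief sketch of the | operator next to set.union. The operator requires both operands to be sets, while the method accepts any iterable. Variable names are illustrative.

```python
s = {1, 2, 3}

# The | operator computes the union of two sets and returns a new set.
by_operator = s | {2, 3, 4}  # {1, 2, 3, 4}

# With a non-set operand such as a list, | raises TypeError.
try:
    s | [2, 3, 4]
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False

# set.union takes any iterable, so the list works here.
by_method = s.union([2, 3, 4])  # {1, 2, 3, 4}
```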

set.update

set.update inserts all elements of another iterable in place. You can think of set.update as the in-place version of set.union.

In [24]:
s = set([1, 2, 3])
In [25]:
s.update([2, 3, 4])
In [26]:
s
Out[26]:
{1, 2, 3, 4}
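
Like set.union, set.update accepts several iterables at once. Because it modifies the set in place, it returns None, so its result should not be assigned back to the variable. A small sketch:

```python
s = {1, 2, 3}

# update mutates s and returns None.
result = s.update([4], (5, 6))

# The |= operator is the in-place union for two sets.
s |= {7}

# Common mistake: t = t.update(...) would rebind t to None.
t = {1}
returned = t.update([2])
```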

set.difference

s1.difference(s2) returns a new set containing the elements of s1 that are not in s2 (any iterable); s1 is unchanged. When s1 and s2 are both sets, s1.difference(s2) is equivalent to s1 - s2. Because set.difference accepts any iterable while the - operator requires both operands to be sets, set.difference is the more flexible of the two. Note: there is no + operator for sets (use set.union instead).

In [8]:
s = set([1, 2, 3])
s
Out[8]:
{1, 2, 3}
In [9]:
s.difference([2, 3, 4])
Out[9]:
{1}
In [10]:
s
Out[10]:
{1, 2, 3}
In [12]:
s - set([2, 3, 4])
Out[12]:
{1}
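
The flexibility of set.difference compared with the - operator can be seen in a short sketch. It also shows difference_update, the in-place counterpart. Variable names are illustrative.

```python
s = {1, 2, 3, 4, 5}

# difference accepts several iterables of any kind and returns a new set.
remaining = s.difference([1], (2,), range(3, 4))  # {4, 5}

# The - operator only works when both operands are sets.
try:
    s - [1, 2]
    minus_accepts_list = True
except TypeError:
    minus_accepts_list = False

# difference_update removes the elements in place (as update does for union).
t = set(s)
t.difference_update([1, 2])  # t is now {3, 4, 5}
```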

set.pop

set.pop removes an arbitrary element from the set in place and returns it.

In [3]:
s = set(range(10))
In [4]:
s
Out[4]:
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
In [5]:
s.pop()
Out[5]:
0
In [6]:
s
Out[6]:
{1, 2, 3, 4, 5, 6, 7, 8, 9}
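
That pop returned 0 in the example above is an implementation detail; which element comes out is not specified. The sketch below covers two related points: popping from an empty set raises KeyError, and removing a specific element is done with remove or discard instead. Variable names are illustrative.

```python
s = {"x", "y", "z"}

# pop removes and returns some element; do not rely on which one.
popped = s.pop()

# Popping from an empty set raises KeyError.
try:
    set().pop()
    empty_pop_raised = False
except KeyError:
    empty_pop_raised = True

# To remove a specific element, use remove (KeyError if missing)
# or discard (does nothing if missing).
nums = {1, 2, 3}
nums.remove(1)
nums.discard(42)  # 42 is not in the set, no error is raised
```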

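set.intersection

s1.intersection(s2) returns a new set containing the elements common to s1 and s2 (any iterable); s1 is unchanged. You can intersect multiple iterables at the same time.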

In [3]:
c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [13, 17, 18, 21, 32]
c3 = [13, 59, 67]
In [2]:
set(c1).intersection(c2)
Out[2]:
{13, 32}
In [4]:
set(c1).intersection(c2, c3)
Out[4]:
{13}
In [5]:
set(c1).intersection(*[c2, c3])
Out[5]:
{13}
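
When both operands are already sets, the & operator gives the same result. For a variable number of iterables, a small wrapper can handle the empty case. The function common_elements below is a hypothetical helper written for this sketch, not part of the standard library.

```python
c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [13, 17, 18, 21, 32]
c3 = [13, 59, 67]

# & requires sets on both sides.
both = set(c1) & set(c2)  # {13, 32}


def common_elements(*iterables):
    """Hypothetical helper: return the elements present in every iterable."""
    if not iterables:
        return set()
    first, *rest = iterables
    # Only the first iterable needs converting; intersection accepts any iterable.
    return set(first).intersection(*rest)


in_all = common_elements(c1, c2, c3)  # {13}
```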