Ben Chuanlong Du's Blog

And let it direct your passion with reason.

Hands on dict in Python

Tips and Traps

  1. Starting from Python 3.7, dict preserves insertion order (i.e., dict is ordered), so there is little need to use OrderedDict merely to keep insertion order in Python 3.7+. However, set in Python is implemented as an unordered hash set and thus is neither ordered nor sorted. A trick to dedup an iterable named values while preserving the order of first occurrences is to leverage dict instead of set.

     {v: None for v in values}.keys()
    
    

    This is also a good trick to yield reproducible deduplicated results without sorting, since the iteration order of a set of strings can differ between runs.

  2. With the method dict.setdefault, you do not really need defaultdict. As a matter of fact, it is recommended that you use dict over defaultdict for safety reasons: merely looking up a missing key in a defaultdict silently inserts it, which can hide bugs.

In [1]:
s = set([11111, 10, 8, 7, 1, 2, 3])
In [2]:
list(s)
Out[2]:
[1, 2, 3, 7, 8, 11111, 10]
In [2]:
import numpy as np

Construct dict

Dictionary Comprehension

In [1]:
d = {i: i * i for i in range(3)}
d
Out[1]:
{0: 0, 1: 1, 2: 4}

Tuple to Dict

You can pass an iterable of pairs (a pair is a tuple with 2 elements) to dict to create a dict object. The first element of each pair is the key and the second element is the corresponding value.

In [3]:
d = dict((row[0], (row[1], row[2])) for row in [("Ben", 1, 2), ("Lisa", 2, 3)])
In [4]:
d
Out[4]:
{'Ben': (1, 2), 'Lisa': (2, 3)}
In [6]:
[k for k in d]
Out[6]:
['Ben', 'Lisa']
In [7]:
d.items()
Out[7]:
dict_items([('Ben', (1, 2)), ('Lisa', (2, 3))])

Passing an iterable of tuples whose lengths differ from 2 raises a ValueError.

In [2]:
dict([("abc", 1, 2)])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-19adc35cd255> in <module>()
      1 dict([
----> 2     ('abc', 1, 2)
      3 ])

ValueError: dictionary update sequence element #0 has length 3; 2 is required

Passing an empty iterable to dict generates an empty dict object.

In [1]:
dict([])
Out[1]:
{}

pandas.Index is a dict-like Object

In [8]:
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1], "z": [1, 1, 1, 1, 1]})

df.head()
In [12]:
df.index.intersection(d.keys())
Out[12]:
Int64Index([0, 1, 2], dtype='int64')

A KeyError exception is raised if the key is not found. A defaultdict does not raise an exception when a key is not found but instead inserts and returns the default value.

In [3]:
d[3]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-3-d787ddb7dc0e> in <module>()
----> 1 d[3]

KeyError: 3

get is the safe version: it returns None instead of raising a KeyError when the key is missing. Calling d.get(3) is equivalent to the following code.

d[3] if 3 in d else None
In [6]:
d.get(3)
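
As a small sketch of how get behaves, the cell below also passes an explicit default as the second argument. The dict contents and the fallback value -1 are arbitrary choices for illustration.

```python
d = {0: 0, 1: 1, 2: 4}

# Missing key, no default given: get returns None instead of raising KeyError.
missing = d.get(3)

# Missing key with an explicit default (-1 is an arbitrary illustrative value).
fallback = d.get(3, -1)

# Present key: the default is ignored and the stored value is returned.
present = d.get(2, -1)

print(missing, fallback, present)
```

Note that get never inserts the key, so d is left unchanged by all three calls.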

Merge Two Dictionaries

In [1]:
x = {"a": 1, "b": 2}
In [2]:
y = {"b": 3, "c": 4}
In [3]:
{**x, **y}
Out[3]:
{'a': 1, 'b': 3, 'c': 4}
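
Two other ways to get the same merged result are sketched below, assuming the same x and y as above. The | operator on dicts is only available in Python 3.9+, so it is guarded by a version check here.

```python
import sys

x = {"a": 1, "b": 2}
y = {"b": 3, "c": 4}

# Copy x first so that the original dict is not modified, then update with y.
# For duplicate keys, the value from y wins, as with {**x, **y}.
merged = x.copy()
merged.update(y)
print(merged)

# The union operator for dicts was added in Python 3.9.
if sys.version_info >= (3, 9):
    merged_op = x | y
    print(merged_op == merged)
```

In all of these variants the right-hand dict takes precedence for keys that appear in both, while the position of such a key in the result is the one it had in the left-hand dict.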

keys

In [9]:
d = {"a": 1, "b": 2}
d.keys()
Out[9]:
dict_keys(['a', 'b'])

in

In [1]:
d = {"a": 1, "b": 2}
"a" in d
Out[1]:
True

values

In [10]:
d = dict((row[0], (row[1], row[2])) for row in [("Ben", 1, 2), ("Lisa", 2, 3)])
In [11]:
d.values()
Out[11]:
dict_values([(1, 2), (2, 3)])
In [12]:
list(v[0] for v in d.values())
Out[12]:
[1, 2]
In [13]:
max(v[0] for v in d.values())
Out[13]:
2

Iterate Dictionary

Iterate Keys

In [10]:
d = {"a": 1, "b": 2}
for k in d:
    print(str(k) + ": " + str(d[k]))
a: 1
b: 2

Iterate Key/Value Pairs

In [12]:
for k, v in d.items():
    print(str(k) + ": " + str(v))
a: 1
b: 2

Iterating over a dict directly yields only its keys, so you cannot unpack (key, value) pairs this way.

In [13]:
for k, v in d:
    print(str(k))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-21b7a9439b1d> in <module>
----> 1 for k, v in d:
      2     print(str(k))

ValueError: not enough values to unpack (expected 2, got 1)
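
The error above is the lucky case. As a sketch of a quieter variant of the same trap, the hypothetical dict below uses 2-character string keys: each key is itself unpacked into two characters, so the loop runs without any error but yields the wrong data.

```python
d = {"ab": 1, "cd": 2}

# Iterating over d yields keys only; each 2-character key is unpacked
# into its two characters, so no exception is raised.
wrong = [(k, v) for k, v in d]
print(wrong)

# Iterating over d.items() yields the intended (key, value) pairs.
right = [(k, v) for k, v in d.items()]
print(right)
```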

Iterate Values Directly

In [14]:
for v in d.values():
    print(v)
1
2

setdefault - Set Default Value for a Key

With the method dict.setdefault, you do not really need defaultdict. As a matter of fact, it is recommended that you use dict over defaultdict for safety reasons: merely looking up a missing key in a defaultdict silently inserts it, which can hide bugs.

In [5]:
dic = {"x": 10, "y": 20}
dic
Out[5]:
{'x': 10, 'y': 20}
In [6]:
dic.setdefault("x", 0)
dic["x"] += 1
dic
Out[6]:
{'x': 11, 'y': 20}
In [7]:
dic.setdefault("z", 0)
dic["z"] += 1
dic
Out[7]:
{'x': 11, 'y': 20, 'z': 1}
In [8]:
dic.setdefault("list", [])
dic["list"].append("how")
dic
Out[8]:
{'x': 11, 'y': 20, 'z': 1, 'list': ['how']}
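
The difference between the two approaches can be sketched as follows. The key names ("x" and "typo") are made up for illustration: a read of a misspelled key quietly adds an entry to a defaultdict, while a plain dict raises a KeyError.

```python
from collections import defaultdict

# defaultdict: a plain read of a missing key inserts it with the default value.
counts_dd = defaultdict(int)
counts_dd["x"] += 1
_ = counts_dd["typo"]  # read only, yet the key "typo" now exists
print(dict(counts_dd))

# Plain dict with setdefault: keys are only created where setdefault is called.
counts = {}
counts.setdefault("x", 0)
counts["x"] += 1

# Reading a missing key from a plain dict raises KeyError.
try:
    counts["typo"]
    raised = False
except KeyError:
    raised = True
print(counts, raised)
```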

Remove Elements

In [2]:
d = {"a": 1, "b": 2}
d
Out[2]:
{'a': 1, 'b': 2}
In [3]:
del d["a"]
d
Out[3]:
{'b': 2}
In [4]:
del d["non_exist_key"]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-4-8875b36fb7df> in <module>
----> 1 del d['non_exist_key']

KeyError: 'non_exist_key'
In [5]:
d.pop("non_exist_key", None)
d
Out[5]:
{'b': 2}
In [5]:
words = [
    "how",
    "are",
    "how",
    "are",
    "how",
    "you",
    "are",
    "how",
    "you",
    "are",
    "you",
    "how",
]

You can use set or numpy.unique to dedup a list, but neither preserves the order of first occurrence of elements: the iteration order of a set is arbitrary and numpy.unique returns sorted values.

In [27]:
list(set(words))
Out[27]:
['how', 'you', 'are']
In [29]:
np.unique(words)
Out[29]:
array(['are', 'how', 'you'], dtype='<U3')

One possible way to dedup and preserve the original order of first occurrences of elements is to dedup using a dictionary (which preserves insertion order).

In [17]:
" ".join({word: None for word in words})
Out[17]:
'how are you'
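
An equivalent spelling, shown here as a short sketch, is the built-in class method dict.fromkeys, which builds a dict with every value set to None. The shortened words list is only for illustration.

```python
words = ["how", "are", "how", "you", "are", "how"]

# dict.fromkeys keeps the first occurrence of each key in insertion order.
deduped = list(dict.fromkeys(words))
print(deduped)

# Same result as the dict comprehension used above.
same = list({word: None for word in words})
print(deduped == same)
```

As with the comprehension, the elements must be hashable because they are used as dict keys.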

Sort a Dict

Sort a dict object by its keys.

In [1]:
dic = {"how": 2, "are": 4, "you": 3, "doing": 1, "today": 0}
sorted(dic)
Out[1]:
['are', 'doing', 'how', 'today', 'you']
In [2]:
sorted(dic.items())
Out[2]:
[('are', 4), ('doing', 1), ('how', 2), ('today', 0), ('you', 3)]

Sort a dict object by its values.

In [3]:
x = {"how": 2, "are": 4, "you": 3, "doing": 1, "today": 0}
dict(sorted(x.items(), key=lambda item: item[1]))
Out[3]:
{'today': 0, 'doing': 1, 'how': 2, 'you': 3, 'are': 4}
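
To sort in descending order of values, one option is to pass reverse=True. The sketch below reuses the same x and swaps the lambda for operator.itemgetter, which is just an alternative spelling of the same key function.

```python
from operator import itemgetter

x = {"how": 2, "are": 4, "you": 3, "doing": 1, "today": 0}

# Sort (key, value) pairs by value, largest first, and rebuild a dict.
desc = dict(sorted(x.items(), key=itemgetter(1), reverse=True))
print(desc)

# Sort only the keys by their values and keep the top 2.
top2 = sorted(x, key=x.get, reverse=True)[:2]
print(top2)
```

The rebuilt dict keeps the sorted order only because dict preserves insertion order in Python 3.7+.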

Ref vs Copy

In [18]:
d = {"Ben": [1, 2], "Lisa": [2, 3]}
In [20]:
ben = d["Ben"]
ben
Out[20]:
[1, 2]
In [21]:
ben[0] = 10000
In [22]:
ben
Out[22]:
[10000, 2]
In [23]:
d
Out[23]:
{'Ben': [10000, 2], 'Lisa': [2, 3]}
In [25]:
d = {"Ben": 1, "Lisa": 2}
In [26]:
ben = d["Ben"]
In [27]:
ben
Out[27]:
1
In [28]:
ben = 10000
In [29]:
ben
Out[29]:
10000
In [30]:
d
Out[30]:
{'Ben': 1, 'Lisa': 2}
In [31]:
x = 1
In [32]:
x += 10
In [33]:
x
Out[33]:
11
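
The list example above shows that a value fetched from a dict is a reference to the same object. The same applies to dict.copy, which is a shallow copy. A minimal sketch of the difference from copy.deepcopy, reusing the Ben/Lisa data:

```python
import copy

d = {"Ben": [1, 2], "Lisa": [2, 3]}

# Shallow copy: a new dict whose values are the same list objects as in d.
shallow = d.copy()

# Deep copy: a new dict with recursively copied values.
deep = copy.deepcopy(d)

# Mutating a list through d is visible through the shallow copy only.
d["Ben"][0] = 10000
print(shallow["Ben"])
print(deep["Ben"])
```

Rebinding a key, e.g. d["Ben"] = [0, 0], would not affect either copy, since that replaces the reference held by d rather than mutating the shared list.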
