Ben Chuanlong Du's Blog

It is never too late to learn.

Merge/Join pandas DataFrames

Reference

https://pandas.pydata.org/pandas-docs/stable/merging.html

http://stackoverflow.com/questions/22676081/pandas-the-difference-between-join-and-merge

Comment

  1. You are able to specify (via left_on and right_on) which columns to join in each data frame.

  2. Columns that appear in both data frames but not used in joining are distinguished using suffixes.

In [19]:
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2, 3], "y": [5, 4, 3]})
print(df1)

df2 = pd.DataFrame({"x": [10, 20, 30], "z": ["a", "b", "c"]})
print(df2)
   x  y
0  1  5
1  2  4
2  3  3
    x  z
0  10  a
1  20  b
2  30  c

Default Join

Columns (x in this case) appear in both data frames are used for joining.

In [14]:
df1.merge(df2)
Out[14]:
x y z

Join on Index

In [16]:
df1.merge(df2, left_index=True, right_index=True)
Out[16]:
x_x y x_y z
0 1 a 10 a
1 2 b 20 b
2 3 c 30 c

Join on Specified Columns

In [21]:
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3], "v": [5, 4, 3]})
print(df1)

df2 = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
print(df2)
   id  v
0   1  5
1   2  4
2   3  3
   x  y
0  1  a
1  2  b
2  3  c
In [23]:
df1.merge(df2, left_on="id", right_on="x")
Out[23]:
id v x y
0 1 5 1 a
1 2 4 2 b
2 3 3 3 c

Cartesion/Cross Join

In [3]:
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2], "v": [5, 4]})
df1
Out[3]:
id v
0 1 5
1 2 4
In [4]:
df2 = pd.DataFrame({"x": [10, 20], "y": ["a", "b"]})
df2
Out[4]:
x y
0 10 a
1 20 b
In [5]:
df1.assign(key=1).merge(df2.assign(key=1))
Out[5]:
id v key x y
0 1 5 1 10 a
1 1 5 1 20 b
2 2 4 1 10 a
3 2 4 1 20 b
In [7]:
df1.assign(key=1).merge(df2.assign(key=1)).drop("key", axis=1)
Out[7]:
id v x y
0 1 5 10 a
1 1 5 20 b
2 2 4 10 a
3 2 4 20 b
In [ ]:
 

Comments