Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Reference

https://pandas.pydata.org/pandas-docs/stable/merging.html

http://stackoverflow.com/questions/22676081/pandas-the-difference-between-join-and-merge

Comment

  1. You can specify (via left_on and right_on) which columns to join on in each data frame.

  2. Columns that appear in both data frames but are not used in the join are distinguished using suffixes (_x and _y by default).
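
The two comments above can be sketched with a small example. The frames `left` and `right` and their column names are hypothetical, made up only for illustration; `suffixes` is the `merge` parameter that overrides the default `_x`/`_y` naming.

```python
import pandas as pd

# Hypothetical frames: "id" and "key" are the join columns,
# while "v" appears in both frames but is not used for joining.
left = pd.DataFrame({"id": [1, 2], "v": [5, 4]})
right = pd.DataFrame({"key": [1, 2], "v": ["a", "b"]})

# Join left.id against right.key. The shared non-key column "v"
# gets the default suffixes: "_x" for the left frame, "_y" for the right.
default_suffixes = left.merge(right, left_on="id", right_on="key")
print(default_suffixes.columns.tolist())

# The same join with explicit suffixes instead of the defaults.
custom_suffixes = left.merge(
    right, left_on="id", right_on="key", suffixes=("_left", "_right")
)
print(custom_suffixes.columns.tolist())
```

Explicit suffixes are mostly a readability choice: they make it obvious which frame a column came from once the merged result is passed around.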

import pandas as pd

df1 = pd.DataFrame({"x": [1, 2, 3], "y": [5, 4, 3]})
print(df1)

df2 = pd.DataFrame({"x": [10, 20, 30], "z": ["a", "b", "c"]})
print(df2)
   x  y
0  1  5
1  2  4
2  3  3
    x  z
0  10  a
1  20  b
2  30  c

Default Join

By default, the columns that appear in both data frames (x in this case) are used for joining.

df1.merge(df2)
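
One thing worth noting about the default join: `merge` performs an inner join unless `how` says otherwise, and in the frames above no value of x occurs in both, so the default result should be empty. The sketch below rebuilds the same two frames and contrasts the default with `how="outer"`; the variable names `inner` and `outer` are only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2, 3], "y": [5, 4, 3]})
df2 = pd.DataFrame({"x": [10, 20, 30], "z": ["a", "b", "c"]})

# Default merge: inner join on the common column "x".
# No x value is shared between the frames, so no rows survive.
inner = df1.merge(df2)
print(len(inner))

# Outer join keeps every row from both frames and fills the
# missing side with NaN.
outer = df1.merge(df2, how="outer")
print(outer)
```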

Join on Index

df1.merge(df2, left_index=True, right_index=True)
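
When joining on the index, the column x exists in both frames but is no longer a join key, so it should come back twice with suffixes. The sketch below shows this, and also what I believe is the equivalent call using `DataFrame.join`, which joins on the index by default but needs `how="inner"` and explicit suffixes to match `merge`.

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2, 3], "y": [5, 4, 3]})
df2 = pd.DataFrame({"x": [10, 20, 30], "z": ["a", "b", "c"]})

# Rows are matched by index label (0, 1, 2); the overlapping
# column "x" is kept from both sides as "x_x" and "x_y".
merged = df1.merge(df2, left_index=True, right_index=True)
print(merged)

# DataFrame.join matches on the index by default. It defaults to a
# left join and to empty suffixes, so both are set explicitly here.
joined = df1.join(df2, how="inner", lsuffix="_x", rsuffix="_y")
print(joined)
```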

Join on Specified Columns

import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3], "v": [5, 4, 3]})
print(df1)

df2 = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
print(df2)
   id  v
0   1  5
1   2  4
2   3  3
   x  y
0  1  a
1  2  b
2  3  c
df1.merge(df2, left_on="id", right_on="x")

Cartesian/Cross Join

import pandas as pd

df1 = pd.DataFrame({"id": [1, 2], "v": [5, 4]})
df1
df2 = pd.DataFrame({"x": [10, 20], "y": ["a", "b"]})
df2
df1.assign(key=1).merge(df2.assign(key=1))
df1.assign(key=1).merge(df2.assign(key=1)).drop("key", axis=1)
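
The dummy-key trick above works on any pandas version. Assuming pandas 1.2 or later, `merge` also accepts `how="cross"`, which should give the same Cartesian product without the temporary column. A minimal sketch comparing the two (the names `cross` and `via_key` are only for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2], "v": [5, 4]})
df2 = pd.DataFrame({"x": [10, 20], "y": ["a", "b"]})

# Built-in cross join (assumes pandas >= 1.2): every row of df1
# is paired with every row of df2, with no join key involved.
cross = df1.merge(df2, how="cross")
print(cross)

# The dummy-key approach from above, for comparison.
via_key = df1.assign(key=1).merge(df2.assign(key=1)).drop("key", axis=1)
print(via_key)
```

Note that `how="cross"` cannot be combined with `on`, `left_on`, `right_on`, or the index flags, since a cross join has no key.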