Cut and qcut in pandas DataFrame

The notes on this page are fragmentary and immature thoughts of the author. Please read with your own judgement!

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {"x": [3, 3, 1, 10, 1, 10], "y": [1, 2, 3, 4, 5, 60], "z": [6, 5, 4, 3, 2, 1]}
)

df

pd.cut(df.y, 3)

0    (0.941, 20.667]
1    (0.941, 20.667]
2    (0.941, 20.667]
3    (0.941, 20.667]
4    (0.941, 20.667]
5     (40.333, 60.0]
Name: y, dtype: category
Categories (3, interval[float64]): [(0.941, 20.667] < (20.667, 40.333] < (40.333, 60.0]]
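
A note on where those edges come from: with an integer `bins`, `pd.cut` splits the range of the data into equal-width intervals and lowers the lowest edge by 0.1% of the range so that the minimum is included. Because `y` has one outlier (60), almost every value lands in the first bin. `pd.qcut` bins by quantiles instead, so each bin gets roughly the same number of rows. The sketch below is only illustrative: the variable names are made up, and `y` is rebuilt so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Same values as df.y above, rebuilt so the snippet is self-contained.
y = pd.Series([1, 2, 3, 4, 5, 60], name="y")

# retbins=True also returns the bin edges that were used.
cut_bins, cut_edges = pd.cut(y, 3, retbins=True)
qcut_bins, qcut_edges = pd.qcut(y, 3, retbins=True)

# Equal-width edges: the range [1, 60] split in three, lowest edge lowered slightly.
print(cut_edges)
# Quantile edges: the 0, 1/3, 2/3 and 1 quantiles of y.
print(qcut_edges)

# Integer bin code for each row.
print(cut_bins.cat.codes.tolist())   # [0, 0, 0, 0, 0, 2]
print(qcut_bins.cat.codes.tolist())  # [0, 0, 1, 1, 2, 2]
```

So `cut` is the one to reach for when the bin boundaries themselves matter, and `qcut` when balanced bin sizes matter.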

pd.cut(df.y, [0, 1.5, 4.5, 100])

0      (0.0, 1.5]
1      (1.5, 4.5]
2      (1.5, 4.5]
3      (1.5, 4.5]
4    (4.5, 100.0]
5    (4.5, 100.0]
Name: y, dtype: category
Categories (3, interval[float64]): [(0.0, 1.5] < (1.5, 4.5] < (4.5, 100.0]]
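
With explicit edges, each interval is open on the left and closed on the right by default (`right=True`), which is why 1 falls in (0.0, 1.5] and 5 falls in (4.5, 100.0]. If interval labels are awkward downstream, the `labels` parameter accepts one name per bin, or `False` to get plain integer bin indices. A small sketch; the label names "low", "mid" and "high" are made up for illustration and are not part of the original notes.

```python
import pandas as pd

# Same values as df.y above, rebuilt so the snippet is self-contained.
y = pd.Series([1, 2, 3, 4, 5, 60], name="y")

# One label per bin; these label names are illustrative only.
named = pd.cut(y, [0, 1.5, 4.5, 100], labels=["low", "mid", "high"])
print(named.tolist())  # ['low', 'mid', 'mid', 'mid', 'high', 'high']

# labels=False returns the integer bin index instead of a category.
codes = pd.cut(y, [0, 1.5, 4.5, 100], labels=False)
print(codes.tolist())  # [0, 1, 1, 1, 2, 2]
```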

pd.cut(df.y, [1.5, 4.5, 10])

0            NaN
1     (1.5, 4.5]
2     (1.5, 4.5]
3     (1.5, 4.5]
4    (4.5, 10.0]
5            NaN
Name: y, dtype: category
Categories (2, interval[float64]): [(1.5, 4.5] < (4.5, 10.0]]
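
Values outside the outermost edges are not assigned to any bin and come back as NaN: here 1 is below 1.5 and 60 is above 10. If every row should end up in a bin, one option (an assumption about what you want, not the only approach) is to count the misses first and then add infinite outer edges.

```python
import numpy as np
import pandas as pd

# Same values as df.y above, rebuilt so the snippet is self-contained.
y = pd.Series([1, 2, 3, 4, 5, 60], name="y")

# Edges that do not cover the data: out-of-range values become NaN.
narrow = pd.cut(y, [1.5, 4.5, 10])
print(narrow.isna().sum())  # 2

# Open-ended outer bins catch everything below 1.5 and above 10.
wide = pd.cut(y, [-np.inf, 1.5, 4.5, 10, np.inf])
print(wide.isna().sum())        # 0
print(wide.cat.codes.tolist())  # [0, 1, 1, 1, 2, 3]
```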

References

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.qcut.html?highlight=qcut#pandas.qcut

https://pbpython.com/pandas-qcut-cut.html