Hands on GroupBy of Polars DataFrame in Python

The notes on this page are the author's fragmentary, unpolished thoughts. Please read them with your own judgement!

import itertools as it
import polars as pl
df = pl.DataFrame(
    {
        "id": [0, 1, 2, 3, 4],
        "color": ["red", "green", "green", "red", "red"],
        "shape": ["square", "triangle", "square", "triangle", "square"],
    }
)
df
# Collect the ids of each color group into a list column.
df.groupby("color", maintain_order=True).agg(pl.col("id"))
# Keep only the first id of each color group.
df.groupby("color", maintain_order=True).agg(pl.col("id").first())
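# Multiply the id in the first row of a group's frame by 1000.
def update_frame(frame):
    frame[0, "id"] = frame[0, "id"] * 1000
    return frame
# Run update_frame on each color group and concatenate the results.
df.groupby("color").apply(update_frame)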

GroupBy + Aggregation

# Count the rows in each color group.
df.groupby("color").agg(pl.count().alias("n"))
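# Enumerate all 4-element combinations of range(52), number the rows,
# and find the smallest row number for each (column_0, column_1, column_2) prefix.
pl.DataFrame(data=it.combinations(range(52), 4), orient="row").with_row_count().groupby(
    [
        "column_0",
        "column_1",
        "column_2",
    ]
).agg(pl.col("row_nr").min()).sort(
    [
        "column_0",
        "column_1",
        "column_2",
    ]
)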

GroupBy as An Iterable

# Iterate over the groups, pairing each group key with the number of rows in its frame.
pl.Series((g, frame.shape[0]) for g, frame in df.groupby("color"))