Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

import pandas as pd
import numpy as np
import datetime

Summary

  1. There are many ways to filter rows of a pandas DataFrame. The slicing way is the universal way, however, the method DataFrame.query can be more convenient for simple filtering conditions and it is more efficient too.

  2. DataFrame.where is not intened for filtering but rather a analog to case statement in SQL and if/else or ternary expressions.

df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5],
        "word": ["Mary", "is", "a", "fox", "."],
        "date": [datetime.date.today() + datetime.timedelta(days=i) for i in range(5)],
    }
)
df
Loading...

DataFrame.query

df.query("id % 2 == 0")
Loading...
df.query("id in [3, 5, 7]")
Loading...
x = [3, 5, 7]
df.query("id in @x")
Loading...
df.query("word.str.len() == id")
Loading...
df.query("word.str.lower() == 'mary'")
Loading...
df.query("word.str.startswith('f')")
Loading...
df[df.word.str.startswith("f")]
Loading...

The below example is convenient way to compare a datetime column with a literal string.

df.query("date.astype('str') >= '2020-06-08'")
Loading...

Use Slicing

df[df.x % 2 == 0]
Loading...

DataFrame.where

The method DataFrame.where is not intended for filtring. It is an analog to case statement in SQL and if/else and ternary expression in programming languages. The condition that DataFrame.where takes can be a bool Series/DataFrame, array-like, or callable.

df.where(lambda r: r.id % 2 == 0)
Loading...
df.where(lambda r: r.id % 2 == 0, df * 3)
Loading...
df.where(lambda r: r.x % 2 == 0).dropna()
Loading...
df.id.where(lambda x: x % 2 == 0)
0 NaN 1 2.0 2 NaN 3 4.0 4 NaN Name: id, dtype: float64
df.id.where(lambda x: x % 2 == 0, 0)
0 0 1 2 2 0 3 4 4 0 Name: id, dtype: int64