Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

import pandas as pd

DataFrame from Dictionary

  1. By default each key-value is a column in the resulting data frame. You can specify the option orient = 'index' to make each key-value a row in the resulting data frame when using the method pandas.DataFrame.from_dict.

  2. Starting from Python 3.7, a dict preserves insertion orders. This effectively makes a pandas DataFrame keep the insertion order of columns.

df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1], "z": [1, 1, 1, 1, 1]})

df.head()
Loading...
df = pd.DataFrame.from_dict({"x": [1, 2, 3, 4, 5], "a": [5, 4, 3, 2, 1]})

df.head()
Loading...
df = pd.DataFrame.from_dict(
    {"x": [1, 2, 3, 4, 5], "a": [5, 4, 3, 2, 1]}, orient="index"
)

df.head()
Loading...
df = pd.DataFrame.from_dict(
    {"how": 9, "are": 3, "you": 7, "doing": 5, "today": 6},
    orient="index",
    columns=["freq"],
)

df
Loading...

DataFrame from List of Dictionaries (as Rows)

Each dictionary is a row in the resulting data frame.

d = [
    {"points": 50, "time": "5:00", "year": 2010},
    {"points": 25, "time": "6:00", "month": "february"},
    {"points": 90, "time": "9:00", "month": "january"},
    {"points_h1": 20, "month": "june"},
]
pd.DataFrame(d)
Loading...

DataFrame from List of Lists/Tuples (as Rows)

Each list/tuple is a row in the resulting data frame.

df = pd.DataFrame(
    data=[
        ["foo", "one", "small", 1],
        ["foo", "one", "large", 2],
        ["foo", "one", "large", 2],
        ["foo", "two", "small", 3],
        ["foo", "two", "small", 3],
        ["bar", "one", "large", 4],
        ["bar", "one", "small", 5],
        ["bar", "two", "small", 6],
        ["bar", "two", "large", 7],
    ],
    columns=["a", "b", "c", "d"],
)

df.head()
Loading...
df = pd.DataFrame.from_records(
    data=[
        ["foo", "one", "small", 1],
        ["foo", "one", "large", 2],
        ["foo", "one", "large", 2],
        ["foo", "two", "small", 3],
        ["foo", "two", "small", 3],
        ["bar", "one", "large", 4],
        ["bar", "one", "small", 5],
        ["bar", "two", "small", 6],
        ["bar", "two", "large", 7],
    ],
    columns=["a", "b", "c", "d"],
)

df.head()
Loading...

DataFrame from List of Lists/Tuples (as Columns)

Each list/tuple is a row in the resulting data frame. Note that pd.concat on a list of Lists/Tuples won’t here. You have to first create a DataFrame with the list of Lists/Tuples as rows and then transpose it.

df = pd.DataFrame(
    data=[
        ["foo", "one", "small", 1],
        ["foo", "one", "large", 2],
        ["foo", "one", "large", 2],
        ["foo", "two", "small", 3],
        ["foo", "two", "small", 3],
        ["bar", "one", "large", 4],
        ["bar", "one", "small", 5],
        ["bar", "two", "small", 6],
        ["bar", "two", "large", 7],
    ],
    columns=["a", "b", "c", "d"],
).transpose()

df.head()
Loading...

DataFrame from One Series (as a Row)

The sereis is a row in the resulting data frame.

id = pd.Series([1, 2, 3, 4, 5], name="id")

pd.DataFrame(data=[id])
Loading...
id = pd.Series([1, 2, 3, 4, 5], name="id")

pd.DataFrame([id])
Loading...

DataFrame from Multiple Serieses (as Rows)

The sereises are rows in the resulting data frame.

id = pd.Series([1, 2, 3, 4, 5], name="id")
x = pd.Series(["a", "b", "c", "d", "e"], name="x")
pd.DataFrame([id, x])
Loading...

DataFrame from One Series (as a Column)

The sereis is a column in the resulting data frame.

id = pd.Series([1, 2, 3, 4, 5], name="id")
id.to_frame()
Loading...
id = pd.Series([1, 2, 3, 4, 5], name="id")
pd.DataFrame(id)
Loading...

DataFrame from Multiple Series (as Columns)

The serieses are columns in the resulting data frame.

id = pd.Series([1, 2, 3, 4, 5], name="id")
x = pd.Series(["a", "b", "c", "d", "e"], name="x")
pd.concat([id, x], axis=1)
Loading...

Series to Underlying Data

id = pd.Series([1, 2, 3, 4, 5], name="id")
id.tolist()
[1, 2, 3, 4, 5]

DataFrame to Underlying Data

df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "a": [5, 4, 3, 2, 1]})
print(df.head())
df.values.tolist()
   a  x
0  5  1
1  4  2
2  3  3
3  2  4
4  1  5
[[5, 1], [4, 2], [3, 3], [2, 4], [1, 5]]

Index

An index will always be created. By default, a sequence of integers (starting from 0) is used as the index.

import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "a": [5, 4, 3, 2, 1]}, index=None)

df
Loading...

Column Names

Similar to the index, a sequence of integers (starting from 9) is used as the column names by default.

import pandas as pd
df = pd.DataFrame([(1, "a"), (2, "b")], columns=None)
df
Loading...

Empty DataFrame

Create an empty DataFrame without any column or row.

pd.DataFrame({})
Loading...

Create an empty (no rows) DataFrame with 1 column named x.

df = pd.DataFrame({"x": []})
df
Loading...

Create an empty (no rows) DataFrame with 2 column x and y.

df = pd.DataFrame([], columns=["x", "y"])
df
Loading...

You can use the variable DataFrame.empty to check whether a DataFrame is empty or not.

df.empty
True

You can operate on columns of an empty (no rows) DataFrame as usual.

df = pd.DataFrame({"cal_dt": []})
df.cal_dt = pd.to_datetime(df.cal_dt)
df
Loading...
len(df.cal_dt.unique())
0
d = df.cal_dt.max() - df.cal_dt.min()
d
NaT
pd.isnull(d)
True
pd.isnull(None)
True
len(df.cal_dt.unique())
0