In [1]:

import pandas as pd

DataFrame from Dictionary¶

By default, each key-value pair becomes a column in the resulting data frame. When using the method pandas.DataFrame.from_dict, you can pass the option orient="index" to make each key-value pair a row instead.
Starting from Python 3.7, a dict preserves insertion order. As a result, a pandas DataFrame built from a dict keeps its columns in insertion order.

In [2]:

df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1], "z": [1, 1, 1, 1, 1]})

df.head()

Out[2]:

	x	y	z
0	1	5	1
1	2	4	1
2	3	3	1
3	4	2	1
4	5	1	1
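
To make the insertion-order behavior concrete, here is a minimal sketch. The variable names (data, df_order, df_sorted) are only for illustration, and it assumes Python 3.7+ with a reasonably recent pandas.

```python
import pandas as pd

# Keys are inserted in the order "b", "a", "c" (deliberately not alphabetical).
data = {"b": [1, 2], "a": [3, 4], "c": [5, 6]}
df_order = pd.DataFrame(data)

# Column order follows the dict's insertion order.
print(list(df_order.columns))

# Reorder explicitly when a different layout is wanted.
df_sorted = df_order[sorted(df_order.columns)]
print(list(df_sorted.columns))
```

Selecting with a list of column names is probably the simplest way to impose a specific order after construction.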

In [3]:

df = pd.DataFrame.from_dict({"x": [1, 2, 3, 4, 5], "a": [5, 4, 3, 2, 1]})

df.head()

Out[3]:

	x	a
0	1	5
1	2	4
2	3	3
3	4	2
4	5	1

In [4]:

df = pd.DataFrame.from_dict(
    {"x": [1, 2, 3, 4, 5], "a": [5, 4, 3, 2, 1]}, orient="index"
)

df.head()

Out[4]:

	0	1	2	3	4
x	1	2	3	4	5
a	5	4	3	2	1

In [8]:

df = pd.DataFrame.from_dict(
    {"how": 9, "are": 3, "you": 7, "doing": 5, "today": 6},
    orient="index",
    columns=["freq"],
)

df

Out[8]:

	freq
how	9
are	3
you	7
doing	5
today	6
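
If you would rather have the dictionary keys as an ordinary column than as the index, one possible approach is sketched below. The column name "word" and the variable names are made up for this example.

```python
import pandas as pd

word_freq = {"how": 9, "are": 3, "you": 7, "doing": 5, "today": 6}

# Turn the (key, value) pairs into rows with two ordinary columns.
df_pairs = pd.DataFrame(list(word_freq.items()), columns=["word", "freq"])

# Alternative route through a Series: the keys become the index,
# and reset_index moves them into a regular column.
df_series = pd.Series(word_freq, name="freq").rename_axis("word").reset_index()

print(df_pairs)
```

Both variants end up with a default integer index and the columns word and freq.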

DataFrame from List of Dictionaries (as Rows)¶

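Each dictionary is a row in the resulting data frame. The columns are the union of all keys, and a key missing from a dictionary shows up as NaN in that row.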

In [2]:

d = [
    {"points": 50, "time": "5:00", "year": 2010},
    {"points": 25, "time": "6:00", "month": "february"},
    {"points": 90, "time": "9:00", "month": "january"},
    {"points_h1": 20, "month": "june"},
]
pd.DataFrame(d)

Out[2]:

	points	time	year	month	points_h1
0	50.0	5:00	2010.0	NaN	NaN
1	25.0	6:00	NaN	february	NaN
2	90.0	9:00	NaN	january	NaN
3	NaN	NaN	NaN	june	20.0
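
The sketch below shows one way to deal with the ragged keys: pick the columns you want with the columns argument and fill the gaps afterwards. The records are a shortened copy of the example above; the choice of 0 as the fill value is just an assumption for illustration.

```python
import pandas as pd

records = [
    {"points": 50, "time": "5:00", "year": 2010},
    {"points": 25, "time": "6:00", "month": "february"},
    {"points_h1": 20, "month": "june"},
]

# Keys missing from a record become NaN in that row.
df_all = pd.DataFrame(records)

# Passing columns= selects (and orders) the columns to keep.
df_subset = pd.DataFrame(records, columns=["month", "points"])

# An integer column with missing values is upcast to float;
# fill the gaps and cast back if integers are needed.
df_subset["points"] = df_subset["points"].fillna(0).astype(int)

print(df_subset)
```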

DataFrame from List of Lists/Tuples (as Rows)¶

Each list/tuple is a row in the resulting data frame.

In [8]:

df = pd.DataFrame(
    data=[
        ["foo", "one", "small", 1],
        ["foo", "one", "large", 2],
        ["foo", "one", "large", 2],
        ["foo", "two", "small", 3],
        ["foo", "two", "small", 3],
        ["bar", "one", "large", 4],
        ["bar", "one", "small", 5],
        ["bar", "two", "small", 6],
        ["bar", "two", "large", 7],
    ],
    columns=["a", "b", "c", "d"],
)

df.head()

Out[8]:

	a	b	c	d
0	foo	one	small	1
1	foo	one	large	2
2	foo	one	large	2
3	foo	two	small	3
4	foo	two	small	3

In [28]:

df = pd.DataFrame.from_records(
    data=[
        ["foo", "one", "small", 1],
        ["foo", "one", "large", 2],
        ["foo", "one", "large", 2],
        ["foo", "two", "small", 3],
        ["foo", "two", "small", 3],
        ["bar", "one", "large", 4],
        ["bar", "one", "small", 5],
        ["bar", "two", "small", 6],
        ["bar", "two", "large", 7],
    ],
    columns=["a", "b", "c", "d"],
)

df.head()

Out[28]:

	a	b	c	d
0	foo	one	small	1
1	foo	one	large	2
2	foo	one	large	2
3	foo	two	small	3
4	foo	two	small	3

DataFrame from List of Lists/Tuples (as Columns)¶

Each list/tuple is a column in the resulting data frame. Note that pd.concat on a list of lists/tuples won't work here. You have to first create a DataFrame with the lists/tuples as rows and then transpose it.

In [15]:

df = pd.DataFrame(
    data=[
        ["foo", "one", "small", 1],
        ["foo", "one", "large", 2],
        ["foo", "one", "large", 2],
        ["foo", "two", "small", 3],
        ["foo", "two", "small", 3],
        ["bar", "one", "large", 4],
        ["bar", "one", "small", 5],
        ["bar", "two", "small", 6],
        ["bar", "two", "large", 7],
    ],
    columns=["a", "b", "c", "d"],
).transpose()

df.head()

Out[15]:

	0	1	2	3	4	5	6	7	8
a	foo	foo	foo	foo	foo	bar	bar	bar	bar
b	one	one	one	two	two	one	one	two	two
c	small	large	large	small	small	large	small	small	large
d	1	2	2	3	3	4	5	6	7
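
When each list should become a named column (rather than a column labeled by position, as in the transposed frame above), a dict of lists may be a simpler route. This is only a sketch with made-up, shortened data.

```python
import pandas as pd

names = ["a", "b", "c", "d"]
columns = [
    ["foo", "foo", "bar"],
    ["one", "two", "one"],
    ["small", "large", "small"],
    [1, 2, 4],
]

# Pair each name with its list; every list becomes one column.
df_cols = pd.DataFrame(dict(zip(names, columns)))

# Alternative: zip(*columns) regroups the column lists into row tuples.
df_rows = pd.DataFrame(list(zip(*columns)), columns=names)

print(df_cols.dtypes)
```

One difference from the transpose approach is that each column keeps its own dtype here, so the numeric column d stays an integer column.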

DataFrame from One Series (as a Row)¶

The series is a row in the resulting data frame.

In [11]:

id = pd.Series([1, 2, 3, 4, 5], name="id")

pd.DataFrame(data=[id])

Out[11]:

	0	1	2	3	4
id	1	2	3	4	5

In [12]:

id = pd.Series([1, 2, 3, 4, 5], name="id")

pd.DataFrame([id])

Out[12]:

	0	1	2	3	4
id	1	2	3	4	5

DataFrame from Multiple Series (as Rows)¶

The series are rows in the resulting data frame.

In [13]:

id = pd.Series([1, 2, 3, 4, 5], name="id")
x = pd.Series(["a", "b", "c", "d", "e"], name="x")
pd.DataFrame([id, x])

Out[13]:

	0	1	2	3	4
id	1	2	3	4	5
x	a	b	c	d	e

DataFrame from One Series (as a Column)¶

The series is a column in the resulting data frame.

In [6]:

id = pd.Series([1, 2, 3, 4, 5], name="id")
id.to_frame()

Out[6]:

	id
0	1
1	2
2	3
3	4
4	5

In [7]:

id = pd.Series([1, 2, 3, 4, 5], name="id")
pd.DataFrame(id)

Out[7]:

	id
0	1
1	2
2	3
3	4
4	5

DataFrame from Multiple Series (as Columns)¶

The series are columns in the resulting data frame.

In [10]:

id = pd.Series([1, 2, 3, 4, 5], name="id")
x = pd.Series(["a", "b", "c", "d", "e"], name="x")
pd.concat([id, x], axis=1)

Out[10]:

	id	x
0	1	a
1	2	b
2	3	c
3	4	d
4	5	e
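
A dict of Series is another way to get one column per series. The sketch below also illustrates that series are combined by index label rather than by position, which is easy to overlook. The names ids and shifted are invented for this example.

```python
import pandas as pd

ids = pd.Series([1, 2, 3], name="id")
x = pd.Series(["a", "b", "c"], name="x")

# A dict of Series also builds columns; the dict keys set the column names.
df_dict = pd.DataFrame({"id": ids, "x": x})

# Series are aligned on their index labels, not on position.
shifted = pd.Series(["a", "b", "c"], index=[1, 2, 3], name="x")
df_aligned = pd.concat([ids, shifted], axis=1)

print(df_aligned)
```

Because the two indexes only partly overlap, df_aligned has four rows, with NaN wherever a label exists in just one of the series.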

Series to Underlying Data¶

In [19]:

id = pd.Series([1, 2, 3, 4, 5], name="id")
id.tolist()

Out[19]:

[1, 2, 3, 4, 5]

DataFrame to Underlying Data¶

In [21]:

df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "a": [5, 4, 3, 2, 1]})
print(df.head())
df.values.tolist()

Out[21]:

[[1, 5], [2, 4], [3, 3], [4, 2], [5, 1]]
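
Besides a list of rows, a DataFrame can be converted to a few other plain Python structures. The following sketch uses a small made-up frame and shows three options that may be useful.

```python
import pandas as pd

df_out = pd.DataFrame({"x": [1, 2, 3], "a": [5, 4, 3]})

# List of rows, one inner list per row.
rows = df_out.to_numpy().tolist()

# List of dicts, one dict per row (the inverse of building from a list of dicts).
records = df_out.to_dict("records")

# Dict mapping each column name to a list of its values.
by_column = df_out.to_dict("list")

print(rows)
print(records)
print(by_column)
```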

Index¶

An index will always be created. By default, a sequence of integers (starting from 0) is used as the index.

In [3]:

import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "a": [5, 4, 3, 2, 1]}, index=None)

df

Out[3]:

	x	a
0	1	5
1	2	4
2	3	3
3	4	2
4	5	1
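
To use something other than the default integer index, you can pass index labels at construction time or promote a column afterwards. A minimal sketch, with made-up labels r1 to r3:

```python
import pandas as pd

# Supply custom row labels when constructing the frame.
df_idx = pd.DataFrame(
    {"x": [1, 2, 3], "a": [5, 4, 3]},
    index=["r1", "r2", "r3"],
)

# Label-based lookup with the custom index.
value = df_idx.loc["r2", "x"]

# Promote an existing column to be the index.
df_by_x = df_idx.set_index("x")

# Go back to the default 0..n-1 index, discarding the labels.
df_reset = df_idx.reset_index(drop=True)

print(df_idx)
```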

Column Names¶

Similar to the index, a sequence of integers (starting from 0) is used as the column names by default.

In [1]:

import pandas as pd

In [2]:

df = pd.DataFrame([(1, "a"), (2, "b")], columns=None)
df

Out[2]:

	0	1
0	1	a
1	2	b
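
Integer column names are rarely what you want in the end. The sketch below shows two ways to get meaningful names; the names id and x are chosen only for illustration.

```python
import pandas as pd

# Without columns=, the columns are labeled 0, 1, ...
df_default = pd.DataFrame([(1, "a"), (2, "b")])

# Name the columns at construction time.
df_named = pd.DataFrame([(1, "a"), (2, "b")], columns=["id", "x"])

# Or rename the default integer labels afterwards.
df_renamed = df_default.rename(columns={0: "id", 1: "x"})

print(list(df_default.columns))
print(list(df_renamed.columns))
```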

Empty DataFrame¶

Create an empty DataFrame without any column or row.

In [2]:

pd.DataFrame({})

Out[2]:

Create an empty (no rows) DataFrame with 1 column named x.

In [4]:

df = pd.DataFrame({"x": []})
df

Out[4]:

	x

Create an empty (no rows) DataFrame with 2 columns named x and y.

In [6]:

df = pd.DataFrame([], columns=["x", "y"])
df

Out[6]:

	x	y

You can use the attribute DataFrame.empty to check whether a DataFrame is empty or not.

In [3]:

df.empty

Out[3]:

True
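
Two details about empty DataFrames may be worth spelling out: columns can be given explicit dtypes even with no rows, and DataFrame.empty is about the shape, not about missing values. A small sketch under those assumptions (all names are illustrative):

```python
import pandas as pd

# Give each column an explicit dtype via an empty Series.
df_typed = pd.DataFrame(
    {
        "x": pd.Series(dtype="int64"),
        "y": pd.Series(dtype="float64"),
    }
)
print(df_typed.dtypes)

# empty is True whenever either axis has length 0.
no_rows = pd.DataFrame(columns=["x", "y"])
no_cols = pd.DataFrame(index=[0, 1])

# A frame that contains only NaN still has a row, so it is not empty.
all_nan = pd.DataFrame({"x": [float("nan")]})

print(no_rows.empty, no_cols.empty, all_nan.empty)
```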

You can operate on columns of an empty (no rows) DataFrame as usual.

In [66]:

df = pd.DataFrame({"cal_dt": []})
df.cal_dt = pd.to_datetime(df.cal_dt)
df

Out[66]:

	cal_dt

In [72]:

len(df.cal_dt.unique())

Out[72]:

0

In [67]:

d = df.cal_dt.max() - df.cal_dt.min()
d

Out[67]:

NaT

In [14]:

pd.isnull(d)

Out[14]:

True

In [15]:

pd.isnull(None)

Out[15]:

True
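
Since aggregating an empty datetime column yields NaT, downstream code usually needs a guard before using the result. One possible pattern is sketched below; the helper name span_in_days and the fallback value of 0 are assumptions made for this example.

```python
import pandas as pd


def span_in_days(dates: pd.Series) -> int:
    """Return the number of days between the earliest and latest date.

    Falls back to 0 when the span is NaT, e.g. for an empty column.
    """
    span = dates.max() - dates.min()
    return 0 if pd.isna(span) else span.days


empty_dates = pd.to_datetime(pd.Series([], dtype="object"))
some_dates = pd.to_datetime(pd.Series(["2020-01-01", "2020-01-11"]))

print(span_in_days(empty_dates))
print(span_in_days(some_dates))
```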


Ben Chuanlong Du's Blog


Construct pandas DataFrames in Python
