Ben Chuanlong Du's Blog

Understand Index in pandas

  1. There are several ways to update the index of a DataFrame or Series. You can assign a new Series or Index object directly to the index of a DataFrame or Series, or you can use methods such as DataFrame.set_index and DataFrame.reset_index. DataFrame.reset_index (and likewise Series.reset_index) replaces the index with an integer index starting from 0. By default the old index is kept as a new column; pass drop=True to discard it. DataFrame.set_index sets the index of a DataFrame to the specified column and, by default, removes that column from the DataFrame. The same result can be achieved by directly assigning the column to the index of the DataFrame and then manually removing the column. Note that by default DataFrame.set_index, DataFrame.reset_index and Series.reset_index return new objects rather than modifying the original. The option inplace=True can be specified to make the update in place.
In [1]:
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]}, index=["r1", "r2", "r3", "r4", "r5"]
)

df
Out[1]:
x y
r1 1 5
r2 2 4
r3 3 3
r4 4 2
r5 5 1
In [2]:
df.set_index("x")
Out[2]:
y
x
1 5
2 4
3 3
4 2
5 1
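
The copy-versus-in-place behavior described above can be checked with a small sketch. The variable names `original`, `indexed` and `kept` are illustrative and not part of the notebook; `drop=False` is the set_index option that keeps the column as well as using it for the index.

```python
import pandas as pd

original = pd.DataFrame(
    {"x": [1, 2, 3], "y": [3, 2, 1]}, index=["r1", "r2", "r3"]
)

# set_index returns a new DataFrame; `original` keeps its index and columns.
indexed = original.set_index("x")
print(list(original.columns))  # x is still a column of the original
print(list(indexed.columns))   # x has moved into the index of the result

# drop=False keeps the column in addition to using it as the index.
kept = original.set_index("x", drop=False)
print(list(kept.columns))
```

Assigning the result back (for example `original = original.set_index("x")`) is an alternative to passing inplace=True.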

reindex

DataFrame.reindex does NOT relabel the existing rows. It selects and rearranges rows according to the specified index labels, and a label that does not exist in the original index produces a row of NaN. If you want to change the index but keep the original order of the rows, just assign new values to the index of the DataFrame or call the method reset_index(drop=True).

In [1]:
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]}, index=["r1", "r2", "r3", "r4", "r5"]
)

df.head()
Out[1]:
x y
r1 1 5
r2 2 4
r3 3 3
r4 4 2
r5 5 1
In [11]:
df.reindex(index=range(0, df.shape[0]))
Out[11]:
x y
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN
In [12]:
df.reindex(index=["r1", "r3", "r5", "r2", "r4"])
Out[12]:
x y
r1 1 5
r3 3 3
r5 5 1
r2 2 4
r4 4 2
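
As a further sketch of how reindex matches labels, the example below uses a smaller frame and a label `"r9"` that is deliberately absent (both are assumptions made for illustration). It also uses the `fill_value` option of reindex to replace the NaN placeholders.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3], "y": [3, 2, 1]}, index=["r1", "r2", "r3"]
)

# Labels are matched against the existing index; "r9" is absent, so its row is NaN.
reordered = df.reindex(index=["r3", "r1", "r9"])
print(reordered)

# fill_value replaces the NaN placeholders created for missing labels.
filled = df.reindex(index=["r3", "r1", "r9"], fill_value=0)
print(filled)

# reindex returned new objects; the original frame is untouched.
print(list(df.index))
```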
In [4]:
x = df.copy()
print(x)
x.index = range(1, 6)
x
    x  y
r1  1  5
r2  2  4
r3  3  3
r4  4  2
r5  5  1
Out[4]:
x y
1 1 5
2 2 4
3 3 3
4 4 2
5 5 1
In [22]:
x = df.copy()
x.reset_index()
Out[22]:
index x y
0 r1 1 5
1 r2 2 4
2 r3 3 3
3 r4 4 2
4 r5 5 1
In [3]:
x = df.copy()
x.reset_index(drop=True, inplace=True)
x
Out[3]:
x y
0 1 5
1 2 4
2 3 3
3 4 2
4 5 1

reset_index

By default reset_index returns a copy rather than modifying the original data frame. You can specify inplace=True to override this behavior.
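
A minimal sketch of what this means in practice (the variable name `index_after_discarded_call` is illustrative): calling reset_index and ignoring the return value changes nothing, while assigning the result back has the same effect as inplace=True.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]}, index=["r1", "r2", "r3"])

# The return value is discarded here, so df still has its original index.
df.reset_index(drop=True)
index_after_discarded_call = list(df.index)
print(index_after_discarded_call)

# Assigning the result back replaces the index.
df = df.reset_index(drop=True)
print(list(df.index))
```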

Series

  1. If you drop the original index, you still have a Series. However, if you reset the index of a Series without dropping the original index, you get a data frame.
In [5]:
s = pd.Series([1, 2, 3, 4], index=["r1", "r2", "r3", "r4"])
s
Out[5]:
r1    1
r2    2
r3    3
r4    4
dtype: int64
In [8]:
df = s.reset_index()
df
Out[8]:
index 0
0 r1 1
1 r2 2
2 r3 3
3 r4 4
In [10]:
df = s.reset_index(drop=True)
df
Out[10]:
0    1
1    2
2    3
3    4
dtype: int64
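
The difference in return type can be confirmed directly. The sketch below also uses the `name` option of Series.reset_index to label the value column, which otherwise is called 0 because the Series has no name. The variable names are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=["r1", "r2", "r3", "r4"])

as_frame = s.reset_index()            # old index becomes a column
as_series = s.reset_index(drop=True)  # old index is discarded

print(type(as_frame).__name__)
print(type(as_series).__name__)

# name= labels the value column of the resulting data frame.
named = s.reset_index(name="value")
print(list(named.columns))
```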

DataFrame

In [15]:
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]}, index=["r1", "r2", "r3", "r4", "r5"]
)

df.head()
Out[15]:
x y
r1 1 5
r2 2 4
r3 3 3
r4 4 2
r5 5 1
In [29]:
# keep the original index as a new column and create a new index
df.reset_index()
Out[29]:
index x y
0 r1 1 5
1 r2 2 4
2 r3 3 3
3 r4 4 2
4 r5 5 1
In [30]:
# drop the original index and create a new index
df.reset_index(drop=True)
Out[30]:
x y
0 1 5
1 2 4
2 3 3
3 4 2
4 5 1

Multi-index

In [31]:
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]},
    index=pd.MultiIndex.from_tuples(
        [("r1", 0), ("r2", 1), ("r3", 2), ("r4", 3), ("r5", 4)]
    ),
)

df.head()
Out[31]:
x y
r1 0 1 5
r2 1 2 4
r3 2 3 3
r4 3 4 2
r5 4 5 1
In [32]:
df.reset_index()
Out[32]:
level_0 level_1 x y
0 r1 0 1 5
1 r2 1 2 4
2 r3 2 3 3
3 r4 3 4 2
4 r5 4 5 1
In [33]:
df.reset_index(drop=True)
Out[33]:
x y
0 1 5
1 2 4
2 3 3
3 4 2
4 5 1
In [38]:
# drop the 2nd index level and keep the first
df.reset_index(level=1, drop=True)
Out[38]:
x y
r1 1 5
r2 2 4
r3 3 3
r4 4 2
r5 5 1
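
When the levels of a MultiIndex are named, reset_index uses those names instead of level_0 and level_1. The following sketch assumes hypothetical level names `"row"` and `"pos"`, which do not appear in the cells above.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3]},
    index=pd.MultiIndex.from_tuples(
        [("r1", 0), ("r2", 1), ("r3", 2)], names=["row", "pos"]
    ),
)

# Named levels become columns carrying those names.
flat = df.reset_index()
print(list(flat.columns))

# Move only the "pos" level into a column; "row" stays as the index.
partial = df.reset_index(level="pos")
print(list(partial.columns))
print(partial.index.name)
```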

Assign Index

In [1]:
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1]}, index=["r1", "r2", "r3", "r4", "r5"]
)

df.head()
Out[1]:
x y
r1 1 5
r2 2 4
r3 3 3
r4 4 2
r5 5 1
In [2]:
df.index = df.y
df
Out[2]:
x y
y
5 1 5
4 2 4
3 3 3
2 4 2
1 5 1
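
Note that the assigned column y is still present as a column after the assignment. A short sketch (with illustrative variable names) of how assigning the index and then dropping the column compares with set_index:

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3], "y": [3, 2, 1]}, index=["r1", "r2", "r3"]
)

# Manual route: assign the column as the index, then drop the column.
manual = df.copy()
manual.index = manual["y"]
manual = manual.drop(columns="y")

# One-step route.
via_set_index = df.set_index("y")

# Both routes give the same values, index labels and index name.
print(manual.equals(via_set_index))
print(manual.index.name, via_set_index.index.name)
```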

Index to Series

An index can be converted to a Series object, which lets it benefit from the rich set of Series methods.

In [49]:
# Series.select was removed in pandas 1.0; .loc with a callable does the same filtering
df.columns.to_series().loc[lambda s: s == "x"]
Out[49]:
x    x
dtype: object
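
As one more sketch of working with column names as a Series, the example below filters them with a boolean mask. The frame and its column `x_sq` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"x": [1], "y": [2], "x_sq": [1]})

# The resulting Series uses the column names as both index and values.
cols = df.columns.to_series()

# Boolean filtering via the .str accessor.
x_cols = cols[cols.str.startswith("x")]
print(x_cols.tolist())

# The filtered names can be used to select the matching columns.
print(df[x_cols.tolist()])
```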

Multi-Index

In [ ]:
pd.MultiIndex.from_product([[jj.index.name], jj.index.values])
In [ ]:
index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])
In [ ]:
pd.MultiIndex.from_tuples([(jj.index.name, v) for v in jj.index.values])
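
The cells above refer to `jj` and `tuples`, which are not defined in this post. The sketch below supplies hypothetical stand-ins for both (a small frame with a named index, and a list of tuples) so that the three constructors can be run and compared.

```python
import pandas as pd

# Hypothetical stand-in for `jj`: a frame with a named index.
jj = pd.DataFrame({"v": [10, 20]}, index=pd.Index(["a", "b"], name="grp"))

# Both calls pair the index name with every index value.
from_product = pd.MultiIndex.from_product([[jj.index.name], jj.index.values])
from_tuples = pd.MultiIndex.from_tuples([(jj.index.name, v) for v in jj.index.values])
print(from_product.equals(from_tuples))

# `tuples` is likewise an assumed input: one tuple per row, one item per level.
tuples = [("grp", "a"), ("grp", "b")]
index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])
print(list(index.names))
```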