Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
pandas.Series.str
The accessor pandas.Series.str can only be used with a Series of str values. If you invoke it on a Series of non-string values, one of two things happens. For a non-object dtype such as int64, you get an AttributeError (Can only use .str accessor with string values, which use np.object_ dtype in pandas). For an object Series holding non-string objects such as pathlib.Path, you get a Series of NaN’s. If you have control of the DataFrame, the preferred way is to cast the column to str in the DataFrame.

```python
df.status = df.status.astype(str)
```

Generally speaking, it is a good idea to make sure that a column always has the same type in a pandas DataFrame. If you do not want to cast the column to str in the DataFrame (for any reason), you can do the conversion inside the computation without changing the type of the original column.

```python
df = df[df.status.astype(str).str.contains('Exit')]
```

pandas.Series.str.replace supports regular expressions. The default of its regex parameter changed from True to False in pandas 2.0, so pass regex=True explicitly when the pattern is meant as a regular expression.
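The sketch below puts the filtering and regex-replacement tips together on a small, made-up DataFrame. The data and the regex pattern are illustrative assumptions, not taken from any real dataset. The filter converts to str only inside the expression, so the original mixed-type column keeps its object dtype.

```python
import pandas as pd

# Hypothetical data: the "status" column mixes str and int values.
df = pd.DataFrame({"status": ["Exit 0", 1, "Running", "Exit 1"]})

# Filter on a str view of the column; df.status itself is not modified.
exited = df[df.status.astype(str).str.contains("Exit")]
print(exited.status.tolist())  # ['Exit 0', 'Exit 1']

# Regex replacement with regex=True passed explicitly:
# strip trailing whitespace followed by digits, e.g. "Exit 0" -> "Exit".
cleaned = df.status.astype(str).str.replace(r"\s+\d+$", "", regex=True)
print(cleaned.tolist())  # ['Exit', '1', 'Running', 'Exit']
```

The value "1" is left untouched because the pattern requires at least one whitespace character before the digits.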
```python
import numpy as np
import pandas as pd

x = pd.Series([1, 2, 3])
x
```

```
0    1
1    2
2    3
dtype: int64
```

Accessing .str on a Series of non-string values might throw AttributeError.
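Because the error is raised at attribute-access time, Python's built-in hasattr (which returns False when attribute access raises AttributeError) can act as a quick probe. This is a minimal sketch. The helper name has_str_accessor is hypothetical, and the check only says whether .str is accessible, not whether its methods will produce non-NaN results.

```python
from pathlib import Path

import pandas as pd


def has_str_accessor(series: pd.Series) -> bool:
    """Hypothetical helper: True if series.str can be accessed without error."""
    # hasattr() swallows the AttributeError that pandas raises
    # for dtypes such as int64.
    return hasattr(series, "str")


print(has_str_accessor(pd.Series([1, 2, 3])))        # False
print(has_str_accessor(pd.Series(["a", "b"])))       # True
# An object Series of Path objects passes the check,
# yet its .str methods return NaN for every element.
print(has_str_accessor(pd.Series([Path("/root")])))  # True
```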
```python
x.str
```

```
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-5-eb971929b925> in <module>
----> 1 x.str
~/.local/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
5061 if (name in self._internal_names_set or name in self._metadata or
5062 name in self._accessors):
-> 5063 return object.__getattribute__(self, name)
5064 else:
5065 if self._info_axis._can_hold_identifiers_and_holds_name(name):
~/.local/lib/python3.7/site-packages/pandas/core/accessor.py in __get__(self, obj, cls)
169 # we're accessing the attribute of the class, i.e., Dataset.geo
170 return self._accessor
--> 171 accessor_obj = self._accessor(obj)
172 # Replace the property with the accessor object. Inspired by:
173 # http://www.pydanny.com/cached-property.html
~/.local/lib/python3.7/site-packages/pandas/core/strings.py in __init__(self, data)
1794
1795 def __init__(self, data):
-> 1796 self._validate(data)
1797 self._is_categorical = is_categorical_dtype(data)
1798
~/.local/lib/python3.7/site-packages/pandas/core/strings.py in _validate(data)
1816 # (instead of test for object dtype), but that isn't practical for
1817 # performance reasons until we have a str dtype (GH 9343)
-> 1818 raise AttributeError("Can only use .str accessor with string "
1819 "values, which use np.object_ dtype in "
1820 "pandas")
AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
```

Invoking methods of pandas.Series.str on a Series of pathlib.Path objects yields a Series of NaN’s.
```python
from pathlib import Path

paths = pd.Series([Path("/root"), Path("abc.txt")])
paths
```

```
0      /root
1    abc.txt
dtype: object
```

```python
paths.str.upper()
```

```
0   NaN
1   NaN
dtype: float64
```

A simple solution is to convert the Series to str first and then call methods of pandas.Series.str.
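```python
paths.astype(str).str.upper()
```

```
0      /ROOT
1    ABC.TXT
dtype: object
```

By default, pandas.Series.sort_values places NaN values at the end.

```python
s = pd.Series([np.nan, 1, 3, 10, 5])
s
```

```
0     NaN
1     1.0
2     3.0
3    10.0
4     5.0
dtype: float64
```

```python
s.sort_values()
```

```
1     1.0
2     3.0
4     5.0
3    10.0
0     NaN
dtype: float64
```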
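Where NaN lands is controlled by the na_position parameter of sort_values ("last" by default). The short sketch below compares index orders for the same Series. Note that sorting in descending order does not move NaN to the front unless na_position is set explicitly.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1, 3, 10, 5])

# Default: ascending, NaN last.
print(s.sort_values().index.tolist())  # [1, 2, 4, 3, 0]

# Put NaN first instead.
print(s.sort_values(na_position="first").index.tolist())  # [0, 1, 2, 4, 3]

# Descending order still keeps NaN last by default.
print(s.sort_values(ascending=False).index.tolist())  # [3, 4, 2, 1, 0]
```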