Ben Chuanlong Du's Blog

It is never too late to learn.

Read Text File into a pandas DataFrame

Advanced Options

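  1. The argument sep supports regular expressions! A separator longer than one character (other than "\s+") is interpreted as a regular expression and is handled by the Python parsing engine. For example, the following splits columns on one or more spaces:

     :::python
     import pandas as pd
     df = pd.read_csv(file, sep=" +", engine="python")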
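Below is a self-contained sketch of the regex-separator case. An in-memory io.StringIO buffer stands in for a real file, and the sample data is made up for illustration. Passing engine="python" explicitly states the parser that regex separators need anyway.

```python
import io

import pandas as pd

# Hypothetical sample: columns separated by a varying number of spaces.
data = io.StringIO("a  b   c\n1 2  3\n4   5 6\n")

# " +" is a regex matching one or more spaces, so the Python engine is used.
df = pd.read_csv(data, sep=" +", engine="python")
print(df)
```

For whitespace-delimited data, sep=r"\s+" is a common alternative that the default C engine can also handle.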

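nrows: the number of rows of the file to read, which is handy for peeking at a large file. Related options are skiprows, for skipping rows, and skip_blank_lines, which skips blank lines by default.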
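A minimal sketch of nrows, using a made-up in-memory CSV. The header line is not counted toward nrows.

```python
import io

import pandas as pd

data = io.StringIO("x,y\n1,a\n2,b\n3,c\n4,d\n")

# Read only the first 2 data rows; the header row is parsed separately.
df = pd.read_csv(data, nrows=2)
print(df)
```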

names: array-like, optional

List of column names to use. If the file contains a header row, then you should explicitly pass header=0 to override the column names. Duplicates in this list are not allowed.
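The sketch below, based on a made-up CSV, shows why header=0 matters when names is given. Without it, the existing header line is read as an ordinary data row.

```python
import io

import pandas as pd

csv_text = "a,b,c\n1,2,3\n4,5,6\n"

# header=0 marks line 0 as the header; names then replaces its labels.
df = pd.read_csv(io.StringIO(csv_text), header=0, names=["x", "y", "z"])
print(df)

# Without header=0, the line "a,b,c" ends up as the first data row.
df_wrong = pd.read_csv(io.StringIO(csv_text), names=["x", "y", "z"])
print(df_wrong)
```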

usecols: list-like or callable, optional

Return a subset of the columns. If list-like, all elements must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in names or inferred from the document header row(s). For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']. Element order is ignored, so usecols=[0, 1] is the same as [1, 0]. To instantiate a DataFrame from data with element order preserved use pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']] for columns in ['foo', 'bar'] order or pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']] for ['bar', 'foo'] order.

If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True. An example of a valid callable argument would be lambda x: x.upper() in ['AAA', 'BBB', 'DDD']. Using this parameter results in much faster parsing time and lower memory usage.
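Here is a small sketch of the usecols forms described above, using a made-up four-column CSV. It illustrates that list order is ignored, how to reorder afterwards, and the callable and positional forms.

```python
import io

import pandas as pd

csv_text = "foo,bar,baz,qux\n1,2,3,4\n5,6,7,8\n"

# List of names: order is ignored, so columns come back in file order.
df = pd.read_csv(io.StringIO(csv_text), usecols=["bar", "foo"])

# To fix the order, select the columns again after reading.
df_ordered = pd.read_csv(io.StringIO(csv_text), usecols=["bar", "foo"])[["bar", "foo"]]

# Callable: keep the columns whose upper-cased name is in the set.
df_callable = pd.read_csv(
    io.StringIO(csv_text), usecols=lambda name: name.upper() in {"FOO", "BAZ"}
)

# Positional indices work as well.
df_pos = pd.read_csv(io.StringIO(csv_text), usecols=[0, 3])

for frame in (df, df_ordered, df_callable, df_pos):
    print(frame.columns.tolist())
```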

skiprows: list-like, int or callable, optional

Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file.

If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].
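The three forms of skiprows are sketched below on a made-up file whose first two lines are comments. Line numbers are 0-indexed over the raw file: lines 0 and 1 are comments, line 2 is the header, and line 4 is the row "2,b".

```python
import io

import pandas as pd

csv_text = "# generated file\n# do not edit\nx,y\n1,a\n2,b\n3,c\n"

# int: skip the first 2 lines before parsing.
df_int = pd.read_csv(io.StringIO(csv_text), skiprows=2)

# list-like: skip specific 0-indexed line numbers.
df_list = pd.read_csv(io.StringIO(csv_text), skiprows=[0, 1, 4])

# callable: called with each line number; True means skip that line.
df_callable = pd.read_csv(io.StringIO(csv_text), skiprows=lambda i: i in [0, 1, 4])

print(df_int, df_list, df_callable, sep="\n\n")
```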

skip_blank_lines: bool, default True

If True, skip over blank lines rather than interpreting as NaN values.
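A minimal sketch of skip_blank_lines, using a made-up CSV with one empty line in the middle:

```python
import io

import pandas as pd

csv_text = "x,y\n1,a\n\n2,b\n"

# Default (skip_blank_lines=True): the empty line is dropped.
df_skip = pd.read_csv(io.StringIO(csv_text))

# skip_blank_lines=False: the empty line becomes a row of NaN values.
df_keep = pd.read_csv(io.StringIO(csv_text), skip_blank_lines=False)

print(df_skip, df_keep, sep="\n\n")
```

Note that the NaN row forces the integer column x to a float dtype in df_keep.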

encoding_errors: str, optional, default "strict"

The encoding_errors option (added in pandas 1.3.0) controls how encoding errors are handled. Please refer to Error Handlers in the Python codecs documentation for a list of possible values. If you have to work with a text file that has Unicode (mostly UTF-8) encoding issues, I'd suggest using encoding_errors="replace", which substitutes undecodable bytes with the replacement character, as many IDEs do.
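The sketch below contrasts the default strict handling with encoding_errors="replace". The input is made-up bytes containing 0xff, which is never valid in UTF-8, and it requires pandas 1.3.0 or later.

```python
import io

import pandas as pd

# Hypothetical bytes: 0xff cannot appear anywhere in valid UTF-8.
raw = b"name,city\nAlice,Z\xffrich\n"

# Default "strict" handling raises a UnicodeDecodeError.
strict_failed = False
try:
    pd.read_csv(io.BytesIO(raw), encoding="utf-8")
except UnicodeDecodeError as exc:
    strict_failed = True
    print("strict decoding failed:", exc)

# "replace" substitutes U+FFFD for the undecodable byte instead of failing.
df = pd.read_csv(io.BytesIO(raw), encoding="utf-8", encoding_errors="replace")
print(df.loc[0, "city"])
```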

