Ben Chuanlong Du's Blog

It is never too late to learn.

Hands on the Python module dask

Installation

  1. You have to install the complete version of Dask (using the command pip3 install dask[complete]) if you need support of extended memory (for handling big data) and schedulers (for performance). The default installation version (pip3 install dask) of Dask does not include those features out-of-box.
In [ ]:
import dask.dataframe as dd
In [ ]:
df.read_parquet("/path/to/file")
In [ ]:
df.shape[0].compute(scheduler="processes")

Dask and sk-learn

Comments