Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
Installation
Install the complete version of Dask (pip3 install "dask[complete]") if you need the optional components, such as dask.dataframe for larger-than-memory data and the distributed scheduler for performance. The default installation (pip3 install dask) installs only the core library and does not include those features out of the box.
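To see which optional pieces an environment actually has, one rough approach is to try importing them. The sketch below is only an illustration: the helper name is_importable and the list of modules to probe are hypothetical choices (pyarrow is assumed here as a Parquet engine), not something Dask itself provides.

```python
import importlib


def is_importable(module_name: str) -> bool:
    """Return True if importing `module_name` succeeds (hypothetical helper)."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        # Covers ModuleNotFoundError too, e.g. dask.dataframe without pandas.
        return False


# Assumed selection of modules to probe; adjust to what you actually need.
MODULES_TO_CHECK = ["dask", "dask.dataframe", "distributed", "pyarrow"]

if __name__ == "__main__":
    for name in MODULES_TO_CHECK:
        status = "available" if is_importable(name) else "missing"
        print(f"{name}: {status}")
```

If dask.dataframe or distributed shows up as missing, installing "dask[complete]" is the likely fix.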
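import dask.dataframe as dd
# Lazily open the Parquet file; nothing is read until compute() is called.
df = dd.read_parquet("/path/to/file")
# Count the rows using the multiprocessing scheduler.
df.shape[0].compute(scheduler="processes")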