Ben Chuanlong Du's Blog


Data Frame Implementations in Rust

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Polars

Polars is a fast multi-threaded DataFrame library in Rust and Python.

DataFusion

DataFusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format. It provides both a SQL and a DataFrame API for building logical query plans, along with a query optimizer and an execution engine capable of parallel execution against partitioned data sources (CSV and Parquet) using threads. DataFusion also supports distributed query execution via the Ballista crate.
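To make "parallel execution against partitioned data sources using threads" a bit more concrete, here is a minimal sketch of the general idea in plain standard-library Rust. It is not DataFusion's API or implementation: the `Row` type and the `partial_sums` and `parallel_group_sum` functions are hypothetical names made up for illustration, and the partitions are in-memory vectors standing in for CSV or Parquet files. Each partition is aggregated on its own thread, and the partial results are merged at the end.

```rust
use std::collections::HashMap;
use std::thread;

/// One row of a hypothetical two-column table: (group key, value).
type Row = (String, f64);

/// Compute per-key partial sums for a single partition.
fn partial_sums(partition: &[Row]) -> HashMap<String, f64> {
    let mut sums = HashMap::new();
    for (key, value) in partition {
        *sums.entry(key.clone()).or_insert(0.0) += *value;
    }
    sums
}

/// Aggregate each partition on its own thread, then merge the partial results.
fn parallel_group_sum(partitions: &[Vec<Row>]) -> HashMap<String, f64> {
    // Scoped threads may borrow `partitions` because they are all joined
    // before `thread::scope` returns.
    let partials: Vec<HashMap<String, f64>> = thread::scope(|scope| {
        let handles: Vec<_> = partitions
            .iter()
            .map(|p| scope.spawn(move || partial_sums(p)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    });

    // Final merge step: combine the per-partition sums into one result.
    let mut total = HashMap::new();
    for partial in partials {
        for (key, value) in partial {
            *total.entry(key).or_insert(0.0) += value;
        }
    }
    total
}
```

Calling `parallel_group_sum(&partitions)` with one `Vec<Row>` per partition returns the per-key sums over all partitions. A real engine would schedule partitions on a thread pool and operate on Arrow record batches rather than row tuples, but the two-phase shape (partial aggregation per partition, then a merge) is the part this sketch tries to show.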

Ballista is a distributed compute platform primarily implemented in Rust, and powered by Apache Arrow. It is built on an architecture that allows other programming languages (such as Python, C++, and Java) to be supported as first-class citizens without paying a penalty for serialization costs.
