Ben Chuanlong Du's Blog

It is never too late to learn.

Read and Write Parquet Files in Rust

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

There are a few crates in Rust that can read and write Parquet files, and polars is the best of them. In fact, polars is a full DataFrame implementation in Rust whose functionality goes well beyond Parquet IO. The parquet crate might still be useful if you want to scan Parquet files row by row. The other Parquet-related crates are low-level and are not aimed at average users.

Polars

The polars crate is a blazingly fast DataFrame library implemented in Rust, using the Apache Arrow columnar format as its memory model. It supports reading and writing Parquet files, of course.

parquet

The parquet crate is the official native Rust implementation of Apache Parquet, developed as part of the Apache Arrow project.

arrow

The arrow crate is the official native Rust implementation of the Apache Arrow in-memory format, governed by the Apache Software Foundation.

parquet2

The parquet2 crate is a rewrite of the official parquet crate with performance, parallelism, and safety in mind. It decouples reading (IO-intensive) from computing (CPU-intensive) and delegates parallelism to downstream crates. It cannot be used on its own to read data from Parquet files (only metadata); to read the data, check out arrow2.

arrow2

The arrow2 crate is an unofficial implementation of the Apache Arrow spec in safe Rust. Its README describes it as the most feature-complete implementation of the Arrow format after the C++ implementation.
