Ben Chuanlong Du's Blog

And let it direct your passion with reason.

Parse TOML Files in Python

  1. There are two popular Python libraries for parsing TOML files: tomlkit and toml. tomlkit is preferred over toml because it is more flexible and preserves the style of the original file.

  2. A TOML file always interprets a key (even a bare ASCII integer) as a string. For this reason, a dict with numerical keys …

Process Big Data Using PySpark

  1. PySpark 2.4 and older do not support Python 3.8. You have to use Python 3.7 with PySpark 2.4 or older.

  2. It can be extremely helpful to run a PySpark application locally to detect possible issues before submitting it to the Spark cluster.

    #!/usr/bin/env bash …
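The version rule in the first item can be turned into a quick pre-flight check to run locally before submitting a job. A minimal sketch: `pyspark_supports_python` and `check_installed_pyspark` are hypothetical helpers, and the only rule they encode is the one stated above (any Python 3.8 or newer is treated as incompatible with PySpark 2.4 or older). The installed version is read with `importlib.metadata.version`.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def pyspark_supports_python(pyspark_version: str, python_version: tuple[int, int]) -> bool:
    """Return False for the known-bad pair: PySpark <= 2.4 with Python >= 3.8.

    Hypothetical helper; it knows no other compatibility rules.
    """
    # Compare only major.minor, e.g. "2.4.8" -> (2, 4).
    major, minor = (int(part) for part in pyspark_version.split(".")[:2])
    if (major, minor) <= (2, 4):
        return python_version < (3, 8)
    return True


def check_installed_pyspark() -> str:
    """Describe whether the installed pyspark fits the running interpreter."""
    try:
        installed = version("pyspark")
    except PackageNotFoundError:
        return "pyspark is not installed"
    running = sys.version_info[:2]
    label = f"PySpark {installed} on Python {running[0]}.{running[1]}"
    if pyspark_supports_python(installed, running):
        return f"{label}: OK"
    return f"{label}: use Python 3.7 instead"
```

Calling `check_installed_pyspark()` at the top of a local test script surfaces the mismatch immediately, rather than after the job has been shipped to the cluster.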

Get CentOS Version

You can get the version of CentOS using the following command.

rpm -q centos-release

You can use this trick to get the CentOS version on the nodes of a Spark cluster: run the command on the driver or the workers, print the versions, and then parse the log …
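The run-then-parse approach above can be sketched in Python. This assumes each node writes the raw output of `rpm -q centos-release` into the log; the helper names `query_centos_release` and `parse_centos_releases` are hypothetical, and the sample package string in the comment is only illustrative of the CentOS naming style.

```python
import re
import subprocess

# Matches the package name printed by `rpm -q centos-release`, for example
# "centos-release-7-9.2009.1.el7.centos.x86_64" (illustrative string).
_PATTERN = re.compile(r"\bcentos-release-(\S+)")


def query_centos_release() -> str:
    """Run `rpm -q centos-release` on this host and return its output."""
    result = subprocess.run(
        ["rpm", "-q", "centos-release"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip()


def parse_centos_releases(log_text: str) -> set[str]:
    """Collect the distinct centos-release versions found in log text.

    Lines without a package name, such as
    "package centos-release is not installed", are ignored.
    """
    return set(_PATTERN.findall(log_text))
```

Returning a set makes it easy to spot nodes that run a different release: more than one element means the cluster is not homogeneous.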