Ben Chuanlong Du's Blog

And let it direct your passion with reason.

Read TensorBoard Logs

Using pandas.read_csv

This approach requires a running TensorBoard instance that is serving the data you want to read.

  1. Check the checkbox "Show data download links". See the highlighted area in the top-left corner of the screenshot below for an example.

  2. Select the experiment whose data you'd like to download. See the highlighted area in the bottom-right corner of the screenshot for an example.
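Once a CSV file has been downloaded, reading it is a plain pandas.read_csv call. The sketch below assumes the scalar CSV has the columns "Wall time", "Step" and "Value" (typical of TensorBoard's scalar export, but treat it as an assumption). The sample data and the file name in the comment are hypothetical, and an in-memory string stands in for the downloaded file so the example is self-contained.

```python
import io

import pandas as pd

# Hypothetical contents of a scalar CSV downloaded from TensorBoard.
# The column names "Wall time", "Step" and "Value" are an assumption.
csv_text = """Wall time,Step,Value
1600000000.0,0,0.9
1600000010.0,1,0.7
1600000020.0,2,0.5
"""

# With a real download you would pass the file path instead,
# e.g. pd.read_csv("run-train-tag-loss.csv") (hypothetical file name).
df = pd.read_csv(io.StringIO(csv_text))

# Convert the Unix timestamps (in seconds) to datetimes for readability.
df["Wall time"] = pd.to_datetime(df["Wall time"], unit="s")

print(df[["Step", "Value"]])
```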

Parse TOML Files in Python

  1. There are two popular Python libraries for parsing TOML files: tomlkit and toml. tomlkit is preferred over toml because it is more flexible and style-preserving (it keeps comments and formatting when a file is modified and written back).

  2. TOML always interprets a key (even a bare ASCII integer) as a string. For this reason, a dict with numerical keys …

Get CentOS Version

You can get the version of CentOS using the following command.

rpm -q centos-release

This trick can also be used to find the CentOS version on a Spark cluster. Basically, you run this command on the driver or on the workers to print the version, and then parse the log …
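A minimal Python sketch of both halves: running the command and parsing its output. The helper names centos_release and parse_centos_release are hypothetical, and the regular expression assumes package names of the form "centos-release-7-9.…" or "centos-release-8.2-…"; the naming scheme varies across CentOS releases, so adjust the pattern to what your machines actually print.

```python
import re
import subprocess
from typing import Optional, Tuple


def centos_release() -> str:
    """Run `rpm -q centos-release` on this machine and return its raw output."""
    result = subprocess.run(
        ["rpm", "-q", "centos-release"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def parse_centos_release(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from output such as
    'centos-release-7-9.2009.1.el7.centos.x86_64'; return None if no match."""
    match = re.search(r"centos-release-(\d+)[.-](\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Hypothetical sample outputs (the exact format is an assumption).
print(parse_centos_release("centos-release-7-9.2009.1.el7.centos.x86_64"))  # (7, 9)
print(parse_centos_release("centos-release-8.2-2.2004.0.1.el8.x86_64"))  # (8, 2)
```

On a Spark cluster, the same helper could, for example, be called inside a function passed to RDD.mapPartitions so that it runs on the workers; that usage is untested here.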

Control Number of Partitions of a DataFrame in Spark

Tips and Traps

  1. When called with only a number of partitions, DataFrame.repartition distributes rows round-robin. When you pass one or more columns (instead of, or in addition to, the number of partitions), rows are hash-partitioned: the hash code of the column value(s), modulo the number of partitions, decides which partition each row goes to. In some situations there are many hash collisions even if the total number of rows is small (e.g., a few thousand), which means the resulting partitions might be skewed.
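The skew can be illustrated without Spark by simulating hash partitioning in plain Python. This is only a sketch: zlib.crc32 is a stand-in hash (Spark itself uses Murmur3), so the exact key-to-partition mapping differs, but the key point carries over. With few distinct key values, many rows share a partition no matter how many partitions you request, whereas round-robin spreads rows evenly. The data and the helper name hash_partition_sizes are hypothetical.

```python
import zlib
from collections import Counter
from typing import List


def hash_partition_sizes(keys: List[str], num_partitions: int) -> List[int]:
    """Simulate hash partitioning: each row goes to partition hash(key) % n.

    zlib.crc32 stands in for Spark's Murmur3 hash, so the actual
    assignment of keys to partitions differs from this sketch.
    """
    counts = Counter(zlib.crc32(key.encode("utf-8")) % num_partitions for key in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# 3,000 rows but only 3 distinct key values (hypothetical data).
rows = ["a", "b", "c"] * 1000
sizes = hash_partition_sizes(rows, num_partitions=8)
print(sizes)  # at most 3 of the 8 partitions are non-empty

# Round-robin assignment (repartition with only a number of partitions) is even.
round_robin = Counter(i % 8 for i in range(len(rows)))
print([round_robin[p] for p in range(8)])  # → [375, 375, 375, 375, 375, 375, 375, 375]
```

On a real DataFrame, the actual partition sizes can be checked with pyspark.sql.functions.spark_partition_id, e.g. df.groupBy(spark_partition_id()).count().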