Ben Chuanlong Du's Blog

And let it direct your passion with reason.

Tips on Installing Debian Series of Linux Distributions

Before Installation

Debian Specific

  1. Avoid installing backported Debian images, as they might cause issues with other software (e.g., VirtualBox). Debian testing is suggested instead.

Other Debian-based Linux Distributions

  1. Download the right ISO image of the Linux distribution that you want to install.

  2. Create …

Ways to Make a Bootable Flash Drive in Linux

Use Ventoy

Ventoy is currently the best graphical tool for making bootable flash drives.

Use the Command dd or cat

You can use

dd if=path_to_linux_image of=path_to_device bs=4M; sync

or

cat path_to_linux_image > path_to_device

to write a hybrid Linux image to a flash drive. Note that you must …
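Since dd overwrites whatever path it is given, a small guard around it can help avoid accidents. The following is only a sketch: the function name write_image and its checks are hypothetical, and the conv=fsync and status=progress options assume GNU dd.

```shell
# Hypothetical wrapper around dd; the name and the checks are illustrative only.
write_image() {
    image="$1"
    device="$2"
    # Refuse to run unless the image is a regular file ...
    [ -f "$image" ] || { echo "not a regular file: $image" >&2; return 1; }
    # ... and the target is a block device (e.g., a whole flash drive).
    [ -b "$device" ] || { echo "not a block device: $device" >&2; return 1; }
    # conv=fsync asks GNU dd to flush the written data to the device before exiting;
    # status=progress prints a running byte count.
    dd if="$image" of="$device" bs=4M conv=fsync status=progress
}
```

It would be called as write_image path_to_linux_image path_to_device, usually with root privileges.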

Get CentOS Version

You can get the version of CentOS using the following command.

rpm -q centos-release

This trick can be used to get the version of the CentOS distribution on a Spark cluster. Basically, you run this command in the driver or workers to print the versions and then parse the log …
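As a rough sketch of the log-parsing step, the helper below pulls the version out of lines that contain the rpm output. The function name parse_centos_version is hypothetical, and the pattern assumes package names shaped like centos-release-7-9.2009.1.el7.centos.x86_64; other releases may need a different pattern.

```shell
# Hypothetical helper: read lines on stdin and print the CentOS version found in
# `rpm -q centos-release` style output. Lines that do not match print nothing.
# Assumed input shape: centos-release-<version>.el<N>...
parse_centos_version() {
    sed -n 's/^.*centos-release-\([0-9][0-9.-]*[0-9]\)\.el.*$/\1/p'
}
```

For example, rpm -q centos-release | parse_centos_version works on a single machine, and the same function can be fed the relevant lines of a driver or worker log.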

Control Number of Partitions of a DataFrame in Spark

Tips and Traps

  1. When called with only a number of partitions, DataFrame.repartition distributes rows round-robin. If you pass one or more columns (instead of, or in addition to, the number of partitions) to DataFrame.repartition, rows are assigned to partitions by the hash code of those columns. In some situations, there are lots of hash collisions even if the total number of rows is small (e.g., a few thousand), which means that the generated partitions might be skewed.