Ben Chuanlong Du's Blog

And let it direct your passion with reason.

Spark Issue: Namespace Quota Is Exceeded

Symptom

Caused by: org.apache.hadoop.hdfs.protocol.NSQuotaExceededException: The NameSpace quota (directories and files) of directory /user/user_name is exceeded: quota=163840 file count=163841

Cause

The namespace quota of the directory /user/user_name is exceeded.

Solutions

  1. Remove unneeded files from the directory /user/user_name to free up some namespace …
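Before deleting anything, it can help to know how far over the quota the directory is. The real command hdfs dfs -count -q /user/user_name reports quota usage. The sketch below instead parses the exception text itself. It assumes the message has exactly the wording shown in the symptom above; other Hadoop versions may phrase it differently. The names parse_ns_quota_error and NamespaceQuotaError are hypothetical helpers, not part of any Hadoop or Spark API.

```python
import re
from typing import NamedTuple, Optional

# Assumption: the exception text matches the wording in the symptom above.
_NS_QUOTA_RE = re.compile(
    r"NameSpace quota \(directories and files\) of directory (\S+) is exceeded: "
    r"quota=(\d+) file count=(\d+)"
)


class NamespaceQuotaError(NamedTuple):
    directory: str
    quota: int
    file_count: int

    @property
    def excess(self) -> int:
        """Difference between the reported file count and the quota (never negative)."""
        return max(self.file_count - self.quota, 0)


def parse_ns_quota_error(message: str) -> Optional[NamespaceQuotaError]:
    """Extract the directory, quota and reported count; return None if the text does not match."""
    match = _NS_QUOTA_RE.search(message)
    if match is None:
        return None
    directory, quota, count = match.groups()
    return NamespaceQuotaError(directory, int(quota), int(count))
```

Free up at least excess entries, plus headroom for the files the job still needs to create. Otherwise the next write hits the same limit.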

Spark Issue: Rust Panic

If you call Rust code from Spark/PySpark and the Rust code has bugs, you might get Rust panic error messages.

Symptom

Error: b"thread 'main' panicked at 'index out of bounds: the len is 15 but the index is 15', src/game.rs:131:39\nnote: run with …
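A panic like this is often buried in a long stderr blob returned to the driver, so pulling out the message and the source location speeds up debugging. The sketch below is a minimal, hypothetical helper (parse_rust_panic is not a library function). It assumes the older panic format shown above, where the message comes before file:line:column. Newer Rust releases print the location first, so the pattern would need adjusting for them.

```python
import re
from typing import NamedTuple, Optional, Union

# Assumption: older panic format, as in the symptom above:
#   thread '<name>' panicked at '<message>', <file>:<line>:<column>
_PANIC_RE = re.compile(
    r"thread '(?P<thread>[^']*)' panicked at '(?P<message>.*?)', "
    r"(?P<file>[^\s:]+):(?P<line>\d+):(?P<column>\d+)"
)


class RustPanic(NamedTuple):
    thread: str
    message: str
    file: str
    line: int
    column: int


def parse_rust_panic(error: Union[bytes, str]) -> Optional[RustPanic]:
    """Extract the panic message and source location from captured stderr (bytes or str)."""
    if isinstance(error, bytes):
        # The error in the symptom is a bytes literal (b"..."), so decode it first.
        error = error.decode("utf-8", errors="replace")
    m = _PANIC_RE.search(error)
    if m is None:
        return None
    return RustPanic(m["thread"], m["message"], m["file"], int(m["line"]), int(m["column"]))
```

For the error above, this points straight at src/game.rs line 131, column 39, where an index equal to the length (15) is used.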

Spark Issue: RuntimeException: Unsupported Literal Type Class

Symptom

java.lang.RuntimeException: Unsupported literal type class java.util.ArrayList [1]

Possible Causes

This happens in PySpark when a Python list is provided where a scalar is required. Assuming id0 is an integer column in the DataFrame df, the following code throws the above error.

v = [1, 2, 3 …
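One common way to avoid the error, not necessarily the fix the full post describes, is to test membership instead of equality. In the DataFrame API that is Column.isin, e.g. df.filter(col("id0").isin(v)). The sketch below shows the same idea without needing pyspark installed. It builds a Spark SQL predicate string, which df.filter also accepts, using = for a scalar and IN (...) for a list. The helper id_predicate is hypothetical, and it only handles integer values, matching the assumption that id0 is an integer column.

```python
from typing import Sequence, Union


def _int_literal(value: object) -> str:
    # bool is a subclass of int; reject it so True is not rendered as "True".
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"this sketch only handles integer literals, got {value!r}")
    return str(value)


def id_predicate(column: str, value: Union[int, Sequence[int]]) -> str:
    """Build a Spark SQL predicate: '=' for a scalar, 'IN (...)' for a list or tuple."""
    if isinstance(value, (list, tuple)):
        if not value:
            raise ValueError("IN () with an empty list is not valid SQL")
        return f"{column} IN ({', '.join(_int_literal(v) for v in value)})"
    return f"{column} = {_int_literal(value)}"
```

Usage: df.filter(id_predicate("id0", v)) works whether v is a single integer or a list of integers.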

Convert PDF to EPS

There are many tools for converting PDF images to EPS in Linux. The pdf2ps command is a good one: it produces EPS images without losing much resolution. The general-purpose tool convert (from the ImageMagick package) does not produce EPS figures of comparable quality.
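For batch conversions it can be handy to drive pdf2ps from a script. The following is a minimal sketch that assumes pdf2ps (shipped with Ghostscript) is on PATH and is invoked as pdf2ps input.pdf output. The function names are hypothetical. Strictly speaking, pdf2ps writes PostScript, so if a tool insists on a strict EPS bounding box, check the output.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def pdf2ps_command(pdf: str, eps: Optional[str] = None) -> List[str]:
    """Build the pdf2ps command line; the output defaults to the input name with a .eps suffix."""
    out = eps if eps is not None else str(Path(pdf).with_suffix(".eps"))
    return ["pdf2ps", pdf, out]


def convert_pdf_to_eps(pdf: str, eps: Optional[str] = None) -> str:
    """Run pdf2ps on one file and return the output path; raises if pdf2ps is missing or fails."""
    if shutil.which("pdf2ps") is None:
        raise FileNotFoundError("pdf2ps is not on PATH; install Ghostscript first")
    cmd = pdf2ps_command(pdf, eps)
    subprocess.run(cmd, check=True)
    return cmd[-1]
```

Usage: convert_pdf_to_eps("figure.pdf") writes figure.eps next to the input.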

Terminal Multiplexers

zellij

  1. There are two mature and popular terminal multiplexer apps: screen and tmux. Both are very useful if you want to work on multiple tasks over a single SSH connection. Screen is relatively simple to use, while tmux is much more powerful but also more complicated.

  2. Besides enabling users to …
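A typical use of either tool over SSH is to start a long-running job in a detached session that survives a dropped connection. The sketch below only builds the command lines. It uses the standard tmux new-session -d -s NAME and screen -dmS NAME flags, and the helper names are hypothetical.

```python
from typing import List, Sequence


def tmux_detached(session: str, command: str) -> List[str]:
    # tmux new-session: -d starts detached, -s names the session;
    # the command is a single shell-command string run by tmux.
    return ["tmux", "new-session", "-d", "-s", session, command]


def screen_detached(session: str, command: Sequence[str]) -> List[str]:
    # screen: -d -m starts a detached session, -S names it;
    # the program and its arguments follow as separate items.
    return ["screen", "-dmS", session, *command]
```

Run one with subprocess.run(tmux_detached("train", "python train.py"), check=True), then reattach later with tmux attach -t train (or screen -r train for the screen variant).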

Schedule Cron Tasks in a Docker Container

Cron tasks work in a Docker container. However, you have to start the cron daemon manually (root or sudo required) using cron or sudo cron if it is not configured (via the Docker entrypoint) to start when the Docker container starts. For tutorials on crontab, please refer to …
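One way to configure this is a small entrypoint script that starts the cron daemon and then hands control to the container's main command. The sketch below assumes a Debian/Ubuntu-based image where the daemon binary is cron and backgrounds itself by default. Alpine and RHEL-based images use crond instead. The function names are hypothetical.

```python
import os
import subprocess
from typing import List, Optional, Sequence


def cron_start_command(uid: Optional[int] = None) -> List[str]:
    """Return the command that starts cron: plain 'cron' as root, otherwise via sudo."""
    if uid is None:
        uid = os.geteuid()
    return ["cron"] if uid == 0 else ["sudo", "cron"]


def entrypoint(argv: Sequence[str]) -> None:
    """Start the cron daemon, then replace this process with the main command (if any)."""
    # Assumption: 'cron' daemonizes, so run() returns once the daemon is up.
    subprocess.run(cron_start_command(), check=True)
    if argv:
        # exec keeps the main command as the container's foreground process.
        os.execvp(argv[0], list(argv))
```

Call entrypoint(sys.argv[1:]) from the script used as the image's ENTRYPOINT and pass a long-running command. If the main command exits, the container stops and cron stops with it.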