Ben Chuanlong Du's Blog

And let it direct your passion with reason.

My Docker Images

Most of my Docker images have different variants (corresponding to tags latest, next, alpine, etc) for different use cases. And each tag might have histocial versions with the pattern mmddhh (mm, dd and hh stand for the month, day and hour) for fallback if a tag is broken. Please refer to the following tag table for more details.

Tag Base Image OS Comment
latest Ubuntu LTS (or newer if necessary and well tested) The most recent stable version of the Docker image. The latest tag is what most users should use.
It cares more about user friendliness than Docker image size, load speed and even security.
next Ubuntu LTS (or newer if necessary and well tested) The most recent testing version of the Docker image.
New features/tools will be added into the next tag before entering other tags.
22.10 Ubuntu 22.10 For any of the following situations:
1. a specific Ubuntu/kernel version is required
2. trying out newer Ubuntu versions than LTS
debian Debian testing. For some rare situations where a required tool works in Debian but not in Ubuntu.
alpine Alpine stable. For production deployment where Docker image size, load speed and security are of concerns.
mmddhh Histoical versions corresponding to the latest tag. Fallback tags (for latest) if the latest tag is broken.
next_mmddhh Histoical versions corresponding to the next tag. Fallback tags (for next) if the next tag is broken.
debian_mmddhh Histoical versions corresponding to the debian tag. Fallback tags (for debian) if the debian tag is broken.
alpine_mmddhh Histoical versions corresponding to the alpine tag. Fallback tags (for alpine) if the alpine tag is broken.
Docker Image Comment
dclong/vscode-server Cloud IDE code-server (based on VSCode)
dclong/jupyterhub-ds Traditional ML
dclong/jupyterhub-pytorch Deep Learning
dclong/python-portable Build portable Python using python-build-standalone
dclong/conda-build Build portable Anaconda Python environment
dclong/jupyterhub-sagemath Math / Calculus
dclong/gitpod:blog Editing legendu.net/blog using GitPod
dclong/gitpod Editing other GitHub repos using GitPod
dclong/jupyterhub-kotlin
dclong/jupyterhub-ganymede
JVM languages
dclong/rustpython RustPython

Usage

Install Docker

Please refer to Install Docker for instructions on how to install and configure Docker.

Pull the Docker Image

Taking dclong/jupyterhub-ds as an example, you can pull it using the command below.

docker pull dclong/jupyterhub-ds

For people in mainland of China, please refer to the post Speedup Docker Pulling and Pushing on ways to speed up pushing/pulling of Docker images. If you don't bother, then just use the command below.

docker pull registry.docker-cn.com/dclong/jupyterhub-ds

Start a Container

Below are explanation of some environment variable passed by the option -e to the Docker command. Keep the default if you don't know what are the best to use. DOCKER_PASSWORD is probably the only one you want to and should change.

  • DOCKER_USER: The user to be created (dynamically) in the Docker container. The shell command id -un gets the name of the current user (on the host), so the option -e DOCKER_USER=$(id -un) instructs the script /scripts/sys/init.sh to create a user in the Docker container whose name is the same as the current user on the host. WARNING: the shell script /scripts/sys/init.sh cannot create a user named root as it already exists in the Docker container. If you start a Docker container using root, make sure to pass a different user name to the envrionment variable DOCKER_USER, e.g., -e DOCKER_USER=dclong. For more discussion, please refer to this issue.
  • DOCKER_USER_ID: The ID of the user to be created in the Docker container. The shell command id -u gets the user ID of the current user (on the host), so the option -e DOCKER_USER_ID=$(id -u) instructs the script /scripts/sys/init.sh to create a user in the Docker container whose user ID is the same as the user ID of the current user on the host. This means that the user in the Docker container is essentailly the current user on the host, which helps resolve file permissions between the Docker container and the host. This option is similar to the option --user of the command docker run, and you want to keep it unchanged, generally speaking.
  • DOCKER_PASSWORD: The password of the user to be created in the Docker container. The shell command id -un get the name of the current user (on the host), so the option -e DOCKER_PASSWORD=$(id -un) instructs the script /scripts/sys/init.sh to create a user in the Docker container whose password is the name of the current user on the host. WARNING: You'd better change the default value for security reasons. Of course, users can always change it later using the command passwd inside the Docker container.
  • DOCKER_GROUP_ID: The group ID of the user to be created in the Docker container. The shell command id -g gets the group ID of the current user (on the host), so the option -e DOCKER_GROUP_ID=$(id -g) instructs the script /scripts/sys/init.sh to create a user in the Docker container whose group ID is the same as the group ID of the current user on the host. You want to keep this option unchanged, generally speaking.
  • DOCKER_ADMIN_USER: This environment variable applies to Docker images dclong/jupyterhub* only. It specifies the admin user of the JupyterHub server. It should be the same as DOCKER_USER generally speaking.
  • USER_MEM_LIMIT: This environment variable applies to Docker images dclong/jupyterhub* only. It limits the memory that each user can use. Note: this optional is not in effect currently.

The root directory of JupyterLab/Jupyter notebooks is /workdir in the container. You can mount directory on the host to it as you wish. Below are illustration using the Docker image dclong/jupyterhub-ds.

The following command starts a container and mounts the current working directory and /home on the host machine to /workdir and /home_host in the container respectively.

docker run -d --init \
    --platform linux/amd64 \
    --hostname jupyterhub-ds \
    --log-opt max-size=50m \
    -p 8000:8000 \
    -p 5006:5006 \
    -e DOCKER_USER=`id -un` \
    -e DOCKER_USER_ID=`id -u` \
    -e DOCKER_PASSWORD=`id -un` \
    -e DOCKER_GROUP_ID=`id -g` \
    -e DOCKER_ADMIN_USER=`id -un` \
    -v `pwd`:/workdir \
    -v `dirname $HOME`:/home_host \
    dclong/jupyterhub-ds /scripts/sys/init.sh

The following command (only works on Linux) does the same as the above one except that it limits the use of CPU and memory.

docker run -d --init \
    --platform linux/amd64 \
    --name jupyterhub-ds \
    --log-opt max-size=50m \
    --memory=$(($(head -n 1 /proc/meminfo | awk '{print $2}') * 4 / 5))k \
    --cpus=$((`nproc` - 1)) \
    -p 8000:8000 \
    -p 5006:5006 \
    -e DOCKER_USER=`id -un` \
    -e DOCKER_USER_ID=`id -u` \
    -e DOCKER_PASSWORD=`id -un` \
    -e DOCKER_GROUP_ID=`id -g` \
    -e DOCKER_ADMIN_USER=`id -un` \
    -v `pwd`:/workdir \
    -v `dirname $HOME`:/home_host \
    dclong/jupyterhub-ds /scripts/sys/init.sh

Use the JupyterHub Server

  1. Open your browser and and visit your_host_ip:8000 where your_host_ip is the URL/ip address of your server.

  2. Login to the JupyterHub server using your user name (by default your user name on the host machine) and password (by default your user name on the host machine).

  3. It is strongly suggested (for security reasons) that you change your password (using the command passwd) in the container.

  4. Enjoy JupyterLab notebook!

Get Information of Running Jupyter/Lab Servers

If you are using the Jupyter/Lab server instead of JupyterHub, you will be asked for a token at login. If you have started the Docker container in interactive mode (option -i instead of -d), the token for login is printed to the console. Otherwise, the tokens (and more information about the servers) can be found by running the following command outside the Docker container.

docker exec jupyterlab /scripts/list_jupyter.py

The above command tries to be smart in the sense that it first figures out the user that started the JupyterLab server and then query running Jupyter/Lab servers of that user. An equivalently but more specifically command (if the Docker is launched by the current user in the host) is as below

docker exec -u $(id -un) jupyterlab /scripts/sys/list_jupyter.py

If you are inside the Docker container, then run the following command to get the tokens (and more information about the servers).

/scripts/list_jupyter.py

Or equivalently if the Jupyter/Lab server is launched by the current user,

/scripts/sys/list_jupyter.py

To sum up, most of time you can rely on /scripts/list_jupyter.py to find the tokens of the running Jupyter/Lab servers, no matter you are root or the user that launches the Docker/JupyterLab server, and no matter you are inside the Docker container or not. Yet another way to get information of the running JupyterLab server is to check the log. Please refer to the section Debug Docker Containers for more information.

Add a New User to the JupyterHub Server

By default, any user in the Docker container can visit the JupyterHub server. So if you want to grant access to a new user, just create an account for him in the Docker container. You can of course use the well know commands useradd, adduser, etc. to achive it. To make things easier for you, there are some shell scripts in the directory /scripts/sys/ to create usres for you.

  • /scripts/sys/create_user.sh: Create a new user. It's the base script for creating users.
  • /scripts/sys/create_user_group.sh: Create a new user with the given (existing) group.
  • /scripts/sys/create_user_nogroup.sh: Create a new user with group name nogroup.
  • /scripts/sys/create_user_docker.sh: Create a new user with group name docker.

You can use the option -h to print help doc for these commands.

/scripts/sys/create_user_nogroup.sh -h
Create a new user with the group name "nogroup".
Syntax: create_user_nogroup user user_id [password]
Arguments:
user: user name
user_id: user id
password: Optional password of the user. If not provided, then the user name is used as the password.

Now suppose you want to create a new user dclong with user ID 2000 and group name nogroup, you can use the following command.

sudo /scripts/sys/create_user_nogroup.sh dclong 2000

Since we didn't specify a password for the user, the default password (same as the user name) is used.

Easy Install of Other Kernels

Install and configure PySpark for use with the Python kernel.

icon spark -ic && icon pyspark -ic

Install the evcxr Rust kernel.

icon evcxr -ic

Install the Almond Scala kernel.

icon almond -ic

Install the ITypeScript kernel.

icon its -ic

Many other software/tools can be easily install by icon .

Debug Docker Containers

You can change the option docker run -d ... to docker run -it ... to show logs of processes in the Docker container which helps debugging. If you have already started a Docker container using docker run -d ..., you can use the command docker logs to get the log of the container (which contains logs of all processes in it).

Use Spark in JupyterLab Notebooks

It is suggested that you use the Almond Scala kernel. I will gradually drop support of the BeakerX Scala kernel in my Docker images.

PySpark - pyspark and findspark

To use PySpark in a container of the Docker image dclong/jupyterhub-ds you need to install Spark and the Python package pyspark first, which can be achieved using the following command.

icon spark -ic --loc /opt/  
icon pyspark -ic

Follow the steps below to use PySpark after it is installed.

  1. Open a JupyterLab notebook with the Python kernel from the launcher.

  2. Find and initialize PySpark.

    import findspark
    # A symbolic link of the Spark Home is made to /opt/spark for convenience
    findspark.init("/opt/spark")
    
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName('PySpark Example').enableHiveSupport().getOrCreate()
    
  3. Use Spark as usual.

    df1 = spark.table("some_hive_table")
    df2 = spark.sql("select * from some_table")
    ...
    

Remote Connection to Desktop in the Container

If you are running a Docker container with a desktop environment (dclong/lubuntu* or dclong/xubuntu*), you can connect to the desktop environment in the Docker container using NoMachine.

  1. Download the NoMachine client from https://www.nomachine.com/download.
  2. Install the NoMachine client on your computer.
  3. Create a new connection from your computer to the desktop environment in the Docker image using the NX protocol and port 4000. You will be asked for a user name and password. By default, the user name used to start the Docker container on the host machine is used as both the user name and password in the Docker container.

List of Images and Detailed Information

Build my Docker Images

My Docker images are auto built leveraging GitHub Actions workflow in the GitHub repository docker_image_builder .

Tips for Maintaining Docker Images (for My own Reference)

  1. Do NOT update the latest tag until you have fully tested the corresponding next tag.

  2. Do NOT add new features or tools unless you really need them.

  3. It generally a good idea to restrict versions of non-stable packages to be a specific (working) version or a range of versions that is unlike to break.

  4. If you REALLY have to update some Bash script a Docker image, do not update it in the GitHub repository directly and build the Docker image to test whether it works. Instead, make a copy of the Bash script outside the Docker container, update it, and mount it into the container to test whether it work. If the updated Bash script work as you expected, then go ahead to update it in the GitHub repository.

Known Issues

  1. NeoVim fails to work if a Docker image (with NeoVim installed) is run on Mac with the M1 chip even if you pass the option --platform linux/amd64 to docker run. A possible fix is to manually uninstall NeoVim using the following command

    wajig purge neovim
    

    and then install Vim instead.

    wajig install vim
    
  2. The command wajig fails to cache password if a Docker image (with wajig installed) is run on Mac with the M1 chip even if you pass the option --platform linux/amd64 to docker run. Fortunately, this is not a big issue.

  3. There is an issue with the dclong/xubuntu* Docker images due to Xfce on Ubuntu 18.04. It is suggested that you use the corresponding dclong/lubuntu* Docker images instead (which are based on LXQt) if a desktop environment is needed.

Comments