Most of my Docker images have different variants (corresponding to the tags latest, next, etc.) for different use cases. Each tag might also have historical versions with the pattern mmddhh (where mm, dd and hh stand for the month, day and hour), which serve as fallbacks if a tag is broken. Please refer to the following tag table for more details.

| Tag | Base Image OS | Comment |
| --- | --- | --- |
| latest | Ubuntu LTS (or newer if necessary and well tested) | The most recent stable version of the Docker image. The latest tag is what most users should use. It cares more about user friendliness than Docker image size, load speed and even security. |
| next | Ubuntu LTS (or newer if necessary and well tested) | The most recent testing version of the Docker image. New features/tools will be added into the next tag before entering other tags. |
| 23.04 | Ubuntu 23.04 | For any of the following situations: 1. a specific Ubuntu/kernel version is required; 2. trying out newer Ubuntu versions than LTS. |
| mmddhh | Historical versions corresponding to the latest tag. | Fallback tags (for latest) if the latest tag is broken. |
| next_mmddhh | Historical versions corresponding to the next tag. | Fallback tags (for next) if the next tag is broken. |

| Docker Image | Comment |
| --- | --- |
| dclong/vscode-server | Cloud IDE code-server (based on VSCode) |
| dclong/jupyterhub-ds | Traditional ML |
| dclong/jupyterhub-pytorch | Deep Learning |
| dclong/python-portable | Build portable Python using python-build-standalone |
| dclong/jupyterhub-sagemath | Math / Calculus |
| dclong/jupyterhub-ds:blog_071520 | For publishing legendu.net/blog using CICD. |
| dclong/gitpod | Editing other GitHub repos using GitPod |
| dclong/jupyterhub-kotlin, dclong/jupyterhub-ganymede | JVM languages |
| dclong/rustpython | RustPython |
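
A fallback tag is used by appending it to the image name. The sketch below shows one way to check that a tag follows the mmddhh pattern and to build the full image reference. The helper names `is_fallback_tag` and `image_ref` are made up for this example, and the tag 071520 is only illustrative, so check Docker Hub for the tags that actually exist.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the dclong images or of icon.

# Succeed if the argument looks like a fallback tag: mmddhh or next_mmddhh.
is_fallback_tag() {
    printf '%s\n' "$1" |
        grep -Eq '^(next_)?(0[1-9]|1[0-2])(0[1-9]|[12][0-9]|3[01])([01][0-9]|2[0-3])$'
}

# Join a repository name and a tag (defaulting to latest) into an image reference.
image_ref() {
    printf '%s:%s\n' "$1" "${2:-latest}"
}

# 071520 is an illustrative tag (July 15, 20:00), not a verified one.
tag=071520
if is_fallback_tag "$tag"; then
    echo "docker pull $(image_ref dclong/jupyterhub-ds "$tag")"
fi
```

The script only prints the pull command, so it can be inspected before it is run.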

Usage

Install Docker

Please refer to Install Docker for instructions on how to install and configure Docker.

Pull the Docker Image

Taking dclong/jupyterhub-ds as an example, you can pull it using the command below.

:::bash
docker pull dclong/jupyterhub-ds

For people in mainland China, please refer to the post Speedup Docker Pulling and Pushing for ways to speed up pushing and pulling Docker images. If you don’t want to bother, just use the command below.

:::bash
docker pull registry.docker-cn.com/dclong/jupyterhub-ds

Start a Container using ldc

The recommended way to start containers for the dclong/* Docker images is to use the ldc command, which comes with icon.

Start a Container Manually

The docker run commands below pass several environment variables to the container via the option -e. Keep the defaults if you are unsure what values to use. DOCKER_PASSWORD is probably the only one you want to, and should, change.

The root directory of JupyterLab/Jupyter notebooks is /workdir in the container. You can mount a directory on the host to it as you wish. The examples below use the Docker image dclong/jupyterhub-ds.

The following command starts a container and mounts the current working directory and /home on the host machine to /workdir and /home_host in the container respectively.

:::bash
docker run -d --init \
    --platform linux/amd64 \
    --hostname jupyterhub-ds \
    --log-opt max-size=50m \
    -p 8000:8000 \
    -p 5006:5006 \
    -e DOCKER_USER=`id -un` \
    -e DOCKER_USER_ID=`id -u` \
    -e DOCKER_PASSWORD=`id -un` \
    -e DOCKER_GROUP_ID=`id -g` \
    -e DOCKER_ADMIN_USER=`id -un` \
    -v `pwd`:/workdir \
    -v `dirname $HOME`:/home_host \
    dclong/jupyterhub-ds /scripts/sys/init.sh

The following command (only works on Linux) does the same as the above one except that it limits the use of CPU and memory.

:::bash
docker run -d --init \
    --platform linux/amd64 \
    --name jupyterhub-ds \
    --log-opt max-size=50m \
    --memory=$(($(head -n 1 /proc/meminfo | awk '{print $2}') * 4 / 5))k \
    --cpus=$((`nproc` - 1)) \
    -p 8000:8000 \
    -p 5006:5006 \
    -e DOCKER_USER=`id -un` \
    -e DOCKER_USER_ID=`id -u` \
    -e DOCKER_PASSWORD=`id -un` \
    -e DOCKER_GROUP_ID=`id -g` \
    -e DOCKER_ADMIN_USER=`id -un` \
    -v `pwd`:/workdir \
    -v `dirname $HOME`:/home_host \
    dclong/jupyterhub-ds /scripts/sys/init.sh
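
The resource limits in the command above come from two small calculations: 80% of the total memory and all CPUs but one. The sketch below isolates that arithmetic so the values can be checked before starting a container. The function names `mem_limit_kb` and `cpu_limit` are hypothetical, and the floor of one CPU is an addition that the original command does not have.

```shell
#!/usr/bin/env bash
# Hypothetical helpers mirroring the arithmetic in the command above.

# 80% of the given amount of memory (in kB), using integer arithmetic.
mem_limit_kb() {
    echo $(( $1 * 4 / 5 ))
}

# All CPUs but one, never less than 1.
# The floor of 1 is an addition; the original command has no such guard.
cpu_limit() {
    local n=$(( $1 - 1 ))
    [ "$n" -lt 1 ] && n=1
    echo "$n"
}

# /proc/meminfo only exists on Linux; nproc comes with GNU coreutils.
if [ -r /proc/meminfo ] && command -v nproc > /dev/null; then
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    echo "--memory=$(mem_limit_kb "$total_kb")k --cpus=$(cpu_limit "$(nproc)")"
fi
```

On a Linux host the script prints the two options in the form expected by docker run, which makes it easy to see what limits the container would get.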

Add a New User Inside a Docker Container

You can of course use the well-known commands useradd, adduser, etc. to achieve this. To make things easier, there are some shell scripts in the directory /scripts/sys/ that create users for you.

You can use the option -h to print the help doc for these commands.

:::bash
/scripts/sys/create_user_nogroup.sh -h
Create a new user with the group name "nogroup".
Syntax: create_user_nogroup user user_id [password]
Arguments:
user: user name
user_id: user id
password: Optional password of the user. If not provided, then the user name is used as the password.

Now suppose you want to create a new user dclong with user ID 2000 and group name nogroup. You can use the following command.

:::bash
sudo /scripts/sys/create_user_nogroup.sh dclong 2000

Since we didn’t specify a password for the user, the default password (same as the user name) is used.

Use the JupyterHub Server

  1. Open your browser and visit your_host_ip:8000, where your_host_ip is the hostname or IP address of your server.

  2. Log in to the JupyterHub server using your user name (by default your user name on the host machine) and password (by default also your user name on the host machine).

  3. It is strongly suggested (for security reasons) that you change your password (using the command passwd) in the container.

  4. Enjoy JupyterLab notebooks!
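
Before opening the browser, it can help to check from a terminal that the server answers on port 8000. The sketch below assumes JupyterHub serves its login page at the default path /hub/login (that is, no custom base URL is configured). The function names `hub_login_url` and `hub_is_up` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking a JupyterHub server from the command line.

# Build the login URL from a host and an optional port (default 8000).
hub_login_url() {
    printf 'http://%s:%s/hub/login\n' "$1" "${2:-8000}"
}

# Succeed if the login page answers without an HTTP error.
# Set CURL=echo to print the curl arguments instead of sending a request.
hub_is_up() {
    "${CURL:-curl}" --fail --silent --show-error --output /dev/null "$(hub_login_url "$@")"
}

# Usage (replace your_host_ip with the address of your server):
#   hub_is_up your_host_ip && echo "JupyterHub is reachable"
```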

Get Information of Running Jupyter/Lab Servers

If you are using the Jupyter/Lab server instead of JupyterHub, you will be asked for a token at login. If you have started the Docker container in interactive mode (option -i instead of -d), the token for login is printed to the console. Otherwise, the tokens (and more information about the servers) can be found by running the following command outside the Docker container.

:::bash
docker exec jupyterlab /scripts/list_jupyter.py

The above command tries to be smart in the sense that it first figures out the user that started the JupyterLab server and then queries the running Jupyter/Lab servers of that user. An equivalent but more specific command (if the Docker container was launched by the current user on the host) is shown below.

:::bash
docker exec -u $(id -un) jupyterlab /scripts/sys/list_jupyter.py

If you are inside the Docker container, then run the following command to get the tokens (and more information about the servers).

:::bash
/scripts/list_jupyter.py

Or equivalently if the Jupyter/Lab server is launched by the current user,

:::bash
/scripts/sys/list_jupyter.py

To sum up, most of the time you can rely on /scripts/list_jupyter.py to find the tokens of the running Jupyter/Lab servers, whether you are root or the user that launched the Docker/JupyterLab server, and whether you are inside the Docker container or not. Yet another way to get information about the running JupyterLab server is to check the log. Please refer to the section Debug Docker Containers for more information.
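
If only the token is needed (for example, to paste it into a login form), it can be filtered out of the server listing. The output format of /scripts/list_jupyter.py is not shown in this post, so the sketch below assumes lines of the form that `jupyter server list` prints, namely a URL with a `token=` query parameter followed by the notebook directory. The function name `extract_tokens` and the sample line are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical filter: print the token from each line of the form
#   http://host:port/?token=<token> :: <notebook dir>
# Lines without a token are skipped.
extract_tokens() {
    sed -n 's/.*[?&]token=\([^ &]*\).*/\1/p'
}

# Illustrative sample, not real output from a container.
sample='Currently running servers:
http://localhost:8888/?token=abc123 :: /workdir'
printf '%s\n' "$sample" | extract_tokens
```

With a running container named jupyterlab, the same filter could be applied as `docker exec jupyterlab /scripts/list_jupyter.py | extract_tokens`, provided the script prints URLs in the assumed format.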

Add a New User for JupyterHub

By default, any user in a Docker container of dclong/jupyterhub-* can visit the JupyterHub server. So if you want to grant access to a new user, just create an account for them in the Docker container. Please refer to Add a New User Inside a Docker Container on how to create a new user inside a Docker container.

Easy Install of Other Kernels

Install and configure PySpark for use with the Python kernel.

:::bash
icon spark -ic && icon pyspark -ic

Install the evcxr Rust kernel.

:::bash
icon evcxr -ic

Install the Almond Scala kernel.

:::bash
icon almond -ic

Install the ITypeScript kernel.

:::bash
icon its -ic

Many other software/tools can be easily installed with icon.

Debug Docker Containers

You can change the option docker run -d ... to docker run -it ... to show logs of processes in the Docker container which helps debugging. If you have already started a Docker container using docker run -d ..., you can use the command docker logs to get the log of the container (which contains logs of all processes in it).
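
For a container that is already running, it is often enough to look at the most recent log lines and keep following new ones. The sketch below wraps docker logs with its `--follow`, `--timestamps` and `--tail` options. The function name `tail_container_logs` and the default of 100 lines are choices made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: show the last N log lines of a container and keep following.
# Set DOCKER=echo to print the arguments instead of calling docker (dry run).
tail_container_logs() {
    local container="$1" lines="${2:-100}"
    "${DOCKER:-docker}" logs --follow --timestamps --tail "$lines" "$container"
}

# Usage, for a container started with --name jupyterhub-ds:
#   tail_container_logs jupyterhub-ds 200
```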

Use Spark in JupyterLab Notebooks

It is suggested that you use the Almond Scala kernel. I will gradually drop support of the BeakerX Scala kernel in my Docker images.

PySpark - pyspark and findspark

To use PySpark in a container of the Docker image dclong/jupyterhub-ds you need to install Spark and the Python package pyspark first, which can be achieved using the following command.

:::bash
icon spark -ic --loc /opt/  
icon pyspark -ic

Follow the steps below to use PySpark after it is installed.

  1. Open a JupyterLab notebook with the Python kernel from the launcher.

  2. Find and initialize PySpark.

     :::python
     import findspark
     # A symbolic link of the Spark Home is made to /opt/spark for convenience
     findspark.init("/opt/spark")
    
     from pyspark.sql import SparkSession
     spark = SparkSession.builder.appName('PySpark Example').enableHiveSupport().getOrCreate()

  3. Use Spark as usual.

     :::python
     df1 = spark.table("some_hive_table")
     df2 = spark.sql("select * from some_table")
     ...

Remote Connection to Desktop in the Container

If you are running a Docker container with a desktop environment (dclong/lubuntu* or dclong/xubuntu*), you can connect to the desktop environment in the Docker container using NoMachine.

  1. Download the NoMachine client from https://www.nomachine.com/download.

  2. Install the NoMachine client on your computer.

  3. Create a new connection from your computer to the desktop environment in the Docker container using the NX protocol and port 4000. You will be asked for a user name and password. By default, the name of the user who started the Docker container on the host machine is used as both the user name and the password in the Docker container.

List of Images and Detailed Information

Build my Docker Images

My Docker images are built automatically by a GitHub Actions workflow in the GitHub repository docker_image_builder.

Tips for Maintaining Docker Images (for My own Reference)

  1. Do NOT update the latest tag until you have fully tested the corresponding next tag.

  2. Do NOT add new features or tools unless you really need them.

  3. It is generally a good idea to pin non-stable packages to a specific (working) version or to a range of versions that is unlikely to break.

  4. If you REALLY have to update a Bash script in a Docker image, do not update it in the GitHub repository directly and then build the Docker image to test whether it works. Instead, make a copy of the Bash script outside the Docker container, update it, and mount it into the container to test whether it works. If the updated Bash script works as expected, then go ahead and update it in the GitHub repository.
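
The last tip can be sketched as a bind mount that shadows the copy of the script baked into the image. The function name `run_with_patched_script` is hypothetical, and running the mounted script as the container command is an assumption that fits /scripts/sys/init.sh but may not fit other scripts. The local path is made absolute because docker run -v treats a relative source as a named volume rather than a host path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start a throwaway container with a local copy of a script
# mounted (read-only) over the copy baked into the image, then run that script.
# Set DOCKER=echo to print the arguments instead of calling docker (dry run).
run_with_patched_script() {
    local image="$1" local_script="$2" container_path="$3"
    # Bind mounts need an absolute host path.
    case "$local_script" in
        /*) ;;
        *) local_script="$PWD/$local_script" ;;
    esac
    "${DOCKER:-docker}" run --rm -it \
        -v "$local_script":"$container_path":ro \
        "$image" "$container_path"
}

# Usage, with an edited copy of init.sh in the current directory:
#   run_with_patched_script dclong/jupyterhub-ds init.sh /scripts/sys/init.sh
```

The local copy must be executable on the host, since the bind mount keeps the host file permissions.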

On Failure of GitHub Actions Workflow for Building Docker Images

  1. If the Docker image building workflow fails due to network issues, rerunning the failed pipelines in GitHub Actions might not help (the network issue is likely due to problematic network nodes, and retried pipelines are sent to the same nodes). In such situations, it is better to trigger a new run of the workflow.

Known Issues

  1. NeoVim fails to work if a Docker image (with NeoVim installed) is run on Mac with the M1 chip even if you pass the option --platform linux/amd64 to docker run. A possible fix is to manually uninstall NeoVim using the following command

     :::bash
     wajig purge neovim

    and then install Vim instead.

     :::bash
     wajig install vim

  2. The command wajig fails to cache the password if a Docker image (with wajig installed) is run on a Mac with the M1 chip, even if you pass the option --platform linux/amd64 to docker run. Fortunately, this is not a big issue.

  3. There is an issue with the dclong/xubuntu* Docker images due to Xfce on Ubuntu 18.04. It is suggested that you use the corresponding dclong/lubuntu* Docker images instead (which are based on LXQt) if a desktop environment is needed.