Ben Chuanlong Du's Blog

It is never too late to learn.

Use Spark With Apache Toree Kernel in JupyterLab

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

The Docker image dclong/jupyterhub-toree comes with Spark and Apache Toree installed and configured, so you don't need to download and install Spark yourself. By default, a SparkSession object named spark is created automatically, just as in spark-shell. This means you can use Spark with Scala out of the box in a JupyterLab notebook running the Scala - Apache Toree kernel.
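For comparison, here is roughly what you would have to write yourself outside the kernel to get a session named spark. This is a minimal sketch, not how Toree is implemented. It assumes a standalone Scala program with spark-sql on the classpath and local mode (local[*]). The object name ManualSessionSketch, the helper buildSession and the app name are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical standalone program; not part of the Docker image or Toree.
object ManualSessionSketch {
  // Build (or reuse) a SparkSession. In the notebook the kernel does the
  // equivalent for you and binds the result to the name spark.
  def buildSession(): SparkSession =
    SparkSession.builder()
      .appName("manual-session-sketch") // hypothetical app name
      .master("local[*]")               // assumption: local mode using all cores
      .getOrCreate()

  def main(args: Array[String]): Unit = {
    val spark = buildSession()
    // spark-shell imports these implicits for you; a standalone program
    // must import them explicitly so that .toDF works on local collections.
    import spark.implicits._

    val df = Range(0, 10).toDF()
    df.show()

    spark.stop()
  }
}
```

All of this setup (dependency management, session creation, implicits, shutdown) is what the preconfigured image and kernel spare you.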

  1. Open a JupyterLab notebook with the Scala - Apache Toree kernel from the launcher.

  2. Use Spark as usual. For example, create a DataFrame from a Scala range:

    val df = Range(0, 10).toDF
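Beyond creating a DataFrame, the regular DataFrame API can be used in later notebook cells. A small sketch, assuming only the spark session the kernel provides. It uses spark.range because that needs no implicit conversions, and the names ids, evens and total are illustrative.

```scala
import org.apache.spark.sql.functions.{col, sum}

// spark is the SparkSession created by the Toree kernel.
// spark.range yields a single Long column named "id".
val ids = spark.range(0, 10).toDF()

// Keep only even ids: 0, 2, 4, 6, 8.
val evens = ids.filter(col("id") % 2 === 0)
evens.show()

// Aggregate over the whole DataFrame: 0 + 1 + ... + 9 = 45.
val total = ids.agg(sum("id")).first().getLong(0)
println(total)
```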
