Ben Chuanlong Du's Blog

It is never too late to learn.

Work With Multiple Spark Installations

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

spark-submit and spark-shell

Overwriting the PATH environment variable before invoking spark-submit and/or spark-shell, so that the bin directory of the Spark installation you want comes first, often resolves the issue.
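As a minimal sketch of the PATH override, the helper below builds an environment in which one installation's bin directory is searched first and then launches spark-submit with it. The function names `env_for_spark` and `run_spark_submit` are hypothetical, and setting SPARK_HOME alongside PATH is an assumption, not something this note requires.

```python
import os
import shutil
import subprocess


def env_for_spark(spark_home, base_env=None):
    """Return a copy of the environment that prefers the given Spark installation."""
    env = dict(os.environ if base_env is None else base_env)
    bin_dir = os.path.join(spark_home, "bin")
    # Put this installation's bin directory first so it wins the PATH lookup.
    parts = [bin_dir] + ([env["PATH"]] if env.get("PATH") else [])
    env["PATH"] = os.pathsep.join(parts)
    # Assumption: also pointing SPARK_HOME at the same installation is helpful.
    env["SPARK_HOME"] = spark_home
    return env


def run_spark_submit(spark_home, args):
    """Run spark-submit resolved against the adjusted PATH (hypothetical helper)."""
    env = env_for_spark(spark_home)
    exe = shutil.which("spark-submit", path=env["PATH"])
    if exe is None:
        raise FileNotFoundError("spark-submit not found on the adjusted PATH")
    return subprocess.run([exe, *args], env=env, check=True)
```

For example, `run_spark_submit("/opt/spark-3.1.1-bin-hadoop3.2", ["--version"])` should report the version of that installation, provided it exists at that path. Only the child process sees the modified PATH; the calling process's environment is left unchanged.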

Spark in Jupyter/Lab Notebooks

Removing or resetting the environment variable HADOOP_CONF_DIR resolves the issue.

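import os
# Reset HADOOP_CONF_DIR so that a configuration from another installation is not picked up.
os.environ["HADOOP_CONF_DIR"] = ""
import findspark
# Point findspark at the Spark installation to use; this must run before importing pyspark.
findspark.init("/opt/spark-3.1.1-bin-hadoop3.2/")
from pyspark.sql import SparkSession, DataFrame
spark = SparkSession.builder.appName("PySpark_Notebook") \
    .enableHiveSupport().getOrCreate()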
...
  • HADOOP_CONF_DIR
  • SPARK_HOME
  • HADOOP_HOME
  • HIVE_HOME
  • PIG_HOME
  • HBASE_HOME
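If the variables listed above are ones that can leak in from another installation (an assumption, since the note does not say so explicitly), they could be cleared in one pass before calling findspark.init. The helper name `clear_env_vars` is hypothetical; this is a rough sketch rather than a required step.

```python
import os

# Names taken from the list above; whether all of them need clearing is an assumption.
CONFLICTING_VARS = [
    "HADOOP_CONF_DIR",
    "SPARK_HOME",
    "HADOOP_HOME",
    "HIVE_HOME",
    "PIG_HOME",
    "HBASE_HOME",
]


def clear_env_vars(names, environ=os.environ):
    """Remove the given variables from environ and return the values that were removed."""
    removed = {}
    for name in names:
        if name in environ:
            removed[name] = environ.pop(name)
    return removed
```

In a notebook, calling `clear_env_vars(CONFLICTING_VARS)` in the first cell, before findspark.init, keeps the old values in the returned dictionary in case they need to be restored later.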
