- PySpark 2.4 and older do not support Python 3.8. Use Python 3.7 or earlier with PySpark 2.4 or older.
- It can be extremely helpful to run a PySpark application locally to detect possible issues before submitting it to the Spark cluster.
#!/usr/bin/env bash …
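As a rough illustration of running an application locally, the sketch below builds a `spark-submit` command that uses the `local[N]` master. It is not the script referenced above; the helper names `build_local_submit_command` and `run_locally` and the file name `app.py` are hypothetical, and it assumes `spark-submit` is on the `PATH`.

```python
import shlex
import shutil
import subprocess


def build_local_submit_command(app_path, cores="*", extra_args=()):
    """Return a spark-submit command that runs app_path in local mode.

    cores is the N in local[N]; "*" asks Spark to use all available cores.
    """
    return [
        "spark-submit",
        "--master", f"local[{cores}]",  # local mode: no cluster manager involved
        app_path,
        *extra_args,
    ]


def run_locally(app_path, cores="*"):
    """Run the application locally; raises if spark-submit is not on PATH."""
    if shutil.which("spark-submit") is None:
        raise RuntimeError("spark-submit not found on PATH")
    cmd = build_local_submit_command(app_path, cores)
    print("Running:", " ".join(shlex.quote(part) for part in cmd))
    return subprocess.run(cmd, check=True)
```

For example, `run_locally("app.py", cores=2)` would run the hypothetical `app.py` with two local worker threads, which is often enough to surface import errors and schema problems before a cluster submission.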
PySpark Issue: Java Gateway Process Exited Before Sending the Driver Its Port Number
I encountered this issue when using PySpark locally
(it can happen on a cluster as well).
It turned out to be caused by a misconfiguration of the environment variable JAVA_HOME in Docker.
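Since the root cause was a bad `JAVA_HOME`, a quick sanity check before starting Spark can save some debugging time. The following is a minimal sketch, assuming a Unix-like environment (such as a Docker container) where the Java executable lives at `$JAVA_HOME/bin/java`; the function name `check_java_home` is hypothetical.

```python
import os
from pathlib import Path


def check_java_home(env=None):
    """Return a list of problems found with JAVA_HOME (empty if none found).

    Assumes a Unix-like layout: $JAVA_HOME/bin/java.
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        # An unset JAVA_HOME is not always fatal, but it is worth flagging.
        return ["JAVA_HOME is not set"]
    problems = []
    java_bin = Path(java_home) / "bin" / "java"
    if not Path(java_home).is_dir():
        problems.append(f"JAVA_HOME points to a missing directory: {java_home}")
    elif not java_bin.is_file():
        problems.append(f"No java executable at {java_bin}")
    return problems


if __name__ == "__main__":
    for problem in check_java_home():
        print("WARNING:", problem)
```

Running this inside the container before creating a Spark session shows whether `JAVA_HOME` points somewhere plausible. An empty result does not guarantee that the Java version is compatible with the installed Spark version.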
References
PySpark: Exception: Java gateway process exited before sending the driver its port number