Ben Chuanlong Du's Blog

It is never too late to learn.

Spark Issue: Task Not Serializable

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Please refer to Spark Issue: `_pickle.PicklingError: args[0] from __newobj__ args has the wrong class` for a similar serialization issue in PySpark.

Error Message

org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: ...

Possible Causes

Some object sent from the driver to the workers is not serializable. This typically happens when a closure passed to a transformation (e.g., `map` or `filter`) captures a reference to a non-serializable object, such as a database connection, a file handle, or the enclosing class itself.
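The underlying mechanism can be illustrated with plain Java serialization, which is what Spark uses by default for closures. Below, a hypothetical `DbClient` class (not from any real library) does not implement `java.io.Serializable`, so attempting to serialize it fails with the same `NotSerializableException` seen in the Spark error above:

```java
import java.io.ByteArrayOutputStream;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;

public class Demo {
    // Hypothetical class standing in for a driver-side resource;
    // note it does NOT implement java.io.Serializable.
    static class DbClient {
        String url = "jdbc:example";
    }

    public static void main(String[] args) throws Exception {
        try (ObjectOutputStream oos =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            oos.writeObject(new DbClient());
            System.out.println("serialized OK");
        } catch (NotSerializableException e) {
            // This branch is taken: DbClient cannot be serialized.
            System.out.println("NotSerializableException: " + e.getMessage());
        }
    }
}
```

When such an object is (often inadvertently) captured by a closure, Spark hits this exception while shipping the task to executors.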

Solutions

  1. Don't send the non-serializable object to workers. For example, construct it inside the closure (or inside `mapPartitions`) so that each worker creates its own instance instead of receiving one from the driver.

  2. Use a serializable version if you do want to send the object to workers, e.g., make the class implement `java.io.Serializable` and mark any non-serializable fields `transient`, recreating them lazily on the worker side.
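The second solution can be sketched in plain Java (the `DbWriter` class and its `StringBuilder` "connection" are hypothetical placeholders for a real resource): the wrapper implements `Serializable`, keeps the non-serializable handle in a `transient` field, and recreates it lazily after deserialization on the worker:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class FixedDemo {
    // Hypothetical serializable wrapper: only the lightweight config (url)
    // travels over the wire; the heavy resource is transient.
    static class DbWriter implements Serializable {
        private final String url;
        private transient StringBuilder conn; // stands in for a real connection

        DbWriter(String url) { this.url = url; }

        StringBuilder conn() {
            if (conn == null) {
                // Lazily (re)created, e.g., after deserialization on a worker.
                conn = new StringBuilder(url);
            }
            return conn;
        }
    }

    public static void main(String[] args) throws Exception {
        // Round-trip through Java serialization, as Spark would do for a task.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(new DbWriter("jdbc:example")); // succeeds now
        }
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            DbWriter copy = (DbWriter) ois.readObject();
            System.out.println(copy.conn()); // resource rebuilt from url
        }
    }
}
```

The same pattern applies to a Spark closure: capture only serializable configuration, and open the actual connection inside the task.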

References

https://github.com/databricks/spark-knowledgebase/blob/master/troubleshooting/javaionotserializableexception.md
