Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
Please refer to Spark Issue: Task Not Serializable for a similar serialization issue in Spark/Scala.
Symptom¶
A PySpark job that uses a UDF or pandas UDF fails with an error like the one quoted in the references below: _pickle.PicklingError: args[0] from __newobj__ args has the wrong class.
Cause¶
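For example, if you have the following import

    from nltk.corpus import stopwords

then calling the following inside a UDF or pandas UDF might cause this issue.

    stopwords.words("english")

A likely explanation is that stopwords is a lazily loaded NLTK corpus object, so referencing it inside the UDF makes Spark try to pickle that object together with the function.
Solution¶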
Move the call stopwords.words("english") out of the UDF or pandas UDF: evaluate it once outside, assign the result to a global variable, and reference only that variable inside the UDF.
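A minimal sketch of this fix, assuming PySpark and NLTK (with the stopwords corpus downloaded) are available on the driver. The helper names `load_stopwords`, `make_remove_stopwords`, and `main`, as well as the column names, are hypothetical and only for illustration. The word list is evaluated once outside the UDF, and the UDF body only sees a plain Python set, so the NLTK corpus object never has to be pickled. The sketch passes the set in through a closure rather than a bare module-level global, which is a variation on the same idea.

```python
from typing import Callable, FrozenSet, Optional


def load_stopwords() -> FrozenSet[str]:
    """Evaluate the NLTK word list once, outside of any UDF."""
    # Imported here so that only the driver needs NLTK and its corpus data.
    from nltk.corpus import stopwords

    return frozenset(stopwords.words("english"))


def make_remove_stopwords(
    stop_words: FrozenSet[str],
) -> Callable[[Optional[str]], Optional[str]]:
    """Build a plain function that references only an ordinary frozenset."""

    def remove_stopwords(text: Optional[str]) -> Optional[str]:
        if text is None:
            return None
        # Only `stop_words` (a frozenset of str) is captured here,
        # not the `stopwords` corpus object from NLTK.
        return " ".join(w for w in text.split() if w.lower() not in stop_words)

    return remove_stopwords


def main() -> None:
    """Hypothetical driver code showing how the pieces fit together."""
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.getOrCreate()
    stop_words = load_stopwords()  # evaluated on the driver, not in the UDF
    clean = udf(make_remove_stopwords(stop_words), StringType())

    df = spark.createDataFrame([("this is a test of the udf",)], ["text"])
    df.withColumn("clean", clean("text")).show(truncate=False)
```

Calling `main()` from a script submitted with spark-submit would run the example end to end. The inner function can also be checked without Spark by passing a small hand-made set to `make_remove_stopwords`.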
References¶
About Python: Spark-Submit raises a "Pickling error": "_pickle.PicklingError: args[0] from newobj args has the wrong class"
_pickle.PicklingError: args[0] from newobj args has the wrong class from cloudpickle.py