Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
Issue¶
Total size of serialized results is bigger than `spark.driver.maxResultSize`.
Solutions¶
- Eliminate unnecessary `broadcast` or `collect` calls.
- If one of the tables being joined contains too many partitions (which results in too many jobs), repartition it to reduce the number of partitions before joining.
- Make `spark.driver.maxResultSize` larger.
    - Set it via SparkConf: `conf.set("spark.driver.maxResultSize", "3g")`
    - Set it in `spark-defaults.conf`: `spark.driver.maxResultSize 3g`
    - Set it when calling spark-submit: `--conf spark.driver.maxResultSize=3g`
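The SparkConf route above can be sketched in PySpark as follows. This is a minimal configuration sketch, not a recommendation: the app name is hypothetical and `"3g"` is an example value that should be tuned to the driver's available memory. It requires a working Spark installation to run.

```python
from pyspark.sql import SparkSession

# Raise the driver result-size cap when the session is created.
# "3g" is an example value; tune it to your workload and driver memory.
spark = (
    SparkSession.builder
    .appName("max-result-size-example")  # hypothetical app name
    .config("spark.driver.maxResultSize", "3g")
    .getOrCreate()
)

# To reduce the number of partitions of a table before joining
# (column name "join_key" is hypothetical):
# df = df.repartition(200, "join_key")
```

Note that `spark.driver.maxResultSize` must be set before the SparkContext is created; changing it on an already-running session has no effect.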