Spark Issue: RuntimeException: Unsupported Literal Type Class


java.lang.RuntimeException: Unsupported literal type class java.util.ArrayList [1]

Possible Causes

This happens in PySpark when a Python list is provided where a scalar is required. Assuming id0 is an integer column in the DataFrame df, the following code throws the above error.

v = [1, 2, 3 …
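Since the snippet above is cut off, the following is an independent sketch rather than the original code. It assumes a small DataFrame with an integer column id0 and a list v of my own choosing, and it shows one common way to avoid the error: matching the column against the list with Column.isin instead of comparing the column to the list directly. The helper names as_isin_values and filter_by_id0 are hypothetical.

```python
def as_isin_values(value):
    """Normalize a scalar or a collection into a list for Column.isin (hypothetical helper)."""
    if isinstance(value, (list, tuple, set, frozenset)):
        return list(value)
    return [value]


def filter_by_id0(df, value):
    """Keep rows whose id0 equals the scalar, or is contained in the collection."""
    # Imported lazily so as_isin_values can be used without a Spark installation.
    from pyspark.sql.functions import col

    return df.filter(col("id0").isin(as_isin_values(value)))


if __name__ == "__main__":
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("isin-sketch").getOrCreate()
    df = spark.createDataFrame([(1,), (2,), (5,)], ["id0"])  # illustrative data

    v = [1, 2, 3]  # illustrative list, not taken from the truncated snippet
    # Comparing the column to the whole list, e.g. col("id0") == v, is the kind
    # of call that gets rejected, because a list is not a scalar literal.
    filter_by_id0(df, v).show()

    spark.stop()
```

The same helper also accepts a single value, so filter_by_id0(df, 2) and filter_by_id0(df, [2]) select the same rows.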

Types of Joins of Spark DataFrames


  1. It is suggested that you always pass a list of columns to the on parameter, even if there is only one join column.

  2. None in a pandas DataFrame is converted to NaN instead of null!

  3. Spark supports the following join types:

    • inner (default)
    • cross
    • outer, full, fullouter, full_outer
    • left, leftouter, left_outer
    • right, rightouter, right_outer
    • semi, leftsemi, left_semi
    • anti, leftanti, left_anti