Comments
It is suggested that you always pass a list of columns to the parameter `on`, even if there is only one column for joining.
`None` in a pandas DataFrame is converted to `NaN` instead of `null`!
Spark allows the following join types (names on the same line are aliases of one another):
    inner (default)
    cross
    outer, full, fullouter, full_outer
    left, leftouter, left_outer
    right, rightouter, right_outer
    semi, leftsemi, left_semi
    anti, leftanti, left_anti
Inner Join of Spark DataFrames
Tips and Traps
Select only needed columns before joining.
Rename the join columns so that their names are identical (if they differ) before joining.