Comments¶
It is suggested that you always pass a list of columns to the parameter on, even if there is only one column for joining.
None in a pandas DataFrame is converted to NaN instead of null!
Spark allows using the following join types:
inner (default)
cross
outer
full, fullouter, full_outer
left, leftouter, left_outer
right, rightouter, right_outer
semi, leftsemi, left_semi
anti, leftanti, left_anti
Inner Join of Spark DataFrames
Tips and Traps¶
Select only needed columns before joining.
Rename joining column names to be identical (if different) before joining.