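In Spark 2.1.0+, unioning RDDs/DataFrames does not deduplicate rows (for efficiency), so `union` behaves like SQL `UNION ALL`. Call `distinct()` on the result if duplicates must be removed.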

  1. Union 2 DataFrames (or RDDs).

     df1.union(df2)
     # or for old-fashioned RDDs
     rdd1.union(rdd2)
  2. Union multiple DataFrames (or RDDs).

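     from functools import reduce
     from pyspark.sql import DataFrame
     # SparkSession has no union method, so fold DataFrame.union over the list
     df = reduce(DataFrame.union, [df1, df2, df3])
     # or for old-fashioned RDDs
     rdd = sc.union([rdd1, rdd2, rdd3])  # sc is a SparkContext object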
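
The two cases above can be put together in a minimal PySpark sketch. The helper name `union_all`, the `demo` function, the sample rows, and the `local[1]` master are assumptions made for illustration; none of them are part of the Spark API. The helper only relies on each object having a `union` method, so it should work for both DataFrames and RDDs.

```python
from functools import reduce


def union_all(frames):
    """Fold .union over a non-empty list of DataFrames (or RDDs).

    Hypothetical helper for illustration; not part of the Spark API.
    """
    frames = list(frames)
    if not frames:
        raise ValueError("union_all needs at least one DataFrame or RDD")
    return reduce(lambda left, right: left.union(right), frames)


def demo():
    # Imported lazily so the helper above can be defined without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")  # assumed local test session
        .appName("union-demo")
        .getOrCreate()
    )
    # Sample data is made up; the three DataFrames share one schema.
    df1 = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
    df2 = spark.createDataFrame([(2, "b"), (3, "c")], ["id", "label"])
    df3 = spark.createDataFrame([(3, "c")], ["id", "label"])

    combined = union_all([df1, df2, df3])
    print(combined.count())             # 5: duplicate rows are kept
    print(combined.distinct().count())  # 3: duplicates removed explicitly
    spark.stop()
```

Calling `demo()` requires a working PySpark installation. Note that `DataFrame.union` matches columns by position, not by name, so the inputs should list their columns in the same order.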

References

Union DataFrames in Spark
