Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Coalesce and Repartition in Spark DataFrame

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

%%classpath add mvn
org.apache.spark spark-core_2.11 2.3.1
org.apache.spark spark-sql_2.11 2.3.1
Loading...
Loading...
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
    .master("local[2]")
    .appName("Spark Column Example")
    .config("spark.some.config.option", "some-value")
    .getOrCreate()

import spark.implicits._
org.apache.spark.sql.SparkSession$implicits$@60ad5f12
val df = spark.read.json("../../data/people.json")
df.show
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+

null

Get Number of Partitions

df.rdd.getNumPartitions
1

Repartition

val df2 = df.repartition(4)
[age: bigint, name: string]
df2.rdd.getNumPartitions
4