Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
References
https://
coalesce vs repartition
https://
%%classpath add mvn
org.apache.spark spark-core_2.11 2.3.1
org.apache.spark spark-sql_2.11 2.3.1
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("Spark Column Example")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()
import spark.implicits._
val df = spark.read.json("../../data/people.json")
df.show
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+
Get Number of Partitions
df.rdd.getNumPartitions
1
Repartition
val df2 = df.repartition(4)
[age: bigint, name: string]
df2.rdd.getNumPartitions
4
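The notes above only exercise repartition, so here is a minimal sketch of the coalesce side for comparison. It assumes a local Spark session like the one above. The names `people`, `wide`, `narrow` and `unchanged` are hypothetical, and the in-memory rows are a stand-in for people.json so the snippet does not depend on the file. The behavior it illustrates is the one documented for Dataset.coalesce: it can reduce the number of partitions without a full shuffle, but it cannot increase it.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the notebook's session if one already exists.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("coalesce vs repartition sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in for people.json so the sketch needs no input file.
val people = Seq[(Option[Long], String)](
  (None, "Michael"),
  (Some(30L), "Andy"),
  (Some(19L), "Justin")
).toDF("age", "name")

// repartition does a full shuffle and can raise or lower the partition count.
val wide = people.repartition(4)
println(wide.rdd.getNumPartitions)       // 4

// coalesce merges existing partitions without a full shuffle,
// so it can only lower the count.
val narrow = wide.coalesce(2)
println(narrow.rdd.getNumPartitions)     // 2

// Asking coalesce for more partitions than exist leaves the count unchanged.
val unchanged = wide.coalesce(8)
println(unchanged.rdd.getNumPartitions)  // 4
```

As a rough rule of thumb, coalesce looks like the cheaper choice when only shrinking the partition count (for example before writing a small result), while repartition is needed to grow the count or to rebalance skewed partitions.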