Ben Chuanlong Du's Blog

It is never too late to learn.

Rounding Functions in Spark

In [1]:
%%classpath add mvn
org.apache.spark spark-core_2.11 2.3.1
org.apache.spark spark-sql_2.11 2.3.1
In [2]:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Start a local Spark session with 2 worker threads for the examples below.
val spark = SparkSession.builder()
    .master("local[2]")
    .appName("Spark Rounding Examples")
    .getOrCreate()

import spark.implicits._
In [7]:
val df = Seq(1.12, 2.34, 9.87, 2.5, 3.5).toDF
df.show
+-----+
|value|
+-----+
| 1.12|
| 2.34|
| 9.87|
|  2.5|
|  3.5|
+-----+


round

Round to the nearest integer using HALF_UP mode: a value exactly halfway between two integers is rounded away from zero, so 2.5 becomes 3.0.

In [10]:
df.withColumn("round", round($"value")).show
+-----+-----+
|value|round|
+-----+-----+
| 1.12|  1.0|
| 2.34|  2.0|
| 9.87| 10.0|
|  2.5|  3.0|
|  3.5|  4.0|
+-----+-----+
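
HALF_UP breaks ties away from zero, not always toward the larger number, so -2.5 would become -3.0. The sample column has no negative values, so here is a plain-Scala sketch (an illustration, not Spark's source) that mimics the rule with scala.math.BigDecimal, assuming Spark's round on a double behaves like converting it to a decimal and calling setScale with HALF_UP. The names HalfUpSketch and roundHalfUp are hypothetical.

```scala
import scala.math.BigDecimal.RoundingMode

// Assumption: round(col) on a double behaves like BigDecimal(value).setScale(scale, HALF_UP).
object HalfUpSketch {
  def roundHalfUp(value: Double, scale: Int): Double =
    if (value.isNaN || value.isInfinite) value // BigDecimal cannot represent NaN or infinity
    else BigDecimal(value).setScale(scale, RoundingMode.HALF_UP).toDouble

  def main(args: Array[String]): Unit =
    // Ties move away from zero, so negative ties become more negative.
    for (v <- Seq(2.5, 3.5, -2.5, -3.5))
      println(s"$v -> ${roundHalfUp(v, 0)}")
}
```

Running main prints 3.0, 4.0, -3.0 and -4.0 for the four inputs.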

The function round also accepts an optional second argument (via method overloading) specifying the number of decimal places to keep.

In [11]:
df.withColumn("round", round($"value", 1)).show
+-----+-----+
|value|round|
+-----+-----+
| 1.12|  1.1|
| 2.34|  2.3|
| 9.87|  9.9|
|  2.5|  2.5|
|  3.5|  3.5|
+-----+-----+
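
The Spark API documentation also describes a negative scale, in which case rounding happens in the integral part (tens, hundreds, and so on). The hypothetical helper below sketches how the scale argument behaves, assuming round(col, scale) acts like BigDecimal.setScale with HALF_UP; it is an illustration rather than Spark's implementation.

```scala
import scala.math.BigDecimal.RoundingMode

// Hypothetical helper; assumption: round(col, scale) ~ BigDecimal(value).setScale(scale, HALF_UP).
object ScaleSketch {
  def roundToScale(value: Double, scale: Int): Double =
    BigDecimal(value).setScale(scale, RoundingMode.HALF_UP).toDouble

  def main(args: Array[String]): Unit = {
    println(roundToScale(9.87, 1))    // keep one decimal place: 9.9
    println(roundToScale(1.125, 2))   // a tie at two decimal places rounds up: 1.13
    println(roundToScale(1234.5, -2)) // negative scale rounds to hundreds: 1200.0
  }
}
```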

bround

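Round to the nearest integer using HALF_EVEN mode (banker's rounding): a value exactly halfway between two integers goes to the even neighbour, so 2.5 becomes 2.0 while 3.5 becomes 4.0. Like round, bround accepts an optional number of decimal places.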

In [13]:
df.withColumn("round", bround($"value")).show
+-----+-----+
|value|round|
+-----+-----+
| 1.12|  1.0|
| 2.34|  2.0|
| 9.87| 10.0|
|  2.5|  2.0|
|  3.5|  4.0|
+-----+-----+
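
To see the two tie-breaking rules side by side, including a negative tie, the following plain-Scala sketch applies both modes with scala.math.BigDecimal. It assumes round corresponds to HALF_UP and bround to HALF_EVEN, as the Spark documentation describes; TieBreakSketch and roundWith are hypothetical names.

```scala
import scala.math.BigDecimal.RoundingMode

// Assumption: round ~ HALF_UP and bround ~ HALF_EVEN, applied at 0 decimal places.
object TieBreakSketch {
  def roundWith(value: Double, mode: RoundingMode.Value): Double =
    BigDecimal(value).setScale(0, mode).toDouble

  def main(args: Array[String]): Unit = {
    println("value  HALF_UP  HALF_EVEN")
    for (v <- Seq(0.5, 1.5, 2.5, 3.5, -2.5))
      println(f"$v%5.1f  ${roundWith(v, RoundingMode.HALF_UP)}%7.1f  ${roundWith(v, RoundingMode.HALF_EVEN)}%9.1f")
  }
}
```

HALF_EVEN sends 0.5 to 0, 2.5 to 2 and -2.5 to -2, which avoids the upward bias that HALF_UP accumulates when many values are exact ties.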

In [15]:
df.withColumn("round", bround($"value", 1)).show
+-----+-----+
|value|round|
+-----+-----+
| 1.12|  1.1|
| 2.34|  2.3|
| 9.87|  9.9|
|  2.5|  2.5|
|  3.5|  3.5|
+-----+-----+
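
With one decimal place the sample values contain no ties, so bround returns exactly what round returned above. Values such as 0.25 expose the difference. The sketch below is a plain-Scala illustration under the assumption that both functions behave like BigDecimal.setScale (HALF_UP for round, HALF_EVEN for bround) on the decimal form of the double; the object and method names are hypothetical.

```scala
import scala.math.BigDecimal.RoundingMode

// Assumption: round(col, 1) ~ HALF_UP and bround(col, 1) ~ HALF_EVEN on the decimal form of the double.
object ScaleTieSketch {
  def roundHalfUp(value: Double, scale: Int): Double =
    BigDecimal(value).setScale(scale, RoundingMode.HALF_UP).toDouble

  def roundHalfEven(value: Double, scale: Int): Double =
    BigDecimal(value).setScale(scale, RoundingMode.HALF_EVEN).toDouble

  def main(args: Array[String]): Unit =
    for (v <- Seq(0.25, 0.35, 1.45))
      println(s"$v: HALF_UP=${roundHalfUp(v, 1)}, HALF_EVEN=${roundHalfEven(v, 1)}")
}
```

Here 0.25 and 1.45 differ (0.3 vs 0.2, 1.5 vs 1.4), while 0.35 gives 0.4 under both rules because 4 is already even.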

rint

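The function rint is similar to the function bround: it returns the double closest to the value that is a mathematical integer, and ties go to the even neighbour. Unlike bround, it takes no argument for the number of decimal places.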

In [16]:
df.withColumn("round", rint($"value")).show
+-----+-----+
|value|round|
+-----+-----+
| 1.12|  1.0|
| 2.34|  2.0|
| 9.87| 10.0|
|  2.5|  2.0|
|  3.5|  4.0|
+-----+-----+
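
A quick way to check the similarity is to compare java.lang.Math.rint (available in Scala as math.rint) with HALF_EVEN rounding to 0 decimal places on the sample values plus a negative tie. This is a plain-Scala sketch; it assumes Spark's rint follows Math.rint, and RintSketch and halfEven are hypothetical names.

```scala
import scala.math.BigDecimal.RoundingMode

// Assumption: Spark's rint follows java.lang.Math.rint; bround is modelled as HALF_EVEN at scale 0.
object RintSketch {
  def halfEven(value: Double): Double =
    BigDecimal(value).setScale(0, RoundingMode.HALF_EVEN).toDouble

  def main(args: Array[String]): Unit =
    for (v <- Seq(1.12, 2.34, 9.87, 2.5, 3.5, -2.5))
      println(s"$v: rint=${math.rint(v)}, HALF_EVEN=${halfEven(v)}")
}
```

Both columns agree for these inputs; rint simply works on doubles and has no scale parameter.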

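floor

Round down to the largest integer less than or equal to the value. The result is an integer (bigint) column rather than a double.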

In [17]:
df.withColumn("round", floor($"value")).show
+-----+-----+
|value|round|
+-----+-----+
| 1.12|    1|
| 2.34|    2|
| 9.87|    9|
|  2.5|    2|
|  3.5|    3|
+-----+-----+
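
Rounding down means rounding toward negative infinity, which matters for negative inputs: -2.5 floors to -3, not -2. The plain-Scala sketch below shows this, assuming (as the integer output above suggests) that floor on a double column yields a long; floorToLong is a hypothetical helper.

```scala
// Hypothetical helper; assumption: floor on a double column returns a long (bigint).
object FloorSketch {
  // math.floor rounds toward negative infinity; toLong then drops the ".0".
  def floorToLong(value: Double): Long = math.floor(value).toLong

  def main(args: Array[String]): Unit =
    for (v <- Seq(2.5, 3.5, -2.5, -1.12))
      println(s"$v -> ${floorToLong(v)}")
}
```

Running main prints 2, 3, -3 and -2.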

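ceil

Round up to the smallest integer greater than or equal to the value. The result is an integer (bigint) column rather than a double.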

In [18]:
df.withColumn("round", ceil($"value")).show
+-----+-----+
|value|round|
+-----+-----+
| 1.12|    2|
| 2.34|    3|
| 9.87|   10|
|  2.5|    3|
|  3.5|    4|
+-----+-----+
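
Rounding up means rounding toward positive infinity, so negative inputs move toward zero, and values that are already whole stay unchanged. A plain-Scala sketch, assuming ceil on a double column yields a long; ceilToLong is a hypothetical helper.

```scala
// Hypothetical helper; assumption: ceil on a double column returns a long (bigint).
object CeilSketch {
  // math.ceil rounds toward positive infinity; toLong then drops the ".0".
  def ceilToLong(value: Double): Long = math.ceil(value).toLong

  def main(args: Array[String]): Unit =
    for (v <- Seq(1.12, 2.0, -2.5, -1.12))
      println(s"$v -> ${ceilToLong(v)}")
}
```

Running main prints 2, 2, -2 and -1.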
