Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

import pandas as pd
import socket
import findspark

findspark.init("/opt/spark-3.2.0-bin-hadoop3.2")
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql.functions import *

spark = SparkSession.builder.appName("PySpark").enableHiveSupport().getOrCreate()
df = spark.createDataFrame(
    pd.DataFrame(
        data=(
            ("Ben", "Du", 1),
            ("Ben", "Du", 2),
            ("Ben", "Tu", 3),
            ("Ben", "Tu", 4),
            ("Ken", "Xu", 1),
            ("Ken", "Xu", 9),
        ),
        columns=("fname", "lname", "score"),
    )
)
df.show()
+-----+-----+-----+
|fname|lname|score|
+-----+-----+-----+
|  Ben|   Du|    1|
|  Ben|   Du|    2|
|  Ben|   Tu|    3|
|  Ben|   Tu|    4|
|  Ken|   Xu|    1|
|  Ken|   Xu|    9|
+-----+-----+-----+

                                                                                

DataFrame.stat.approxQuantile

Notice that it returns a Double array.

df.stat.approxQuantile("score", [0.5], 0.1)
[2.0]
df.stat.approxQuantile("score", [0.5], 0.001)
[2.0]
df.stat.approxQuantile("score", [0.5], 0.5)
[1.0]