Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Comments

  1. After a global sort (e.g., orderBy), the partitions of a DataFrame are ordered by partition ID and the rows within each partition are sorted, so reading the partitions in ID order yields the fully sorted data. This property can be leveraged to implement global ranking of rows. For more details, please refer to Computing global rank of a row in a DataFrame with Spark SQL. Note, however, that multi-layer ranking is often more efficient than global ranking in big data applications.
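The partition property described above can be illustrated without Spark. The following is a minimal pure-Python sketch, not the method from the referenced article: the `partitions` list is a hypothetical stand-in for the partitions of a sorted DataFrame, and the names `sizes`, `offsets`, and `global_rank` are introduced here only for illustration. The idea is to count the rows in each partition, turn the counts into cumulative offsets, and add each row's position within its partition to the offset of that partition.

```python
from itertools import accumulate

# Hypothetical partitions of an already sorted dataset: partition IDs are the
# list indices, and the rows inside each partition are sorted as well.
partitions = [
    [("Ben", "Du"), ("Ben", "Du")],
    [("Ben", "Tu"), ("Ben", "Tu")],
    [("Ken", "Xu"), ("Ken", "Xu")],
]

# Number of rows in each partition.
sizes = [len(part) for part in partitions]

# Offset of a partition = total number of rows in all earlier partitions.
offsets = [0] + list(accumulate(sizes))[:-1]

# Global rank (1-based) = partition offset + position within the partition + 1.
global_rank = [
    (offset + local_index + 1, row)
    for offset, part in zip(offsets, partitions)
    for local_index, row in enumerate(part)
]

for rank, row in global_rank:
    print(rank, row)
```

Only the per-partition counts need to be collected in one place; the ranks themselves can be computed independently inside each partition, which is what makes this approach attractive on large data.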

import findspark

findspark.init("/opt/spark")

from pyspark.sql import SparkSession, DataFrame
from pyspark.sql.functions import *
from pyspark.sql.types import StructType

spark = (
    SparkSession.builder.appName("PySpark_Sorting").enableHiveSupport().getOrCreate()
)
import pandas as pd
df_p = pd.DataFrame(
    [
        ("Ben", "Du", 1),
        ("Ben", "Du", 2),
        ("Ken", "Xu", 1),
        ("Ken", "Xu", 9),
        ("Ben", "Tu", 3),
        ("Ben", "Tu", 4),
    ],
    columns=["first_name", "last_name", "id"],
)
df_p
df = spark.createDataFrame(df_p)
df.show()
+----------+---------+---+
|first_name|last_name| id|
+----------+---------+---+
|       Ben|       Du|  1|
|       Ben|       Du|  2|
|       Ken|       Xu|  1|
|       Ken|       Xu|  9|
|       Ben|       Tu|  3|
|       Ben|       Tu|  4|
+----------+---------+---+

df.orderBy(["first_name", "last_name"]).show()
+----------+---------+---+
|first_name|last_name| id|
+----------+---------+---+
|       Ben|       Du|  1|
|       Ben|       Du|  2|
|       Ben|       Tu|  4|
|       Ben|       Tu|  3|
|       Ken|       Xu|  9|
|       Ken|       Xu|  1|
+----------+---------+---+

Note: The ascending keyword below cannot be omitted! The sort directions must be passed as ascending=[...], not as a second positional argument.

df.orderBy(["first_name", "last_name"], ascending=[False, False]).show()
+----------+---------+---+
|first_name|last_name| id|
+----------+---------+---+
|       Ken|       Xu|  9|
|       Ken|       Xu|  1|
|       Ben|       Tu|  3|
|       Ben|       Tu|  4|
|       Ben|       Du|  1|
|       Ben|       Du|  2|
+----------+---------+---+
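The note about the ascending keyword follows from how orderBy is declared in PySpark: `orderBy(*cols, **kwargs)`. The sketch below is not PySpark code. `order_by_sketch` is a hypothetical stand-in that mimics only the shape of that signature, to show where a positional list of flags would end up.

```python
def order_by_sketch(*cols, **kwargs):
    """Hypothetical stand-in mimicking the shape of `orderBy(*cols, **kwargs)`."""
    # A single list argument is unpacked into individual column names.
    if len(cols) == 1 and isinstance(cols[0], list):
        cols = tuple(cols[0])
    # Sort directions are only ever read from the `ascending` keyword.
    return {"cols": cols, "ascending": kwargs.get("ascending", True)}


# With the keyword: the flags are recognized as sort directions.
with_keyword = order_by_sketch(["first_name", "last_name"], ascending=[False, False])

# Without the keyword: the flags are swallowed by *cols and treated as a column.
without_keyword = order_by_sketch(["first_name", "last_name"], [False, False])

print(with_keyword)
print(without_keyword)
```

In the second call the list of flags is treated as one more sort column rather than as sort directions. An alternative that needs no keyword is to pass column expressions such as `col("first_name").desc()` and `col("last_name").desc()` to orderBy.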