Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
There is no direct way to select all columns except a few from a table using SQL. However, this is easy to do with DataFrame APIs (pandas, Spark/PySpark, etc.).
import pandas as pd
import findspark

findspark.init("/opt/spark-3.0.1-bin-hadoop3.2/")
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql.functions import *
from pyspark.sql.types import StructType

spark = SparkSession.builder.appName("Case/When").enableHiveSupport().getOrCreate()

df = spark.createDataFrame(
pd.DataFrame(
data=(
("Ben", "Du", 1),
("Ben", "Du", 2),
("Ben", "Tu", 3),
("Ben", "Tu", 4),
("Ken", "Xu", 1),
("Ken", "Xu", 9),
),
columns=("fname", "lname", "score"),
)
)
df.show()
+-----+-----+-----+
|fname|lname|score|
+-----+-----+-----+
| Ben| Du| 1|
| Ben| Du| 2|
| Ben| Tu| 3|
| Ben| Tu| 4|
| Ken| Xu| 1|
| Ken| Xu| 9|
+-----+-----+-----+
Select all columns except score.
df.select(*[col for col in df.columns if col not in ("score",)]).show()
+-----+-----+
|fname|lname|
+-----+-----+
| Ben| Du|
| Ben| Du|
| Ben| Tu|
| Ben| Tu|
| Ken| Xu|
| Ken| Xu|
+-----+-----+
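Note that the tuple must be written as `("score",)` (with the trailing comma); `col not in ("score")` tests substring membership in the string `"score"`, which happens to work here but would wrongly exclude a column named, e.g., `core`. An alternative to enumerating the columns to keep is to drop the unwanted ones: PySpark's `DataFrame.drop("score")` does this directly, and pandas offers the same via `DataFrame.drop`. A minimal pandas sketch using the same toy data:

```python
import pandas as pd

# Same toy data as above, in a pandas DataFrame.
pdf = pd.DataFrame(
    data=(
        ("Ben", "Du", 1),
        ("Ben", "Du", 2),
        ("Ben", "Tu", 3),
        ("Ben", "Tu", 4),
        ("Ken", "Xu", 1),
        ("Ken", "Xu", 9),
    ),
    columns=("fname", "lname", "score"),
)

# Drop the unwanted column(s) instead of enumerating the kept ones.
res = pdf.drop(columns=["score"])
print(res.columns.tolist())  # ['fname', 'lname']
```

`drop` returns a new DataFrame by default, so the original `pdf` keeps its `score` column.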