Sample Rows from a Spark DataFrame
Tips and Traps
- TABLESAMPLE must come immediately after a table name.
- The WHERE clause in the following SQL query runs after TABLESAMPLE.
    SELECT * FROM table_name TABLESAMPLE (10 PERCENT) WHERE id = 1
  If you want to run a WHERE
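
The ordering of sampling and filtering can also be made explicit with the DataFrame API. The following is a minimal sketch, not a definitive implementation: the helper names `sample_then_filter` and `filter_then_sample` are hypothetical, and `df` is assumed to be a `pyspark.sql.DataFrame`. The first helper mirrors the query above (sample, then filter); the second filters first, so that only matching rows are sampled.

```python
def sample_then_filter(df, condition, fraction, seed=None):
    """Sample first, then filter (same order as TABLESAMPLE followed by WHERE).

    df is assumed to be a pyspark.sql.DataFrame; condition is a SQL
    expression string such as "id = 1" (or a Column).
    """
    return df.sample(fraction=float(fraction), seed=seed).filter(condition)


def filter_then_sample(df, condition, fraction, seed=None):
    """Filter first, then sample only the rows that satisfy the condition."""
    return df.filter(condition).sample(fraction=float(fraction), seed=seed)
```

With a real DataFrame, `filter_then_sample(df, "id = 1", 0.1)` would return roughly 10 percent of the rows where `id = 1`, whereas `sample_then_filter(df, "id = 1", 0.1)` keeps only those matching rows that happened to land in the 10 percent sample.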
Union DataFrames in Spark
Comment
- union relies on column order rather than column names. This is the same as in SQL. For columns whose types don't match, the super type is used. However, this is really dangerous if you are not careful. It is suggested that you define a function called unionByName to handle this.
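
A helper along the lines suggested above might look like the following. This is a minimal sketch under stated assumptions: the name `union_by_name` is hypothetical, both arguments are assumed to be `pyspark.sql.DataFrame` objects, and both are assumed to have the same set of column names (possibly in a different order). Note that Spark 2.3 and later also provide a built-in `DataFrame.unionByName` method, which is usually preferable when available.

```python
def union_by_name(left, right):
    """Union two Spark DataFrames by column name instead of by position.

    Both arguments are assumed to be pyspark.sql.DataFrame objects with
    the same set of column names; the result uses the left column order.
    """
    if set(left.columns) != set(right.columns):
        raise ValueError(
            f"column mismatch: {sorted(left.columns)} vs {sorted(right.columns)}"
        )
    # Reorder the right side to match the left, then union positionally.
    return left.union(right.select(*left.columns))
```

Raising an error on mismatched column sets is a design choice in this sketch: it fails loudly instead of silently combining unrelated columns, which is the danger described above.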