Tips and Traps
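DataFrame.repartition redistributes the rows of a DataFrame across partitions. If you pass only a number of partitions, rows are spread round-robin. If you pass one or more columns to DataFrame.repartition (instead of, or in addition to, a number of partitions), the hash code of those column values determines the partition of each row. Rows with the same column values always land in the same partition, and different values can hash to the same partition. In some situations there are many hash collisions even if the total number of rows is small (e.g., a few thousand), which means that the generated partitions might be skewed.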
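The effect can be illustrated without a Spark cluster. The following is a minimal sketch in plain Python, not Spark's actual implementation: it assigns each row to partition hash(key) mod num_partitions, using zlib.crc32 as a stand-in hash (an assumption; Spark uses its own hash function). The function name partition_sizes and the sample keys are hypothetical.

```python
import zlib
from collections import Counter


def partition_sizes(keys, num_partitions):
    """Count rows per partition when each row goes to hash(key) mod num_partitions."""
    sizes = Counter()
    for key in keys:
        # zlib.crc32 is a deterministic stand-in hash (assumption);
        # Spark computes its own hash of the column values.
        pid = zlib.crc32(str(key).encode("utf-8")) % num_partitions
        sizes[pid] += 1
    return sizes


# Hypothetical data: 3,000 rows but only 10 distinct key values.
keys = [f"user_{i % 10}" for i in range(3000)]
sizes = partition_sizes(keys, num_partitions=200)

# At most 10 of the 200 partitions can receive rows, so most stay empty.
print(f"non-empty partitions: {len(sizes)} of 200")
print(f"largest partition: {max(sizes.values())} rows")
```

With few distinct key values, most partitions stay empty no matter how many partitions are requested, and any two keys that collide double the size of one partition. In Spark itself, the actual partition sizes can be checked by grouping the repartitioned DataFrame by pyspark.sql.functions.spark_partition_id() and counting rows.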