Tips and Traps
DataFrame.repartition repartitions the DataFrame by the hash code of each row. If you pass one or more columns (instead of a number of partitions) to DataFrame.repartition, then the hash code of those columns is used for repartitioning. In some situations, there are many hash collisions even if the total number of rows is small (e.g., a few thousand), which means that the generated partitions might be skewed.
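The skew described above can be illustrated without a Spark cluster. The following is only a rough sketch in plain Python: it assumes each row is sent to partition hash(key) % num_partitions, and it uses zlib.crc32 as a stand-in hash, which is not the hash function Spark uses. The helper partition_sizes and the sample data are hypothetical and exist only for illustration.

```python
import zlib
from collections import Counter


def partition_sizes(keys, num_partitions):
    """Count rows per partition when a row goes to hash(key) % num_partitions.

    zlib.crc32 is a stand-in hash (an assumption), not Spark's own hash.
    """
    counts = Counter(
        zlib.crc32(str(key).encode("utf-8")) % num_partitions for key in keys
    )
    return [counts.get(p, 0) for p in range(num_partitions)]


# A few thousand rows whose partitioning column has only 3 distinct values.
keys = [i % 3 for i in range(3000)]
sizes = partition_sizes(keys, 8)

# At most 3 of the 8 partitions can receive rows; the rest stay empty.
print(sizes)
```

Rows with equal keys always land in the same partition, and distinct keys whose hashes agree modulo the partition count do too, so a small table can still end up concentrated in a handful of partitions.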
Resizing Hard Disk of Guest Machine in Virtualbox
Suppose you have a virtual hard disk in VirtualBox called xp.vdi.
You can resize it (the new size is given in megabytes) using the following command.
VBoxManage modifyhd xp.vdi --resize 40960
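Since the size is given in megabytes, the value 40960 above corresponds to 40 GB at 1024 MB per GB. As a small sketch of that arithmetic, the hypothetical helper resize_command below (not part of VirtualBox) builds the same command line from a size in gigabytes; it only constructs the argument list and does not run anything.

```python
def resize_command(disk, size_gb):
    """Build the VBoxManage resize command for a disk, given a size in GB.

    Assumes 1 GB is counted as 1024 MB, matching 40 GB -> 40960 above.
    """
    size_mb = size_gb * 1024
    return ["VBoxManage", "modifyhd", disk, "--resize", str(size_mb)]


cmd = resize_command("xp.vdi", 40)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run on a machine where VBoxManage is installed.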
The command currently doesn't support VMDK virtual disks.
So if you have a virtual disk called xp.vmdk,
you have to first convert it …