Tips and Traps
DataFrame.repartition redistributes the rows of a DataFrame across partitions. If you pass one or more columns (instead of only a number of partitions) to DataFrame.repartition, rows are assigned to partitions by the hash code of those column(s). In some situations there are many hash collisions even if the total number of rows is small (e.g., a few thousand), which means that the generated partitions might be skewed.
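One way to check whether a column-based repartition came out skewed is to count the rows in each resulting partition. The following is a minimal sketch, not a definitive recipe: the helper names partition_sizes and skew_ratio are hypothetical (they are not part of Spark), and df is assumed to be an existing PySpark DataFrame.

```python
def partition_sizes(df, num_partitions, *cols):
    """Return the number of rows in each partition after repartitioning.

    `df` is assumed to be a PySpark DataFrame; `cols` are the column
    names (or Column objects) handed to DataFrame.repartition.
    """
    repartitioned = df.repartition(num_partitions, *cols)
    # Count rows partition by partition without collecting the rows themselves.
    return repartitioned.rdd.mapPartitions(
        lambda rows: [sum(1 for _ in rows)]
    ).collect()


def skew_ratio(sizes):
    """Largest partition size divided by the mean partition size.

    1.0 means perfectly balanced; larger values mean more skew.
    Returns 0.0 when there are no rows at all.
    """
    total = sum(sizes)
    if total == 0:
        return 0.0
    return max(sizes) / (total / len(sizes))


# Hypothetical usage, assuming a DataFrame `df` with a column named "key":
#   sizes = partition_sizes(df, 8, "key")
#   print(sizes, skew_ratio(sizes))
```

If the ratio is far above 1, most rows ended up in a few partitions. Repartitioning on a column with more distinct values, or passing only a number of partitions, may give a more even spread.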
Resizing Hard Disk of Guest Machine in VirtualBox
Suppose you have a virtual hard disk in VirtualBox called xp.vdi. You can resize it (the new size is given in megabytes) using the following command.
VBoxManage modifyhd xp.vdi --resize 40960
The command currently doesn't support VMDK virtual disks. So if you have a virtual disk called xp.vmdk, you have to first convert it …