Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
Modules can hold parameters of different types on different devices, so it is not always possible to unambiguously determine the device. The recommended workflow in PyTorch is to create the device object separately and use it everywhere. However, if you know that all the parameters in a model are on the same device, you can use `next(model.parameters()).device` to get the device. In that situation, you can also use `next(model.parameters()).is_cuda` to check whether the model is on CUDA. It is suggested that you use the method `.to` to move a model/tensor to a specific device.

    :::python
    model.to("cuda")
    tensor = tensor.to("cpu")

Notice that `Module.to` is in-place while `Tensor.to` returns a copy!
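A quick sanity check of both points above, as a minimal sketch assuming a small model whose parameters all live on the CPU (the default):

    :::python
    import torch
    import torch.nn as nn

    model = nn.Linear(4, 2)  # parameters are on the CPU by default
    device = next(model.parameters()).device
    print(device)                            # cpu
    print(next(model.parameters()).is_cuda)  # False

    # Module.to modifies the module in place (and also returns it) ...
    same = model.to("cpu")
    print(same is model)  # True

    # ... while Tensor.to returns a new tensor when the device or dtype
    # actually changes (here the dtype changes, so it is a copy).
    t = torch.zeros(3)
    u = t.to(torch.float64)
    print(u is t)  # False

Note one subtlety: `Tensor.to` returns the tensor itself when the requested device and dtype already match, which is why the example changes the dtype to force a copy.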
Functions for Managing Devices
- `torch.cuda.device`: context manager that changes the selected device.
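A short sketch of the device-management helpers in `torch.cuda` (`is_available`, `device_count`, and the `device` context manager are all real PyTorch APIs; the GPU branch only runs when CUDA is present):

    :::python
    import torch

    print(torch.cuda.is_available())  # whether any CUDA device is usable
    print(torch.cuda.device_count())  # number of visible GPUs, 0 on CPU-only

    if torch.cuda.is_available():
        # torch.cuda.device temporarily changes the selected (default)
        # CUDA device within the with-block.
        with torch.cuda.device(0):
            x = torch.ones(2, device="cuda")  # allocated on cuda:0
            print(x.device)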
Use Multiple GPUs on the Same Machine
Below is a typical pattern of code to train/run your model on multiple GPUs.
:::python
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)
model = torch.nn.DataParallel(model)
    model(data)

`torch.nn.DataParallel` parallelizes a model on GPU devices only. It does not matter which device the data is on if the model is wrapped by `torch.nn.DataParallel`; it can be on the CPU or on any GPU device, and it will get split and distributed to all GPU devices anyway. If the GPU devices have different capabilities, it is best to have the most powerful GPU as device 0.
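The pattern above can be run end to end as a minimal sketch; conveniently, `nn.DataParallel` falls back to calling the wrapped module directly when no GPU is visible, so the same code works on a CPU-only machine:

    :::python
    import torch
    import torch.nn as nn

    model = nn.Linear(8, 2)
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model.to(device)
    model = nn.DataParallel(model)  # plain pass-through on CPU-only machines

    # The input batch can stay on the CPU; DataParallel scatters it along
    # dim 0 across the GPUs (or just runs it as-is when there is no GPU).
    data = torch.randn(16, 8)
    out = model(data)
    print(out.shape)  # torch.Size([16, 2])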
Does DataParallel matter in CPU-mode?
My recurrent network doesn’t work with data parallelism
Use Multiple Processes or GPUs on Different Machines
https://
Similar to `torch.nn.DataParallel`, `torch.nn.DistributedDataParallel` works for GPUs only. It is suggested that you spawn multiple processes (on each node) and have each process operate a single GPU. `nccl` is the suggested backend to use; if it is not available, use the `gloo` backend. If you use `torch.save` on one process to checkpoint the module and `torch.load` on some other processes to recover it, make sure that `map_location` is configured properly for every process. Without `map_location`, `torch.load` would recover the module to the devices where the module was saved from.
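The `map_location` point can be sketched as follows; an in-memory buffer stands in for the checkpoint file so the example is self-contained:

    :::python
    import io
    import torch
    import torch.nn as nn

    model = nn.Linear(4, 2)
    buf = io.BytesIO()  # stand-in for a checkpoint file on disk
    torch.save(model.state_dict(), buf)

    buf.seek(0)
    # Without map_location, tensors are restored to the devices they were
    # saved from; map_location="cpu" forces them onto the CPU, and e.g.
    # map_location="cuda:0" would target a specific GPU instead.
    state = torch.load(buf, map_location="cpu")

    restored = nn.Linear(4, 2)
    restored.load_state_dict(state)
    print(next(restored.parameters()).device)  # cpu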
https://
References
Which device is model / tensor stored on?
How to get the device type of a pytorch module conveniently?
[Feature Request] nn.Module should also get a device attribute #7460