Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
Modules can hold parameters of different types on different devices, so it is not always possible to unambiguously determine the device. The recommended workflow in PyTorch is to create the device object separately and use it everywhere. However, if you know that all the parameters in a model are on the same device, you can use `next(model.parameters()).device` to get the device. In that situation, you can also use `next(model.parameters()).is_cuda` to check whether the model is on CUDA. It is suggested that you use the method `.to` to move a model/tensor to a specific device.

    :::python
    model.to("cuda")
    tensor = tensor.to("cpu")

Notice that `Module.to` is in-place while `Tensor.to` returns a copy!
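A quick sanity check of both points above, as a minimal sketch assuming a small model whose parameters all live on the CPU (the default):

    :::python
    import torch
    import torch.nn as nn

    model = nn.Linear(4, 2)  # parameters are on the CPU by default
    device = next(model.parameters()).device
    print(device)                            # cpu
    print(next(model.parameters()).is_cuda)  # False

    # Module.to modifies the module in place (and also returns it) ...
    same = model.to("cpu")
    print(same is model)  # True

    # ... while Tensor.to returns a new tensor when the device or dtype
    # actually changes (here the dtype changes, so it is a copy).
    t = torch.zeros(3)
    u = t.to(torch.float64)
    print(u is t)  # False

Note one subtlety: `Tensor.to` returns the tensor itself when the requested device and dtype already match, which is why the example changes the dtype to force a copy.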
Functions for Managing Devices
- `torch.cuda.device`: context manager that changes the selected device.
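A short sketch of the device-management helpers in `torch.cuda` (`is_available`, `device_count`, and the `device` context manager are all real PyTorch APIs; the GPU branch only runs when CUDA is present):

    :::python
    import torch

    print(torch.cuda.is_available())  # whether any CUDA device is usable
    print(torch.cuda.device_count())  # number of visible GPUs, 0 on CPU-only

    if torch.cuda.is_available():
        # torch.cuda.device temporarily changes the selected (default)
        # CUDA device within the with-block.
        with torch.cuda.device(0):
            x = torch.ones(2, device="cuda")  # allocated on cuda:0
            print(x.device)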
Use Multiple GPUs on the Same Machine
Below is a typical pattern of code to train/run your model on multiple GPUs.
:::python
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)
model = torch.nn.DataParallel(model)
    model(data)

`torch.nn.DataParallel` parallelizes a model on GPU devices only. It does not matter which device the data is on if the model is wrapped by `torch.nn.DataParallel`; it can be on the CPU or on any GPU device, and it will get split and distributed to all GPU devices anyway. If the GPU devices have different capabilities, it is best to have the most powerful GPU as device 0.
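The pattern above can be run end to end as a minimal sketch; conveniently, `nn.DataParallel` falls back to calling the wrapped module directly when no GPU is visible, so the same code works on a CPU-only machine:

    :::python
    import torch
    import torch.nn as nn

    model = nn.Linear(8, 2)
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model.to(device)
    model = nn.DataParallel(model)  # plain pass-through on CPU-only machines

    # The input batch can stay on the CPU; DataParallel scatters it along
    # dim 0 across the GPUs (or just runs it as-is when there is no GPU).
    data = torch.randn(16, 8)
    out = model(data)
    print(out.shape)  # torch.Size([16, 2])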
Does DataParallel matter in CPU-mode?
My recurrent network doesn’t work with data parallelism
Use Multiple Processes or GPUs on Different Machines
https://
Similar to `torch.nn.DataParallel`, `torch.nn.DistributedDataParallel` works for GPUs only. It is suggested that you spawn multiple processes (on each node) and have each process operate a single GPU. `nccl` is the suggested backend to use; if it is not available, use the `gloo` backend. If you use `torch.save` on one process to checkpoint the module and `torch.load` on some other processes to recover it, make sure that `map_location` is configured properly for every process. Without `map_location`, `torch.load` would recover the module to the devices where the module was saved from.
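The `map_location` point can be sketched as follows; an in-memory buffer stands in for the checkpoint file so the example is self-contained:

    :::python
    import io
    import torch
    import torch.nn as nn

    model = nn.Linear(4, 2)
    buf = io.BytesIO()  # stand-in for a checkpoint file on disk
    torch.save(model.state_dict(), buf)

    buf.seek(0)
    # Without map_location, tensors are restored to the devices they were
    # saved from; map_location="cpu" forces them onto the CPU, and e.g.
    # map_location="cuda:0" would target a specific GPU instead.
    state = torch.load(buf, map_location="cpu")

    restored = nn.Linear(4, 2)
    restored.load_state_dict(state)
    print(next(restored.parameters()).device)  # cpu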
https://
References
Which device is model / tensor stored on?
How to get the device type of a pytorch module conveniently?
[Feature Request] nn.Module should also get a device attribute #7460