Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Comments

  1. Transformations in torchvision.transforms work on PIL images, tensors (representing images), and possibly on numpy arrays (representing images). However, a transformation (e.g., ToTensor) might behave differently on different input types, so you should be clear about what exactly a transformation does for each type. A good practice is to always convert your non-tensor input data to tensors first using ToTensor, and then apply the other transformations (which then consume tensors and produce tensors).

  2. It is always a good idea to normalize your input tensors to be within a small range (e.g., [0, 1]).

import torch
import torchvision
import numpy as np
from PIL import Image
img = Image.open("../../home/media/poker/4h.png")
img
<PIL.PngImagePlugin.PngImageFile image mode=RGB size=37x54 at 0x7F0238065080>
arr = np.array(img)
arr
array([[[ 37, 62, 59], [149, 174, 171], [225, 238, 239], ..., [232, 250, 249], [217, 235, 234], [122, 156, 154]], [[127, 133, 134], [240, 246, 247], [244, 243, 246], ..., [239, 240, 243], [243, 244, 247], [218, 233, 236]], [[152, 158, 159], [245, 251, 252], [237, 236, 239], ..., [235, 236, 239], [235, 236, 239], [227, 242, 245]], ..., [[ 45, 66, 64], [153, 174, 172], [233, 237, 239], ..., [235, 239, 241], [226, 230, 232], [132, 157, 152]], [[ 24, 45, 43], [ 38, 59, 57], [105, 109, 111], ..., [119, 123, 125], [ 92, 96, 98], [ 37, 62, 57]], [[ 25, 55, 50], [ 17, 47, 42], [ 15, 38, 33], ..., [ 16, 40, 40], [ 16, 40, 40], [ 19, 53, 49]]], dtype=uint8)
arr.shape
(54, 37, 3)

torchvision.transforms.ToTensor

Converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0] if the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1) or if the numpy.ndarray has dtype = np.uint8.

This is a transformation that you almost always need when preparing a dataset for a computer vision task.

trans = torchvision.transforms.ToTensor()
t1 = trans(img)
t1
tensor([[[0.1451, 0.5843, 0.8824, ..., 0.9098, 0.8510, 0.4784], [0.4980, 0.9412, 0.9569, ..., 0.9373, 0.9529, 0.8549], [0.5961, 0.9608, 0.9294, ..., 0.9216, 0.9216, 0.8902], ..., [0.1765, 0.6000, 0.9137, ..., 0.9216, 0.8863, 0.5176], [0.0941, 0.1490, 0.4118, ..., 0.4667, 0.3608, 0.1451], [0.0980, 0.0667, 0.0588, ..., 0.0627, 0.0627, 0.0745]], [[0.2431, 0.6824, 0.9333, ..., 0.9804, 0.9216, 0.6118], [0.5216, 0.9647, 0.9529, ..., 0.9412, 0.9569, 0.9137], [0.6196, 0.9843, 0.9255, ..., 0.9255, 0.9255, 0.9490], ..., [0.2588, 0.6824, 0.9294, ..., 0.9373, 0.9020, 0.6157], [0.1765, 0.2314, 0.4275, ..., 0.4824, 0.3765, 0.2431], [0.2157, 0.1843, 0.1490, ..., 0.1569, 0.1569, 0.2078]], [[0.2314, 0.6706, 0.9373, ..., 0.9765, 0.9176, 0.6039], [0.5255, 0.9686, 0.9647, ..., 0.9529, 0.9686, 0.9255], [0.6235, 0.9882, 0.9373, ..., 0.9373, 0.9373, 0.9608], ..., [0.2510, 0.6745, 0.9373, ..., 0.9451, 0.9098, 0.5961], [0.1686, 0.2235, 0.4353, ..., 0.4902, 0.3843, 0.2235], [0.1961, 0.1647, 0.1294, ..., 0.1569, 0.1569, 0.1922]]])
t1.shape
torch.Size([3, 54, 37])
t2 = trans(arr)
t2
tensor([[[0.1451, 0.5843, 0.8824, ..., 0.9098, 0.8510, 0.4784], [0.4980, 0.9412, 0.9569, ..., 0.9373, 0.9529, 0.8549], [0.5961, 0.9608, 0.9294, ..., 0.9216, 0.9216, 0.8902], ..., [0.1765, 0.6000, 0.9137, ..., 0.9216, 0.8863, 0.5176], [0.0941, 0.1490, 0.4118, ..., 0.4667, 0.3608, 0.1451], [0.0980, 0.0667, 0.0588, ..., 0.0627, 0.0627, 0.0745]], [[0.2431, 0.6824, 0.9333, ..., 0.9804, 0.9216, 0.6118], [0.5216, 0.9647, 0.9529, ..., 0.9412, 0.9569, 0.9137], [0.6196, 0.9843, 0.9255, ..., 0.9255, 0.9255, 0.9490], ..., [0.2588, 0.6824, 0.9294, ..., 0.9373, 0.9020, 0.6157], [0.1765, 0.2314, 0.4275, ..., 0.4824, 0.3765, 0.2431], [0.2157, 0.1843, 0.1490, ..., 0.1569, 0.1569, 0.2078]], [[0.2314, 0.6706, 0.9373, ..., 0.9765, 0.9176, 0.6039], [0.5255, 0.9686, 0.9647, ..., 0.9529, 0.9686, 0.9255], [0.6235, 0.9882, 0.9373, ..., 0.9373, 0.9373, 0.9608], ..., [0.2510, 0.6745, 0.9373, ..., 0.9451, 0.9098, 0.5961], [0.1686, 0.2235, 0.4353, ..., 0.4902, 0.3843, 0.2235], [0.1961, 0.1647, 0.1294, ..., 0.1569, 0.1569, 0.1922]]])
t2.shape
torch.Size([3, 54, 37])