Tensor Transformations in TorchVision


  1. Transformations in torchvision.transforms work on PIL images, on tensors (representing images), and, in some cases, on numpy arrays (representing images). However, a transformation (e.g., ToTensor) might behave differently depending on the input type, so be clear about what exactly a transformation does for each input type. A good practice is to first convert your non-tensor input data to tensors using the transformation ToTensor, and then apply the other transformations (which then consume and produce tensors); see the sketch after this list.

  2. It is always a good idea to normalize your input tensors so that they lie in a small range (e.g., [0, 1]).
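As a quick illustration of both points above, here is a minimal sketch of a typical preprocessing pipeline that converts the input to a tensor first and then applies tensor-to-tensor transformations. The mean/std values below are placeholders for illustration, not statistics computed from any particular dataset:

import torchvision

# ToTensor first: PIL Image / uint8 numpy array -> float tensor in [0, 1], shape (C, H, W).
# The transforms that follow then consume and produce tensors.
preprocess = torchvision.transforms.Compose(
    [
        torchvision.transforms.ToTensor(),
        # Placeholder per-channel mean/std; in practice use the statistics
        # of your dataset (e.g., the well-known ImageNet values).
        torchvision.transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
    ]
)
# tensor = preprocess(img)  # img can be a PIL Image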

In [19]:
import torch
import torchvision
import numpy as np
from PIL import Image
In [20]:
img = Image.open("../../home/media/poker/4h.png")
img
Out[20]:
(the playing-card image ../../home/media/poker/4h.png is displayed here)
In [22]:
arr = np.array(img)
arr
Out[22]:
array([[[ 37,  62,  59],
        [149, 174, 171],
        [225, 238, 239],
        ...,
        [232, 250, 249],
        [217, 235, 234],
        [122, 156, 154]],

       [[127, 133, 134],
        [240, 246, 247],
        [244, 243, 246],
        ...,
        [239, 240, 243],
        [243, 244, 247],
        [218, 233, 236]],

       [[152, 158, 159],
        [245, 251, 252],
        [237, 236, 239],
        ...,
        [235, 236, 239],
        [235, 236, 239],
        [227, 242, 245]],

       ...,

       [[ 45,  66,  64],
        [153, 174, 172],
        [233, 237, 239],
        ...,
        [235, 239, 241],
        [226, 230, 232],
        [132, 157, 152]],

       [[ 24,  45,  43],
        [ 38,  59,  57],
        [105, 109, 111],
        ...,
        [119, 123, 125],
        [ 92,  96,  98],
        [ 37,  62,  57]],

       [[ 25,  55,  50],
        [ 17,  47,  42],
        [ 15,  38,  33],
        ...,
        [ 16,  40,  40],
        [ 16,  40,  40],
        [ 19,  53,  49]]], dtype=uint8)
In [23]:
arr.shape
Out[23]:
(54, 37, 3)

torchvision.transforms.ToTensor

Converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0] if the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1) or if the numpy.ndarray has dtype = np.uint8.

This is a transformation that you almost always need when preparing a dataset for a computer vision task.
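Note the dtype condition in the docstring above: a numpy array is scaled to [0.0, 1.0] only when its dtype is np.uint8. A minimal sketch of this caveat, assuming the documented behavior (the array values here are made up for illustration):

import numpy as np
import torchvision

trans = torchvision.transforms.ToTensor()

# uint8 array: values are divided by 255, so the result lies in [0.0, 1.0].
a_uint8 = np.array([[[0, 128, 255]]], dtype=np.uint8)  # shape (1, 1, 3), H x W x C
trans(a_uint8)  # tensor of shape (3, 1, 1) with values ~0.0000, 0.5020, 1.0000

# float array: only converted and permuted to (C, H, W), NOT rescaled.
a_float = a_uint8.astype(np.float32)
trans(a_float)  # tensor of shape (3, 1, 1) with values 0., 128., 255.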

In [24]:
trans = torchvision.transforms.ToTensor()
In [28]:
t1 = trans(img)
t1
Out[28]:
tensor([[[0.1451, 0.5843, 0.8824,  ..., 0.9098, 0.8510, 0.4784],
         [0.4980, 0.9412, 0.9569,  ..., 0.9373, 0.9529, 0.8549],
         [0.5961, 0.9608, 0.9294,  ..., 0.9216, 0.9216, 0.8902],
         ...,
         [0.1765, 0.6000, 0.9137,  ..., 0.9216, 0.8863, 0.5176],
         [0.0941, 0.1490, 0.4118,  ..., 0.4667, 0.3608, 0.1451],
         [0.0980, 0.0667, 0.0588,  ..., 0.0627, 0.0627, 0.0745]],

        [[0.2431, 0.6824, 0.9333,  ..., 0.9804, 0.9216, 0.6118],
         [0.5216, 0.9647, 0.9529,  ..., 0.9412, 0.9569, 0.9137],
         [0.6196, 0.9843, 0.9255,  ..., 0.9255, 0.9255, 0.9490],
         ...,
         [0.2588, 0.6824, 0.9294,  ..., 0.9373, 0.9020, 0.6157],
         [0.1765, 0.2314, 0.4275,  ..., 0.4824, 0.3765, 0.2431],
         [0.2157, 0.1843, 0.1490,  ..., 0.1569, 0.1569, 0.2078]],

        [[0.2314, 0.6706, 0.9373,  ..., 0.9765, 0.9176, 0.6039],
         [0.5255, 0.9686, 0.9647,  ..., 0.9529, 0.9686, 0.9255],
         [0.6235, 0.9882, 0.9373,  ..., 0.9373, 0.9373, 0.9608],
         ...,
         [0.2510, 0.6745, 0.9373,  ..., 0.9451, 0.9098, 0.5961],
         [0.1686, 0.2235, 0.4353,  ..., 0.4902, 0.3843, 0.2235],
         [0.1961, 0.1647, 0.1294,  ..., 0.1569, 0.1569, 0.1922]]])
In [29]:
t1.shape
Out[29]:
torch.Size([3, 54, 37])
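Note that ToTensor moves the channel dimension to the front: the (54, 37, 3) H x W x C array becomes a (3, 54, 37) C x H x W tensor. If you need the channel-last layout back (e.g., for matplotlib.pyplot.imshow), a simple sketch:

# Permute (C, H, W) back to (H, W, C) for libraries expecting channel-last data.
img_hwc = t1.permute(1, 2, 0)
img_hwc.shape  # torch.Size([54, 37, 3])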
In [30]:
t2 = trans(arr)
t2
Out[30]:
tensor([[[0.1451, 0.5843, 0.8824,  ..., 0.9098, 0.8510, 0.4784],
         [0.4980, 0.9412, 0.9569,  ..., 0.9373, 0.9529, 0.8549],
         [0.5961, 0.9608, 0.9294,  ..., 0.9216, 0.9216, 0.8902],
         ...,
         [0.1765, 0.6000, 0.9137,  ..., 0.9216, 0.8863, 0.5176],
         [0.0941, 0.1490, 0.4118,  ..., 0.4667, 0.3608, 0.1451],
         [0.0980, 0.0667, 0.0588,  ..., 0.0627, 0.0627, 0.0745]],

        [[0.2431, 0.6824, 0.9333,  ..., 0.9804, 0.9216, 0.6118],
         [0.5216, 0.9647, 0.9529,  ..., 0.9412, 0.9569, 0.9137],
         [0.6196, 0.9843, 0.9255,  ..., 0.9255, 0.9255, 0.9490],
         ...,
         [0.2588, 0.6824, 0.9294,  ..., 0.9373, 0.9020, 0.6157],
         [0.1765, 0.2314, 0.4275,  ..., 0.4824, 0.3765, 0.2431],
         [0.2157, 0.1843, 0.1490,  ..., 0.1569, 0.1569, 0.2078]],

        [[0.2314, 0.6706, 0.9373,  ..., 0.9765, 0.9176, 0.6039],
         [0.5255, 0.9686, 0.9647,  ..., 0.9529, 0.9686, 0.9255],
         [0.6235, 0.9882, 0.9373,  ..., 0.9373, 0.9373, 0.9608],
         ...,
         [0.2510, 0.6745, 0.9373,  ..., 0.9451, 0.9098, 0.5961],
         [0.1686, 0.2235, 0.4353,  ..., 0.4902, 0.3843, 0.2235],
         [0.1961, 0.1647, 0.1294,  ..., 0.1569, 0.1569, 0.1922]]])
In [31]:
t2.shape
Out[31]:
torch.Size([3, 54, 37])
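As the outputs above suggest, applying ToTensor to the PIL image and to the uint8 numpy array yields identical tensors, which you can verify directly. The result should also match manually permuting the axes and dividing by 255 (a sketch, assuming a uint8 H x W x C input):

torch.equal(t1, t2)  # True: both inputs produce the same tensor

# Manual equivalent of ToTensor for a uint8 H x W x C array:
t3 = torch.from_numpy(arr).permute(2, 0, 1).to(torch.float32) / 255
torch.allclose(t1, t3)  # True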