Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Comments

  1. Transformations in torchvision.transforms work on PIL images, tensors (representing images), and possibly on numpy arrays (representing images). However, a transformation (e.g., ToTensor) might behave differently on different input types, so you should be clear about what exactly a transformation does for each type. A good practice is to always convert your non-tensor input data to tensors first using ToTensor, and then apply the other transformations (which then consume tensors and produce tensors).

  2. It is always a good idea to normalize your input tensors to be within a small range (e.g., [0, 1]).

import torch
import torchvision
import numpy as np
from PIL import Image
img = Image.open("../../home/media/poker/4h.png")
img
<PIL.PngImagePlugin.PngImageFile image mode=RGB size=37x54 at 0x7F0238065080>
arr = np.array(img)
arr
array([[[ 37, 62, 59], [149, 174, 171], [225, 238, 239], ..., [232, 250, 249], [217, 235, 234], [122, 156, 154]], [[127, 133, 134], [240, 246, 247], [244, 243, 246], ..., [239, 240, 243], [243, 244, 247], [218, 233, 236]], [[152, 158, 159], [245, 251, 252], [237, 236, 239], ..., [235, 236, 239], [235, 236, 239], [227, 242, 245]], ..., [[ 45, 66, 64], [153, 174, 172], [233, 237, 239], ..., [235, 239, 241], [226, 230, 232], [132, 157, 152]], [[ 24, 45, 43], [ 38, 59, 57], [105, 109, 111], ..., [119, 123, 125], [ 92, 96, 98], [ 37, 62, 57]], [[ 25, 55, 50], [ 17, 47, 42], [ 15, 38, 33], ..., [ 16, 40, 40], [ 16, 40, 40], [ 19, 53, 49]]], dtype=uint8)
arr.shape
(54, 37, 3)

torchvision.transforms.ToTensor

Converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0] if the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1) or if the numpy.ndarray has dtype = np.uint8.

This is a transformation that you almost always need when preparing a dataset for a computer vision task.

trans = torchvision.transforms.ToTensor()
t1 = trans(img)
t1
tensor([[[0.1451, 0.5843, 0.8824, ..., 0.9098, 0.8510, 0.4784], [0.4980, 0.9412, 0.9569, ..., 0.9373, 0.9529, 0.8549], [0.5961, 0.9608, 0.9294, ..., 0.9216, 0.9216, 0.8902], ..., [0.1765, 0.6000, 0.9137, ..., 0.9216, 0.8863, 0.5176], [0.0941, 0.1490, 0.4118, ..., 0.4667, 0.3608, 0.1451], [0.0980, 0.0667, 0.0588, ..., 0.0627, 0.0627, 0.0745]], [[0.2431, 0.6824, 0.9333, ..., 0.9804, 0.9216, 0.6118], [0.5216, 0.9647, 0.9529, ..., 0.9412, 0.9569, 0.9137], [0.6196, 0.9843, 0.9255, ..., 0.9255, 0.9255, 0.9490], ..., [0.2588, 0.6824, 0.9294, ..., 0.9373, 0.9020, 0.6157], [0.1765, 0.2314, 0.4275, ..., 0.4824, 0.3765, 0.2431], [0.2157, 0.1843, 0.1490, ..., 0.1569, 0.1569, 0.2078]], [[0.2314, 0.6706, 0.9373, ..., 0.9765, 0.9176, 0.6039], [0.5255, 0.9686, 0.9647, ..., 0.9529, 0.9686, 0.9255], [0.6235, 0.9882, 0.9373, ..., 0.9373, 0.9373, 0.9608], ..., [0.2510, 0.6745, 0.9373, ..., 0.9451, 0.9098, 0.5961], [0.1686, 0.2235, 0.4353, ..., 0.4902, 0.3843, 0.2235], [0.1961, 0.1647, 0.1294, ..., 0.1569, 0.1569, 0.1922]]])
t1.shape
torch.Size([3, 54, 37])
t2 = trans(arr)
t2
tensor([[[0.1451, 0.5843, 0.8824, ..., 0.9098, 0.8510, 0.4784], [0.4980, 0.9412, 0.9569, ..., 0.9373, 0.9529, 0.8549], [0.5961, 0.9608, 0.9294, ..., 0.9216, 0.9216, 0.8902], ..., [0.1765, 0.6000, 0.9137, ..., 0.9216, 0.8863, 0.5176], [0.0941, 0.1490, 0.4118, ..., 0.4667, 0.3608, 0.1451], [0.0980, 0.0667, 0.0588, ..., 0.0627, 0.0627, 0.0745]], [[0.2431, 0.6824, 0.9333, ..., 0.9804, 0.9216, 0.6118], [0.5216, 0.9647, 0.9529, ..., 0.9412, 0.9569, 0.9137], [0.6196, 0.9843, 0.9255, ..., 0.9255, 0.9255, 0.9490], ..., [0.2588, 0.6824, 0.9294, ..., 0.9373, 0.9020, 0.6157], [0.1765, 0.2314, 0.4275, ..., 0.4824, 0.3765, 0.2431], [0.2157, 0.1843, 0.1490, ..., 0.1569, 0.1569, 0.2078]], [[0.2314, 0.6706, 0.9373, ..., 0.9765, 0.9176, 0.6039], [0.5255, 0.9686, 0.9647, ..., 0.9529, 0.9686, 0.9255], [0.6235, 0.9882, 0.9373, ..., 0.9373, 0.9373, 0.9608], ..., [0.2510, 0.6745, 0.9373, ..., 0.9451, 0.9098, 0.5961], [0.1686, 0.2235, 0.4353, ..., 0.4902, 0.3843, 0.2235], [0.1961, 0.1647, 0.1294, ..., 0.1569, 0.1569, 0.1922]]])
t2.shape
torch.Size([3, 54, 37])