6. Other Computer Vision Problems

I’m reading through the textbook, and I’ve come across something that confused me. Hopefully someone can offer insight.

Context: We’re working with the Biwi Kinect Head Pose dataset, which contains images that each show a single person; these images are the independent variables. We have defined a function called get_ctr that returns the (x,y) coordinates of the center of the person’s head in each image, which are the dependent variables for a regression model.

Below is the DataBlock we construct:

biwi = DataBlock(
    blocks=(ImageBlock, PointBlock),
    get_items=get_image_files,
    get_y=get_ctr,
    # Hold out every image whose parent folder is named '13' as the validation set
    splitter=FuncSplitter(lambda o: o.parent.name=='13'),
    batch_tfms=[*aug_transforms(size=(240,320)),
                Normalize.from_stats(*imagenet_stats)]
)

And then we construct a DataLoaders object and grab the first batch:

dls = biwi.dataloaders(path)
xb,yb = dls.one_batch()

After doing so, we see that xb is a rank-4 tensor with shape:

torch.Size([64, 3, 240, 320])

I’m not sure why this is the case. I understand that the default batch size is 64, so it’s a list of 64 images (which are 240x320 pixels), but where does the “3” come into play?

I imagine it has something to do with the batch_tfms, specifically, *aug_transforms, but I’m not sure.

Can anyone help explain what the “3” in the tensor shape represents?

Thanks in advance!

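The 3 is the number of color channels (red, green, blue). It comes from the images themselves, not from batch_tfms: the shape reads as [batch size, channels, height, width].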
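In case it helps, here is a minimal sketch that labels each axis of that shape. It is plain Python and does not use fastai or PyTorch; describe_batch_shape is a hypothetical helper written only for this illustration, and it assumes the batch is laid out as [batch size, channels, height, width].

```python
def describe_batch_shape(shape):
    """Label the four axes of an image batch laid out as [batch, channels, height, width].

    Hypothetical helper for illustration only; not part of fastai or PyTorch.
    """
    batch_size, channels, height, width = shape
    return {
        "batch_size": batch_size,  # number of images in the batch
        "channels": channels,      # 3 for RGB: red, green, blue
        "height": height,          # pixel rows per image
        "width": width,            # pixel columns per image
        # total count of numbers stored in the batch
        "values": batch_size * channels * height * width,
    }


info = describe_batch_shape((64, 3, 240, 320))
print(info)
```

With the batch from the question, you could pass tuple(xb.shape) instead of the hard-coded tuple. Indexing a single image as xb[0] should then leave the remaining three axes, [3, 240, 320].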

Ah! Thank you, that makes sense.