I’m reading through the textbook, and I’ve come across something that confused me. Hopefully someone can offer insight.
Context: We’re working with the Biwi Kinect Head Pose dataset, which contains images of individual people; these images will be used as the independent variables. We have defined a function called get_ctr that generates (x,y) pairs representing the center of the person’s head in each image, which will be used as the dependent variables in what will be a regression model.
Below is the DataBlock we construct:
biwi = DataBlock(
    blocks=(ImageBlock, PointBlock),
    get_items=get_image_files,
    get_y=get_ctr,
    splitter=FuncSplitter(lambda o: o.parent.name=='13'),
    batch_tfms=[*aug_transforms(size=(240,320)),
                Normalize.from_stats(*imagenet_stats)]
)
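As an aside on the splitter line, here is how I understand it, sketched with plain pathlib rather than fastai itself. The file names are made up for illustration, and the assumption is that items for which the function returns True go to the validation set, so everything in folder 13 is held out:

```python
from pathlib import Path

# Hypothetical file paths, laid out as <person folder>/<frame file>
items = [
    Path("01/frame_00003_rgb.jpg"),
    Path("13/frame_00004_rgb.jpg"),
    Path("13/frame_00010_rgb.jpg"),
    Path("24/frame_00001_rgb.jpg"),
]

def is_valid(o):
    # Same test as the lambda passed to FuncSplitter above
    return o.parent.name == "13"

# True -> validation set, False -> training set (my reading of FuncSplitter)
train = [p for p in items if not is_valid(p)]
valid = [p for p in items if is_valid(p)]

print(len(train), len(valid))
```

If that reading is right, the model is validated on a person it never saw during training.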
And then we construct a DataLoaders object and grab the first batch:
dls = biwi.dataloaders(path)
xb,yb = dls.one_batch()
After doing so, we see that xb is a rank-4 tensor with shape:
torch.Size([64, 3, 240, 320])
I’m not sure why this is the case. I understand that the default batch size is 64, so it’s a list of 64 images (which are 240x320 pixels), but where does the “3” come into play?
I imagine it has something to do with the batch_tfms, specifically *aug_transforms, but I’m not sure.
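One way to narrow this down is to index into the batch and look at the shape of a single item. This is only a sketch: it uses a stand-in tensor of zeros with the same shape as xb (my assumption, not the real data), and the names one_item and one_slice are just for illustration.

```python
import torch

# Stand-in for the real batch: same shape as xb, filled with zeros (assumption)
xb = torch.zeros(64, 3, 240, 320)

one_item = xb[0]       # first item in the batch
one_slice = xb[0, 0]   # first of the 3 slices inside that item

print(xb.ndim)          # number of axes (the rank)
print(one_item.shape)   # shape of a single item
print(one_slice.shape)  # shape of a single slice
```

This prints 4, then torch.Size([3, 240, 320]), then torch.Size([240, 320]). So each of the 64 items is itself made of 3 grids of 240x320 values, and that group of 3 is the part I can’t account for.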
Can anyone help explain what the “3” in the tensor shape represents?
Thanks in advance!