I’m having trouble replicating learn.predict() in pure PyTorch/NumPy on a custom dataset. I need to be able to reproduce its output without any fastai code involved.
Input Image:
Here’s an example of the different results I get. My Learner object is called framing.
Fastai results
x = open_image(f)
pred = framing.predict(x)
print(pred[0])
# my loss function doesn't have a default mapping to an activation
# i.e. LabelSmoothingCrossEntropy(), thus the softmax here
print(nn.Softmax(-1)(pred[2]))
Output:
mediumclose
tensor([0.1246, 0.0643, 0.0082, 0.0218, 0.0705, 0.7106])
PyTorch results
Here is my attempt at the same prediction with PyTorch:
from PIL import Image as PILImg
import torchvision.transforms.functional as TTF
x = PILImg.open(f).convert('RGB')
x = TTF.resize(x, (224,224))
x = TTF.to_tensor(x)
x = x.cuda().unsqueeze_(0)
pred = framing.model(x)
pred_sm = nn.Softmax(-1)(pred)
framing.data.classes[torch.argmax(pred_sm)]
pred_sm
Output:
'extremeclose'
tensor([[0.2260, 0.4089, 0.1431, 0.0688, 0.0756, 0.0776]], device='cuda:0',
grad_fn=<SoftmaxBackward>)
The predictions in PyTorch are very far off from the ones in fastai. Sometimes they give the same prediction as fastai but with much lower confidence.
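One thing worth ruling out separately from preprocessing: the grad_fn=<SoftmaxBackward> in the output above shows the forward pass ran with autograd enabled, and nothing in my snippet puts the model in evaluation mode. If the model is still in training mode, dropout and batch norm layers behave differently from inference. Below is a minimal sketch of an inference-style forward pass, using a toy model as a stand-in for framing.model (the layer sizes are made up for illustration).

```python
import torch
import torch.nn as nn

# Toy stand-in for framing.model (hypothetical); Dropout makes train and eval differ.
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(16, 6),
)
x = torch.randn(1, 8)

model.eval()            # disable dropout, use batch norm running statistics
with torch.no_grad():   # no autograd bookkeeping during inference
    probs_a = torch.softmax(model(x), dim=-1)
    probs_b = torch.softmax(model(x), dim=-1)
```

In eval mode the two passes give identical probabilities, and the result carries no grad_fn. This alone would not explain the different input tensors, but it removes one source of mismatch.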
After looking at the source code, I figured out that there are two key things I’m missing, both in the preprocessing pipeline.
1. In the first line of Learner.predict's source code:
batch = self.data.one_item(item)
This translates to
x = open_image(f)
batch = framing.data.one_item(x)
batch[0].shape
batch[0]
Output:
torch.Size([1, 3, 224, 224])
tensor([[[[-1.9723, -1.3375, -1.2788, ..., -1.1223, -1.2103, -1.4843],
[-1.9723, -1.3375, -1.2788, ..., -1.1150, -1.2103, -1.4357],
[-1.9723, -1.3375, -1.2788, ..., -1.1034, -1.2103, -1.4373],
...,
[-1.9722, -1.1860, -1.1060, ..., -1.8340, -1.8476, -1.9638],
[-1.9524, -1.2139, -1.0834, ..., -1.8341, -1.8476, -1.9638],
[-2.0023, -1.0501, -1.1437, ..., -1.8525, -1.8268, -1.9467]],
[[-1.9394, -1.4254, -1.4580, ..., -1.4055, -1.4329, -1.6243],
[-1.9394, -1.4254, -1.4580, ..., -1.3980, -1.4329, -1.5747],
[-1.9394, -1.4254, -1.4580, ..., -1.3861, -1.4329, -1.5762],
# .....
This is already very different from the tensor my PyTorch code produces:
x = PILImg.open(f).convert('RGB')
x = TTF.resize(x, (224,224))
x = TTF.to_tensor(x)
x = x.cuda().unsqueeze_(0)
x.shape
x
Output:
torch.Size([1, 3, 224, 224])
tensor([[[[0.1137, 0.1843, 0.1882, ..., 0.2353, 0.2157, 0.1882],
[0.1137, 0.1843, 0.1882, ..., 0.2392, 0.2157, 0.1922],
[0.1137, 0.1843, 0.1882, ..., 0.2431, 0.2157, 0.1922],
...,
[0.1294, 0.2235, 0.2275, ..., 0.0627, 0.0667, 0.0510],
[0.1294, 0.2235, 0.2275, ..., 0.0627, 0.0627, 0.0510],
[0.1333, 0.2314, 0.2275, ..., 0.0627, 0.0627, 0.0510]],
[[0.0902, 0.1333, 0.1294, ..., 0.1451, 0.1373, 0.1255],
[0.0902, 0.1333, 0.1294, ..., 0.1490, 0.1373, 0.1294],
[0.0902, 0.1333, 0.1294, ..., 0.1529, 0.1373, 0.1294],
2. Later in Learner.predict, there’s a denormalising step which, if I understand correctly, undoes a normalisation with imagenet_stats. That would mean the batch passed to the model has been normalised with those statistics, which my PyTorch pipeline never does.
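To make the normalisation concrete, here is a minimal NumPy sketch, assuming the DataBunch was normalised with the usual ImageNet statistics. The mean and std values below are the commonly published ones and are my assumption for what imagenet_stats holds; normalize_chw and denormalize_chw are hypothetical helper names. It only covers the per-channel scaling and does not reproduce whatever resizing or cropping fastai applies.

```python
import numpy as np

# Commonly published ImageNet channel statistics (assumed to match imagenet_stats).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_chw(img, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Normalise a (3, H, W) float image in [0, 1], channel by channel."""
    return (img - mean[:, None, None]) / std[:, None, None]

def denormalize_chw(img, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Invert normalize_chw, recovering the original [0, 1] values."""
    return img * std[:, None, None] + mean[:, None, None]

# Dummy image standing in for TTF.to_tensor(x).numpy()
img = np.random.rand(3, 224, 224).astype(np.float32)
normed = normalize_chw(img)
restored = denormalize_chw(normed)
```

The same broadcast works on a torch tensor of shape (3, H, W). As a sanity check, a pure black pixel maps to about -2.12 in the red channel, so values like -1.9723 in the fastai batch are at least in the range normalised data would have.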
What I’m unable to replicate without fastai
I can’t reproduce what’s happening in batch = framing.data.one_item(x). From the source code of DataBunch.one_item and DataBunch.one_batch, the operation translates to:
x = open_image(f)
ds = framing.data.single_ds
with ds.set_item(Image(x)):
    dl = framing.data.dl(DatasetType.Single)
    w = dl.num_workers
    dl.num_workers = 0
    try: x,y = next(iter(dl))
    finally: dl.num_workers = w
    # ...
If I follow correctly, the line with ds.set_item(Image(x)) enters a context manager, which in turn calls framing.data.processor (which is empty), but I’m unable to reproduce this step outside fastai.
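For reference, this is the general pattern I understand such a context manager to follow, written as a toy sketch in plain Python. SingleItemDataset and process_one are hypothetical names for illustration only; this is not fastai's implementation, just the idea of temporarily making a dataset hold exactly one item and clearing it again on exit.

```python
from contextlib import contextmanager

class SingleItemDataset:
    """Hypothetical dataset that can temporarily hold a single item."""

    def __init__(self, process_one):
        self.process_one = process_one  # per-item processing hook
        self.item = None

    @contextmanager
    def set_item(self, item):
        # Install the processed item for the duration of the with-block.
        self.item = self.process_one(item)
        try:
            yield
        finally:
            # Always clear the item, even if the block raises.
            self.item = None

    def __len__(self):
        return 1 if self.item is not None else 0

    def __getitem__(self, idx):
        if self.item is None:
            raise IndexError("no item set")
        return self.item

# An identity function plays the role of an "empty" processor.
ds = SingleItemDataset(process_one=lambda item: item)
with ds.set_item("image-placeholder"):
    inside = (len(ds), ds[0])
after = len(ds)
```

If the processor really is empty, this step leaves the image untouched, so the differences in the tensors would have to come from the transforms and normalisation applied when the item is fetched and batched.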
Any help will be much appreciated!! Thank you.
