I’m having trouble replicating learn.predict() in pure PyTorch/NumPy on a custom dataset. I need to be able to reproduce its output without any fastai code involved.
Input Image:
Here’s an example of the different results I get. My Learner object is called framing.
fastai results
x = open_image(f)
pred = framing.predict(x)
print(pred[0])
# my loss function doesn't have a default mapping to an activation
# i.e. LabelSmoothingCrossEntropy(), thus the softmax here
print(nn.Softmax(-1)(pred[2]))
Output:
mediumclose
tensor([0.1246, 0.0643, 0.0082, 0.0218, 0.0705, 0.7106])
PyTorch results
When I try to do the same in plain PyTorch:
from PIL import Image as PILImg
import torchvision.transforms.functional as TTF
x = PILImg.open(f).convert('RGB')
x = TTF.resize(x, (224,224))
x = TTF.to_tensor(x)
x = x.cuda().unsqueeze_(0)
pred = framing.model(x)
pred_sm = nn.Softmax(-1)(pred)
framing.data.classes[torch.argmax(pred_sm)]
pred_sm
Output:
'extremeclose'
tensor([[0.2260, 0.4089, 0.1431, 0.0688, 0.0756, 0.0776]], device='cuda:0',
grad_fn=<SoftmaxBackward>)
The predictions in PyTorch are very far off from the ones in fastai. Sometimes they give the same predicted class as fastai, but with much lower confidence.
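One thing that may contribute: the grad_fn=<SoftmaxBackward> in the PyTorch output shows that gradients were being tracked, and nothing in the snippet puts the model into evaluation mode. As far as I can tell, fastai's prediction path switches the model to eval mode and runs without gradients. Below is a minimal sketch of that, where the small model is a hypothetical stand-in for framing.model and predict_probs is a helper name made up for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for `framing.model`; any nn.Module behaves the same way.
model = nn.Sequential(
    nn.Flatten(),
    nn.Dropout(p=0.5),
    nn.Linear(3 * 8 * 8, 6),
)

def predict_probs(model: nn.Module, batch: torch.Tensor) -> torch.Tensor:
    """Run a forward pass in eval mode without tracking gradients."""
    model.eval()  # disables dropout and makes BatchNorm use running statistics
    with torch.no_grad():
        logits = model(batch)
    return torch.softmax(logits, dim=-1)

# Random stand-in for a preprocessed image batch.
batch = torch.rand(1, 3, 8, 8)
probs = predict_probs(model, batch)
print(probs.requires_grad)  # False: no grad_fn is attached to the output
```

If the model was left in training mode, dropout alone can change the probabilities from call to call, so this is worth ruling out before comparing the preprocessing.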
After looking at the source code, I figured out that there are two key things I’m missing, both in the preprocessing pipeline.
1. The first line of Learner.predict's source code is:
batch = self.data.one_item(item)
This translates to:
x = open_image(f)
batch = framing.data.one_item(x)
batch[0].shape
batch[0]
Output:
torch.Size([1, 3, 224, 224])
tensor([[[[-1.9723, -1.3375, -1.2788, ..., -1.1223, -1.2103, -1.4843],
[-1.9723, -1.3375, -1.2788, ..., -1.1150, -1.2103, -1.4357],
[-1.9723, -1.3375, -1.2788, ..., -1.1034, -1.2103, -1.4373],
...,
[-1.9722, -1.1860, -1.1060, ..., -1.8340, -1.8476, -1.9638],
[-1.9524, -1.2139, -1.0834, ..., -1.8341, -1.8476, -1.9638],
[-2.0023, -1.0501, -1.1437, ..., -1.8525, -1.8268, -1.9467]],
[[-1.9394, -1.4254, -1.4580, ..., -1.4055, -1.4329, -1.6243],
[-1.9394, -1.4254, -1.4580, ..., -1.3980, -1.4329, -1.5747],
[-1.9394, -1.4254, -1.4580, ..., -1.3861, -1.4329, -1.5762],
# .....
This is already very different from the tensor I build in PyTorch:
x = PILImg.open(f).convert('RGB')
x = TTF.resize(x, (224,224))
x = TTF.to_tensor(x)
x = x.cuda().unsqueeze_(0)
x.shape
x
Output:
torch.Size([1, 3, 224, 224])
tensor([[[[0.1137, 0.1843, 0.1882, ..., 0.2353, 0.2157, 0.1882],
[0.1137, 0.1843, 0.1882, ..., 0.2392, 0.2157, 0.1922],
[0.1137, 0.1843, 0.1882, ..., 0.2431, 0.2157, 0.1922],
...,
[0.1294, 0.2235, 0.2275, ..., 0.0627, 0.0667, 0.0510],
[0.1294, 0.2235, 0.2275, ..., 0.0627, 0.0627, 0.0510],
[0.1333, 0.2314, 0.2275, ..., 0.0627, 0.0627, 0.0510]],
[[0.0902, 0.1333, 0.1294, ..., 0.1451, 0.1373, 0.1255],
[0.0902, 0.1333, 0.1294, ..., 0.1490, 0.1373, 0.1294],
[0.0902, 0.1333, 0.1294, ..., 0.1529, 0.1373, 0.1294],
2. Later in Learner.predict, there’s a denormalising operation which, if I understand correctly, denormalises the image with imagenet_stats.
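The negative values in the fastai batch, together with the denormalising step, suggest the input was normalised per channel before reaching the model. A minimal NumPy sketch of that step and its inverse, assuming the data was normalised with the usual ImageNet statistics (the values fastai exposes as imagenet_stats); the function names are made up for illustration.

```python
import numpy as np

# ImageNet channel statistics (RGB order), assumed to match imagenet_stats.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize(x):
    """Normalize a (N, 3, H, W) array with values in [0, 1], channel by channel."""
    return (x - IMAGENET_MEAN[None, :, None, None]) / IMAGENET_STD[None, :, None, None]

def denormalize(x):
    """Inverse of normalize: map a normalized array back to the [0, 1] range."""
    return x * IMAGENET_STD[None, :, None, None] + IMAGENET_MEAN[None, :, None, None]

# Top-left pixel of the PyTorch tensor printed above (red and green channels).
# The blue value is not shown in the post, so 0.0 is only a placeholder.
pixel = np.array([0.1137, 0.0902, 0.0], dtype=np.float32).reshape(1, 3, 1, 1)
normed = normalize(pixel)
print(normed[0, :2, 0, 0])
```

For that pixel this gives roughly -1.62 (red) and -1.63 (green), while the fastai batch shows -1.9723 and -1.9394 in the same position. So normalisation explains the change of range but probably not the whole gap; the resize or crop step may differ as well.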
What I’m unable to replicate without fastai
I can’t reproduce what’s happening in batch = framing.data.one_item(x). From the source code of DataBunch.one_item and DataBunch.one_batch, the operation translates to:
x = open_image(f)
ds = framing.data.single_ds
with ds.set_item(Image(x)):
    dl = framing.data.dl(DatasetType.Single)
    w = dl.num_workers
    dl.num_workers = 0
    try: x,y = next(iter(dl))
    finally: dl.num_workers = w
# ...
If I follow correctly, with ds.set_item(Image(x)) enters a context manager which in turn calls framing.data.processor (which is empty), but I’m simply unable to reproduce this.
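To narrow down which preprocessing stage diverges, it can help to compare the fastai batch against the hand-built tensor numerically instead of by eye. A rough sketch follows; describe_difference is a hypothetical helper, and the random tensors stand in for batch[0] from framing.data.one_item(x) and the tensor built with torchvision.

```python
import torch

def describe_difference(a: torch.Tensor, b: torch.Tensor) -> dict:
    """Summarize how two (N, C, H, W) batches differ."""
    a, b = a.detach().float().cpu(), b.detach().float().cpu()
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {tuple(a.shape)} vs {tuple(b.shape)}")
    return {
        "max_abs_diff": (a - b).abs().max().item(),
        # A constant per-channel offset or scale points at normalization;
        # matching means with a large max_abs_diff points at resize or crop.
        "mean_a": a.mean(dim=(0, 2, 3)).tolist(),
        "mean_b": b.mean(dim=(0, 2, 3)).tolist(),
        "close": torch.allclose(a, b, atol=1e-4),
    }

# Random stand-ins for the fastai batch and the hand-built tensor.
reference = torch.rand(1, 3, 4, 4)
candidate = reference + 0.5
report = describe_difference(reference, candidate)
print(report["close"], round(report["max_abs_diff"], 4))
```

In the real setting, the first argument would be the fastai batch and the second the output of each candidate preprocessing step, checked one step at a time.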
Any help will be much appreciated!! Thank you.