How exactly does Learner.predict load images?

I’m having trouble replicating learn.predict() in pure PyTorch/NumPy on a custom dataset. I need to be able to replicate it without any fastai code involved.

Input Image:

Here’s an example of the different results I get. My Learner object is called framing.

Fastai results

x = open_image(f)
pred = framing.predict(x)
print(pred[0])
# my loss function doesn't have a default mapping to an activation 
# i.e. LabelSmoothingCrossEntropy(), thus the softmax here
print(nn.Softmax(-1)(pred[2]))

Output:

mediumclose
tensor([0.1246, 0.0643, 0.0082, 0.0218, 0.0705, 0.7106])

PyTorch results

Now when I try to do the same with PyTorch as such:

from PIL import Image as PILImg
import torchvision.transforms.functional as TTF
x = PILImg.open(f).convert('RGB')
x = TTF.resize(x, (224,224))
x = TTF.to_tensor(x)
x = x.cuda().unsqueeze_(0)

pred = framing.model(x)
pred_sm = nn.Softmax(-1)(pred)
framing.data.classes[torch.argmax(pred_sm)]
pred_sm

Output:

'extremeclose'
tensor([[0.2260, 0.4089, 0.1431, 0.0688, 0.0756, 0.0776]], device='cuda:0',
       grad_fn=<SoftmaxBackward>)

The predictions in PyTorch are very far off from the ones in fastai. Sometimes they give the same prediction as fastai but with much lower confidence.
After looking at the source code, I figured that there are two key things I’m missing, both in the preprocessing pipeline.

1.

In the first line of Learner.predict's source code:

batch = self.data.one_item(item)

This translates to

x = open_image(f)
batch = framing.data.one_item(x)
batch[0].shape
batch[0]

Output:

torch.Size([1, 3, 224, 224])
tensor([[[[-1.9723, -1.3375, -1.2788,  ..., -1.1223, -1.2103, -1.4843],
          [-1.9723, -1.3375, -1.2788,  ..., -1.1150, -1.2103, -1.4357],
          [-1.9723, -1.3375, -1.2788,  ..., -1.1034, -1.2103, -1.4373],
          ...,
          [-1.9722, -1.1860, -1.1060,  ..., -1.8340, -1.8476, -1.9638],
          [-1.9524, -1.2139, -1.0834,  ..., -1.8341, -1.8476, -1.9638],
          [-2.0023, -1.0501, -1.1437,  ..., -1.8525, -1.8268, -1.9467]],

         [[-1.9394, -1.4254, -1.4580,  ..., -1.4055, -1.4329, -1.6243],
          [-1.9394, -1.4254, -1.4580,  ..., -1.3980, -1.4329, -1.5747],
          [-1.9394, -1.4254, -1.4580,  ..., -1.3861, -1.4329, -1.5762],
# .....

This itself is very different from the PyTorch Tensor:

x = PILImg.open(f).convert('RGB')
x = TTF.resize(x, (224,224))
x = TTF.to_tensor(x)
x = x.cuda().unsqueeze_(0)
x.shape
x

Output:

torch.Size([1, 3, 224, 224])
tensor([[[[0.1137, 0.1843, 0.1882,  ..., 0.2353, 0.2157, 0.1882],
          [0.1137, 0.1843, 0.1882,  ..., 0.2392, 0.2157, 0.1922],
          [0.1137, 0.1843, 0.1882,  ..., 0.2431, 0.2157, 0.1922],
          ...,
          [0.1294, 0.2235, 0.2275,  ..., 0.0627, 0.0667, 0.0510],
          [0.1294, 0.2235, 0.2275,  ..., 0.0627, 0.0627, 0.0510],
          [0.1333, 0.2314, 0.2275,  ..., 0.0627, 0.0627, 0.0510]],

         [[0.0902, 0.1333, 0.1294,  ..., 0.1451, 0.1373, 0.1255],
          [0.0902, 0.1333, 0.1294,  ..., 0.1490, 0.1373, 0.1294],
          [0.0902, 0.1333, 0.1294,  ..., 0.1529, 0.1373, 0.1294],

2.

Later in the code for Learner.predict, there’s a denormalising operation which, if I understand correctly, denormalises the image with imagenet_stats.

What I’m unable to replicate without fastai

I can’t reproduce what’s happening in batch = framing.data.one_item(x). From the source code of DataBunch.one_item and DataBunch.one_batch, the operation translates to:

x = open_image(f)
ds = framing.data.single_ds
with ds.set_item(Image(x)):
    dl = framing.data.dl(DatasetType.Single)
    w = dl.num_workers
    dl.num_workers = 0
    try: x,y = next(iter(dl))
    finally: dl.num_workers = w
# ...

If I follow correctly, with ds.set_item(Image(x)) calls a context manager which in turn calls framing.data.processor (which is empty), but I’m simply unable to reproduce this.

Any help will be much appreciated!! Thank you.

cc @sgugger @muellerzr

Hello, did anyone manage to understand the tweaks behind Learner.predict?
I am also trying to replicate the code in pure PyTorch for U-Net inference, but without success.

If I follow correctly, Learner.predict:

  1. Converts an image to a tensor
  2. Adds one more dimension to the network input:
    for an RGB image, from [3, 224, 224] to [1, 3, 224, 224]
  3. Places the tensor on the GPU
  4. Optionally modifies the last layer of the network to SoftMax.

I have pre-processed images with steps 1 - 3, while avoiding step 4 (Sigmoid is needed for U-Net segmentation and not SoftMax (?)). I then passed them to learn.model() and compared the output to learn.predict() on unprocessed data.
Unfortunately, the results never matched. Moreover, the network produced unexpected negative values outside of the Sigmoid range [0, 1].
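
One possible explanation for the negative values, assuming the bare model ends in a convolutional or linear layer rather than an activation: calling learn.model(x) directly returns raw logits, and the activation has to be applied separately. A minimal sketch with a stand-in tensor (the logits values are made up for illustration, and the 0.5 threshold is an assumption):

```python
import torch

# Stand-in for the raw output of a segmentation model, shape [batch, channels, H, W].
# These values are made up; a real model would produce them from model(x).
logits = torch.tensor([[[[-2.0, 0.0], [1.5, 4.0]]]])

# Raw logits are unbounded, so negative values are expected at this point.
# The sigmoid maps every value into the open interval (0, 1).
probs = torch.sigmoid(logits)

# Threshold the probabilities to get a binary mask (0.5 is an assumed cut-off).
mask = (probs > 0.5).long()

print(probs)
print(mask)
```

This covers the single-channel binary case; a multi-class mask would more typically use a softmax over the channel dimension. When running a real model, it also helps to call model.eval() and wrap the forward pass in torch.no_grad().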

Thank you in advance for your help!

Did you ever find a solution for this? I’m having a very similar problem loading images with NumPy for prediction using ONNX. If I use learn.dls.test_dl I get the correct results back, but I’m unable to match these results using pure NumPy/Pillow to load images.
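
For the NumPy/Pillow case, a rough sketch of the preprocessing that usually has to be reproduced by hand is below. It assumes the model was trained with ImageNet normalisation and a 224x224 input; preprocess is a hypothetical helper name, and the bilinear resize is an assumption that may not match the resize/crop behaviour of the fastai transforms exactly.

```python
import numpy as np
from PIL import Image

# Commonly published ImageNet channel statistics, in RGB order.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(img, size=(224, 224)):
    """Hypothetical helper: PIL image -> float32 array of shape [1, 3, H, W]."""
    # PIL's resize takes (width, height); the interpolation mode is an assumption.
    img = img.convert("RGB").resize(size, Image.BILINEAR)
    arr = np.asarray(img, dtype=np.float32) / 255.0  # HWC, scaled to [0, 1]
    arr = (arr - IMAGENET_MEAN) / IMAGENET_STD       # per-channel normalisation
    arr = arr.transpose(2, 0, 1)                     # HWC -> CHW
    return arr[np.newaxis, ...]                      # add the batch dimension

# Usage with a synthetic grey image instead of a file on disk.
dummy = Image.new("RGB", (320, 240), color=(128, 128, 128))
batch = preprocess(dummy)
print(batch.shape, batch.dtype)
```

The resulting array can then be passed as the input to the exported model. If the outputs still differ, the resize step is the first place to look, since a squish, a crop and a pad all produce different pixels from the same image.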

Code to open_image() is here: fastai1/image.py at a8327427ad5137c4899a1b4f74745193c9ea5be3 · fastai/fastai1 · GitHub

It seems that in fastai's open_image there is a
if div: x.div_(255)
step that may be missing in the PyTorch version

hi @vmazlin,

I am currently working with U-Nets and facing the same issue. I just wanted to know whether you were able to find a solution, and what fastai does in the backend that differs from the usual model.predict method.

thanks

FastAI wraps the PyTorch model with additional steps for convenience: Softmax, Normalization, and other transformations (defined in the FastAI DataBlock API). We have to define those manually when using native PyTorch; otherwise we get weird results.

Check out this amazing Notebook
and this Blog. Hope this will solve your issue.

Hi @rsomani95 ,

I have the exact same problem. Did you ever find a solution for replicating the same result in PyTorch?

It’s been a while, but I figured out the missing piece in my original post: I wasn’t doing the ImageNet normalisation. The TTF.to_tensor(x) call scales pixel values to the 0-1 range, but we also need to normalise using the ImageNet stats.