Different results in different runs on the same data

I have trained and exported a model that works on image data. Running inference with the following code:

learn = create_cnn(query_ds, models.resnet50, metrics=accuracy)

produces:
    [tensor([[0.2794, 0.4179, 0.3028],
             [0.2198, 0.4329, 0.3473],
             [0.1507, 0.6839, 0.1654],
             [0.1670, 0.5620, 0.2710],
             [0.2090, 0.4640, 0.3270],
             [0.1888, 0.5622, 0.2490],
             [0.3274, 0.3213, 0.3513]]), tensor([0, 1, 1, 2, 0, 2, 0])]

When I run the same block again, I get different scores:
    [tensor([[0.2329, 0.4419, 0.3252],
             [0.2527, 0.4269, 0.3203],
             [0.3447, 0.3332, 0.3220],
             [0.3269, 0.4221, 0.2510],
             [0.3003, 0.3916, 0.3081],
             [0.4228, 0.3124, 0.2648],
             [0.3449, 0.4089, 0.2462]]), tensor([0, 1, 1, 2, 0, 2, 0])]

I have also tried the data block tutorial for digit prediction. Here too, the prediction probabilities change when I load the same model and run inference multiple times. For example,

learn = load_learner(mnist)
img = data.train_ds[1][0]

it produces:
(Category 7, tensor(1), tensor([0.0628, 0.9372]))
But when I run the same block again, it produces:
(Category 7, tensor(1), tensor([0.0170, 0.9830]))

Can you please tell me the reason behind this? Ideally, during inference the weights are frozen, so the output vector should be the same on every run. There is a big difference in the final confidence scores. I want to use the scores for embedding search, but because the final embeddings differ for the same image, I am running into issues.

Kindly suggest a way to solve this issue.


My guess is that either it is not the same image every time, or you have a random transform in your data loader. Try printing learn.data.valid_dl.

No, I don’t have any transformations in my data.
The data bunch creation step is:

tfms = get_transforms()
data = (ImageList.from_folder(mnist)
.transform(tfms, size=32)

The images in each run are exactly the same:

learn = load_learner(mnist)
img = data.train_ds[1][0]
print(data.train_ds.items[0], data.train_ds[1])

The output of the above code is:

/home/ubuntu/.fastai/data/mnist_tiny/train/7/7010.png (Image (3, 32, 32), Category 7)
(Category 7, tensor(1), tensor([0.0664, 0.9336]))
/home/ubuntu/.fastai/data/mnist_tiny/train/7/7010.png (Image (3, 32, 32), Category 7)
(Category 7, tensor(1), tensor([0.0810, 0.9190]))

This proves that the image is the same, but the probabilities differ across runs.
@jeremy @rachel @sgugger

.transform(tfms, size=32)

So you do have transforms :P. In particular, get_transforms applies a crop_pad transform to the validation set, which is also what runs for predict.

This proves that the image is the same, but the probabilities differ across runs.

The probabilities are indeed different, but you can’t claim the image is the same. We only know that the same image was sent as input, not whether it is still the same image after the transforms. To check that, you should either show the image (learn.predict(img)[0].show()) or check that the tensors (learn.predict(img)[0].px) are the same.
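
To make the "same input, different image after transforms" point concrete, here is a small sketch in plain Python. It is not fastai code: random_crop and center_crop are hypothetical toy stand-ins for a random and a fixed transform, and the image is just a nested list. It only illustrates the kind of equality check suggested above.

```python
import random

def random_crop(img, size, rng):
    """Toy random crop: pick a random top-left corner on each call."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - size)
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]

def center_crop(img, size):
    """Toy deterministic crop: always the same window."""
    h, w = len(img), len(img[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in img[top:top + size]]

# A 6x6 "image" with all-distinct pixel values, so any shift is visible.
img = [[r * 6 + c for c in range(6)] for r in range(6)]

rng = random.Random()
random_crops = [random_crop(img, 4, rng) for _ in range(50)]
center_crops = [center_crop(img, 4) for _ in range(50)]

# Same input every time, but compare what comes out of the transform.
random_all_equal = all(c == random_crops[0] for c in random_crops)
center_all_equal = all(c == center_crops[0] for c in center_crops)
print("random crops identical:", random_all_equal)
print("center crops identical:", center_all_equal)
```

If the transformed inputs already differ, differing probabilities are expected even though the weights are fixed.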


Thanks, the transformation was the catch. After applying a fixed set of transformations I am getting the same outcome.
Thanks for pointing it out.

But how do you ensure that the embeddings generated in a successful run are the same at inference time?

Not sure what you mean by that

Let me explain the situation:
I trained the model with random transformations, which gives pretty nice results.
Now I am using the model to index images by generating embeddings from an intermediate layer.
However, the problem is that the same image produces two different vectors because of the random transformations captured in the exported model.
I have to get rid of this randomness at inference time so that the same image produces identical embeddings whenever it is forwarded through the model.
I tried setting np.random.seed() but it’s not helping.

At inference the model should be deterministic; you should check whether you still have other transformations. Hard to say without your code. The transformations used for training don’t matter, only the ones used on validation, which are the ones that will run at inference.
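
One possible reason seeding NumPy had no visible effect is that a seed only controls the generator it is applied to. The thread does not say which generator the transforms draw from, so treat the following as a toy sketch under that assumption: lib_rng and other_rng are hypothetical stand-ins for "the generator the transforms use" and "some other generator", and random_shift stands in for a random transform parameter.

```python
import random

lib_rng = random.Random()    # stand-in: generator the transforms draw from
other_rng = random.Random()  # stand-in: generator the transforms never touch

def random_shift(rng):
    """Toy random transform parameter, e.g. a crop offset."""
    return rng.randint(0, 10**9)

def run_with_seed(seeded_rng, seed=42):
    """Seed one generator, then draw the transform parameter from lib_rng."""
    seeded_rng.seed(seed)
    return random_shift(lib_rng)

# Seeding an unrelated generator leaves the transform random.
wrong = {run_with_seed(other_rng) for _ in range(20)}
# Seeding the generator the transform actually uses makes it repeatable.
right = {run_with_seed(lib_rng) for _ in range(20)}
print("distinct values, unrelated seed:", len(wrong))
print("distinct values, matching seed:", len(right))
```

Even where seeding works, it has to be repeated before every call, so removing random transforms from the inference pipeline, as suggested above, is the more robust way to get identical embeddings.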