Difference between "open_image" and "plt.imread" methods

(Isa Al-Doseri) #1

I used a pretrained Resnet34 model to classify images of 10 species of monkeys that I obtained from a Kaggle dataset.
Even though I achieved an accuracy of 99.6% on the validation set (272 images), I still got incorrect labels for every single image that I predicted individually.

print(df[df.Label==data.classes[np.argmax(preds)]]['Common Name'])

I initially used the plt.imread() method to read the image as a numpy.array. This was then transformed using val_tfms (as it normally should be) and fed into the model for prediction. When I replaced that with fastai’s open_image() function instead…


I got every label right. So what’s the difference between the two?

(Cedric Chee) #2

The open_image() function opens an image from a file path using OpenCV and returns it in RGB format as a numpy array of floats NORMALIZED to the range 0.0 - 1.0.

plt.imread() DOES NOT normalize the image.
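
To make the difference concrete, here is a minimal sketch of the rescaling step. It assumes the reader returns an 8-bit array (values 0 - 255), which is what you typically get for JPEG files; the tiny `raw` array is a made-up stand-in for a real image, not output from either function.

```python
import numpy as np

# Stand-in for what an 8-bit image reader returns: height x width x RGB, dtype uint8.
raw = np.array([[[0, 128, 255], [64, 64, 64]]], dtype=np.uint8)

# The rescaling described above: cast to float and divide by 255.
scaled = raw.astype(np.float32) / 255

print(raw.dtype, raw.min(), raw.max())           # uint8 0 255
print(scaled.dtype, scaled.min(), scaled.max())  # float32 0.0 1.0
```

If the array you feed to val_tfms looks like `raw` rather than `scaled`, the model sees inputs on a completely different scale from the one it was trained on.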

(Isa Al-Doseri) #3

Ahh I see, so the model was trained with normalized images. Is this a necessary preprocessing step for the specific model (Resnet34) I’m using, or just for training the last few layers? Am I to understand that image normalization is used to train all or most of the pre-trained models in fastai?
Please ask me to be clear in my questions if I’m not.

(Cedric Chee) #4

Good to know that I answered your question.

It’s not specific to any model, or to a case like yours where you fine-tune only the last few layers. It’s a common technique (or, if you will, good practice). We do this before we hand the images over to the image transformation/augmentation pipeline (tfms).

It depends on how the original pre-trained model handled the images in its training dataset.

I think you are confusing single-image normalization with dataset normalization. I explained single-image normalization earlier (it is what happens when you invoke open_image).

Dataset normalization uses the statistics of the training image dataset that come with the model (e.g. a model pre-trained on ImageNet, an Inception model, etc.). The transforms are constructed from those statistics. All of this usually happens behind the scenes in the fastai library. To illustrate, here is how it looks in code:

# ImageNet statistics: per-channel mean and standard deviation of the training images
imagenet_stats = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
imagenet_stats_arr = np.array(imagenet_stats[0]) if len(imagenet_stats)==1 else [np.array(o) for o in imagenet_stats]

# Dataset normalization.
tfms_from_model(function_model=vgg16, norm_y=True, ...)
# The above function will internally call `tfms_from_stats()`
tfms_from_stats(imagenet_stats, norm_y=True)
# That function, in turn, constructs the Normalize transform
tfm_norm = Normalize(*imagenet_stats, tfm_y=tfm_y if norm_y else TfmType.NO)
# Normalize shifts and scales an image towards zero mean and unit standard deviation, given the per-channel mean and std of the training dataset
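
The arithmetic behind that last step can be sketched in plain NumPy. This is only an illustration of per-channel `(x - mean) / std`, not the fastai implementation; `normalize` is a hypothetical helper and the single test pixel is made up (it is roughly the ImageNet mean colour).

```python
import numpy as np

# ImageNet per-channel mean and std, as quoted above (they assume pixels scaled to 0.0 - 1.0).
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize(img):
    """Hypothetical helper: per-channel (x - mean) / std on an H x W x 3 float array."""
    return (img - mean) / std

# One made-up pixel close to the ImageNet mean colour, stored as 8-bit values.
pixel_uint8 = np.array([[[124, 116, 104]]], dtype=np.uint8)
pixel_scaled = pixel_uint8.astype(np.float32) / 255

ok = normalize(pixel_scaled)                     # input in 0.0 - 1.0: results close to 0
bad = normalize(pixel_uint8.astype(np.float32))  # input in 0 - 255: results in the hundreds

print(ok)
print(bad)
```

The statistics only make sense for inputs already in the 0.0 - 1.0 range, which would explain why predictions on unscaled 0 - 255 arrays come out wrong.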

(Isa Al-Doseri) #5

I get it now, thanks!

But I think the only question remaining is what drove the developers to write the functions this way. Why is it necessary to normalize the image to the range 0.0 - 1.0 (single-image normalization) when calling open_image, and then re-normalize it with the statistics of the specific model using val_tfms? Why carry this out as two separate operations? You said fastai does all of this under the hood, so why not just read an image from file and normalize it according to the training dataset using one function (in this case val_tfms would be appropriate enough)? Excuse my verbosity!

(Cedric Chee) #6

Here is a good explanation of what motivated this, without going too deep into the theory, taken from a StackOverflow answer:

It’s simply a case of getting all your data on the same scale: if the scales for different features are wildly different, this can have a knock-on effect on your ability to learn (depending on what methods you’re using to do it). Ensuring standardised feature values implicitly weights all features equally in their representation.

(Isa Al-Doseri) #7

No, I think you misunderstand. My question was why two normalization functions were developed when only one (val_tfms) would have sufficed. Not expecting an answer to this, but anything would do from you :smiley:
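
For what it's worth, the two steps are mathematically a single affine operation, so keeping them separate looks like a matter of code organization rather than necessity. The sketch below only checks that arithmetic on a random fake image; it says nothing about the developers' actual reasons, and the variable names are made up for illustration.

```python
import numpy as np

# ImageNet per-channel statistics quoted earlier in the thread.
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

# Fake 8-bit image (values 0 - 255), kept as float64 for the comparison.
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(4, 4, 3)).astype(np.float64)

# Two steps: rescale to 0.0 - 1.0, then normalize with the dataset statistics.
two_step = (raw / 255 - mean) / std

# One step: the same result with the 255 folded into the statistics.
one_step = (raw - 255 * mean) / (255 * std)

print(np.allclose(two_step, one_step))  # True
```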