single_from_classes loses accuracy with the same model

Hi all,

I trained a model on data (following lessons 1-3) and it has an error rate of 1.3%. To test it for production, I followed the steps for creating data2:
data = ImageDataBunch.from_folder(path, valid_pct=0.1, ds_tfms=get_transforms(), size=64)

data2 = ImageDataBunch.single_from_classes(path, classes, tfms=get_transforms(), size=224).normalize(imagenet_stats)

and then did my learn,
learn = create_cnn(data2, models.resnet34).load('productionModel_rs_34')

Image prediction:
pred_class, pred_idx, outputs = learn.predict(img)
pred_class

However, the accuracy drops dramatically: out of my 44 classes it can only identify about 8, whereas if I use the original ImageDataBunch it identifies them successfully. What is happening?

Another thing I noticed is that a from_folder model file is ~250 MB or so, but a single_from_classes model is 83 MB. Should I be concerned that something is being lost there?

Also, I am using Google Colab for my notebook.

A little update: I tried recreating the issue in another notebook on Paperspace, with the same result. I also tried following the same steps from lesson 2 with only two classes, a real wolf vs. a stuffed-animal wolf, and again had this error happen (even though the model claimed it had 0.001% error).

That definitely sounds confusing!

I’d expect creating a CNN and then loading the model to give the same predictions independent of the data associated with it.

It’s really hard to tell what’s going on from the few lines you showed; the problem is likely in the lines you didn’t show.

Can you share a notebook with a Minimal Reproducible Example of this? The real vs. stuffed wolf sounds like a good candidate.

It’ll make it easier for people to help you (and may even help you find the answer yourself).

The only thing I can think of is that it could have to do with how you’re selecting your test images. A 0.001% error sounds suspiciously low for a dataset of downloaded images, so I’d check carefully that you’re not accidentally calculating your accuracy on images from the training set (this is easy to do accidentally if you reload the data with a valid_pct at any point during training).
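As a quick sanity check (a hedged sketch; the filenames below are made up, and pulling the real ones from data.train_ds / data.valid_ds is an assumption, not code from this thread), you can verify the two splits don’t overlap:

```python
# Hypothetical filename sets; in practice you would collect these from
# your training and validation datasets after creating the DataBunch
train_files = {'wolf_01.jpg', 'wolf_02.jpg', 'stuffed_01.jpg'}
valid_files = {'wolf_03.jpg', 'stuffed_02.jpg'}

# Any file in both sets means your accuracy is partly measured on training data
leaked = train_files & valid_files
assert not leaked, f"validation images also appear in training: {leaked}"
```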

That MAY be my issue. Let’s say I try different image sizes, e.g. 128 to 256, to help improve accuracy. Each time I create a new
data = ImageDataBunch.from_folder(path, valid_pct=0.1, ds_tfms=get_transforms(), size=128)

where the only thing I change is size=128.
Are you suggesting that if I run data again, I shouldn’t include the valid_pct value? I will also have that notebook up in a moment.

Thank you very much Edward!

I tried again without reloading the valid_pct at ALL, just to be sure, and got the same result with the wolves. Here is the ipynb on Google Colab:

That is the test image I used.

Every time you create a DataBunch with valid_pct, it uses a random number generator to pick the validation samples.

If you don’t add/remove/modify any image files in the folder, you can ensure you get the same split by setting a random seed (do this right before you create data).

So you should get the same results when running:

np.random.seed(42) # same seed from data
data2 = ImageDataBunch.single_from_classes(path, classes, tfms=get_transforms(), size=224).normalize(imagenet_stats)

Probably the safest thing to do is to move your data into separate train and valid folders.
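One way to do that on disk (a minimal sketch in plain Python, not fastai; the directory names and valid_pct default are assumptions) is to copy a fixed fraction of each class into a valid folder once, so the split never changes between runs:

```python
import os
import random
import shutil

def split_train_valid(src_dir, dst_dir, valid_pct=0.1, seed=42):
    """Copy images from src_dir/<class>/ into dst_dir/train/<class>/ and
    dst_dir/valid/<class>/ so the split is fixed on disk rather than
    re-drawn every time a DataBunch is created."""
    random.seed(seed)
    for cls in sorted(os.listdir(src_dir)):
        files = sorted(os.listdir(os.path.join(src_dir, cls)))
        random.shuffle(files)
        n_valid = max(1, int(len(files) * valid_pct))
        for subset, names in (('valid', files[:n_valid]),
                              ('train', files[n_valid:])):
            out = os.path.join(dst_dir, subset, cls)
            os.makedirs(out, exist_ok=True)
            for name in names:
                shutil.copy(os.path.join(src_dir, cls, name),
                            os.path.join(out, name))
```

ImageDataBunch.from_folder should then pick up the train and valid folders without needing valid_pct, so no random split is drawn at load time.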

It seems like an easy problem (the “fake” ones look to have a pure white background), which is why you get perfect accuracy. However, that doesn’t really explain why the accuracy would drop, and I’m still surprised by it. I can’t find the code showing this in the notebook, though.

I’m just guessing. I happened to be playing around with the lesson 2 notebook as well, and I noticed learn.predict(img) seemed to be performing badly. I suspect you need to run ImageDataBunch.single_from_classes again every time you make a new prediction, as a new instance of this “empty” DataBunch has to be created for each prediction.
Try this

def predict_image(image):
    def data():
        return ImageDataBunch.single_from_classes(train_path, classes, tfms=get_transforms(),
                                                  size=224).normalize(imagenet_stats)
    # Reload the trained weights here, otherwise the head is randomly initialized
    learn = create_cnn(data(), models.resnet34).load('productionModel_rs_34')
    pred_class, pred_idx, outputs = learn.predict(image)
    return pred_class

I figured it out! When you specify classes again (such as in lesson 2), MAKE SURE they are in the EXACT SAME ORDER as in training! That was where my mistake was.
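In case it helps others: the indices of the model’s output layer follow the order of the classes list, so the list passed to single_from_classes has to match the order used in training. A minimal sketch of one way to guarantee that (the class names and filename here are placeholders, not from the thread; in fastai v1 the training order is available as data.classes):

```python
import json

# At training time: persist the class order the DataBunch inferred,
# e.g. json.dump(data.classes, ...). These names are just placeholders.
train_classes = ['real_wolf', 'stuffed_wolf']
with open('classes.json', 'w') as f:
    json.dump(train_classes, f)

# At inference time: reload the exact same ordering instead of retyping it
with open('classes.json') as f:
    classes = json.load(f)

assert classes == train_classes  # output indices now match the trained head
```

Retyping the list by hand (or letting it come from an unsorted folder listing) is exactly how the order silently changes between training and inference.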
