I am testing my trained models to see how accurate they are at predicting the categories of the brand new images (not seen previously). I used the following code, but there are 2 issues with it: a) it shows just one category without % probabilities; b) it is set up to just test one image at a time:
My question is how can I test a bunch of brand new images and get the result with file names and percentage of probabilities for each of my 3 possible categories, i.e. the output should be as follows:
xxx.png - category A 98% category B 1.5% Category C 0.5%
ccc.png - category A 80% category B 10 % Category C 10%
Pred class will only have the class. If we follow the documentation for predictions here we can see that if we do:
img = data.train_ds[0][0]
learn.predict(img)
It will return: (Category 3, tensor(0), tensor([0.5275, 0.4725]))
So as we can see, in your case “outputs” will result in your raw probabilities. If you want multiple, generate a test dataset of your classes and add an .add_test() option for your dataset during databunch creation and run learn.get_preds(DatasetType.Test)
However also, if you add in a third class that was never trained on it will only try for two classes as you were never trained to search for three!
I have 3 classes, and the model was trained on all 3, the test files also have one of these 3 classes assigned to them by observers, I just want to test how good is my model in terms of generalizing on something that belongs to one of these 3 categories, but has not been seen before.
Also, I need this for potential model deployment, so adding test files to train and valid files is not exactly what I need, I think.
Got it! Here is the documentation for test sets, and there are examples in the course-v3 notebooks.
If you are doing for deployment, instead make an ImageList of your set of images you want to train on, and when you use load_learner for deployment, pass in that ImageList in.
ok, so basically to get results on test set, I will have to add test set to my training and validation sets when I am training my model? Something like we did in v 1 when we used to have 3 sets of data in folders? (i.e. train, valid, test?) But in this new version I think we just have all data and tell the model how to separate it, i.e. valid_pct=0.3, for example?
Sorry, I am terribly confused about all this. I need to test a bunch of new files to see how accurate my model is and after that, I need to figure the way so my users could upload a bunch of files and get results back.
Some time ago someone did a dog classification on render (it is gone now https://whatdog.onrender.com/), where you upload the file and get results back, for example:
Result = [(‘german_shepherd’, 0.29597368836402893), (‘border_terrier’, 0.0761934369802475), (‘rottweiler’, 0.04072378948330879)]
If you have a test set you want to examine during training yes. During production (post learn.export()) no, you will not. You can pass in a test set during deployment and the call to load_learner.
We still are in valid_pct as well, we are saying split our dataset into 30% validation. I have an example notebook that goes over what you describe here as well. The main bit of code you want is the following:
data = ImageList.from_folder(path)
learn = load_learner(path, export_file_name, test=data)
y, _ = learn.get_preds(DatasetType.Test)
y = torch.argmax(y, dim=1)
preds = [learn.data.classes[int(x)] for x in y]
A way to do this during training is when you make an image databunch you do .add_test(ImageList.from_folder(folder)) (this is assuming your test data is in one folder)
predictions= calculates something, but I get no results, nothing, empty space. I also tried the following:
y, _ = learn.get_preds(DatasetType.Test)
y = torch.argmax(y, dim=1)
preds = [learn.data.classes[int(x)] for x in y]
and it also returns nothing. I got no errors, just no results at all. What am I doing wrong here?
Another question is that the model was trained on normalized images, and when I predicted a single image, it was also normalized via:
data = ImageList.from_folder(path, classes, size=(100,180)).normalize(imagenet_stats)
but in this case when I get a bunch of images from the test folder, they are not normalized? Would it negatively affect the model performance on these not-normalized test images?