How can we get predictions for all images in a folder efficiently?
I have trained a model on MNIST and it is working pretty well.
I can use learn.predict to get a prediction on a single image.
I tried looping through the images in the folder and running learn.predict but it was way too slow:
files = !ls "mnist_data/test"
preds = []
for file in tqdm(files):
    number, n_th, probs = learn.predict(f"mnist_data/test/{file}")
    preds.append(n_th)
So instead I decided to gather the images into a numpy array:
files = !ls "mnist_data/test"
imgs = []
for file in tqdm(files):
    with Image.open(f"mnist_data/test/{file}") as img:
        imgs.append(np.array(img))
imgs = np.array(imgs)  # shape: (28000, 28, 28)
The next step would be to make batches from imgs and get predictions for them. But I am not sure how to get predictions for a batch while applying the same preprocessing as in training (normalization stats, transforms, etc.).
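For reference, splitting the array into batches by hand could look roughly like the sketch below. The helper name `iter_batches`, the scaling to [0, 1], and the added channel dimension are assumptions for illustration only; whatever preprocessing the model was trained with would still have to be reproduced exactly, which is the hard part of doing this manually.

```python
import numpy as np

def iter_batches(imgs, bs=64):
    """Yield float32 batches of shape (n, 1, H, W) from a (N, H, W) uint8 array.

    Hypothetical helper: scaling to [0, 1] and adding a channel axis are
    assumptions, not necessarily what the trained model expects.
    """
    for start in range(0, len(imgs), bs):
        batch = imgs[start:start + bs].astype(np.float32) / 255.0
        # Insert a channel axis so each image is (1, H, W).
        yield batch[:, None, :, :]
```

The last batch is simply smaller when the number of images is not a multiple of `bs`.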
learn.predict is definitely not the right function for this according to the docs.
The problem with this is that it takes more than a day to run on my data, while the way we are supposed to do it (as @muellerzr pointed out) takes less than a minute.
learn.validate() outputs a list with two elements. I have not yet understood what those two numbers mean or when to use the function.
Can anyone explain this in more detail?
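As far as I know, learn.validate() returns the validation loss followed by one value per metric passed to the Learner, so with a single metric such as accuracy you get two numbers. A small sketch for labelling them; `label_validate_output` is a hypothetical helper and the metric names are whatever you passed when creating the Learner:

```python
def label_validate_output(values, metric_names):
    """Pair the values from learn.validate() with readable names.

    Assumes the first value is the validation loss and the rest follow
    the order of the Learner's metrics.
    """
    names = ["valid_loss", *metric_names]
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} values, got {len(values)}")
    return {name: float(value) for name, value in zip(names, values)}
```

For example, `label_validate_output(learn.validate(), ["accuracy"])` would give a dict with `valid_loss` and `accuracy` keys.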
The only issue for me with learn.get_preds(dl=test_dl) is that it crops each image to the same square format as the train and valid sets. That is no big deal for classification, but not good for segmentation, where you want the prediction output at the full image size.
The output of get_preds is a tensor of probabilities. Is there a built-in function that converts it to classes, or does that need to be implemented by hand? learn.predict on a single sample gives the predicted class, but learn.get_preds on the test set doesn't, which seems strange. Is something missing?
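Converting by hand is just an argmax over the class dimension. A minimal sketch using NumPy, assuming the predictions have shape (n_samples, n_classes); `probs_to_classes` is a made-up name, and the optional `vocab` would be the list of class labels (for a fastai v2 classifier that is usually `learn.dls.vocab`):

```python
import numpy as np

def probs_to_classes(probs, vocab=None):
    """Map a (n_samples, n_classes) array of probabilities to class predictions.

    Returns class indices, or class labels when `vocab` is given.
    """
    probs = np.asarray(probs)
    # Index of the highest probability in each row.
    idxs = probs.argmax(axis=1)
    if vocab is None:
        return idxs.tolist()
    return [vocab[i] for i in idxs]
```

On a PyTorch tensor the same thing can be done directly with `preds.argmax(dim=1)`.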