How to match test predictions with image ids for kaggle submission?

alexchalk · July 24, 2024, 8:53pm

Hi,

I’ve just completed Part I of course.fast.ai, and I’ve entered the Petals to the Metal - Flower Classification on TPU competition on Kaggle to start applying what I’ve learnt.

I’ve trained a model with ~95% accuracy (ml_env/petals_to_the_metal/01.ipynb at 5b40ae7a230b4c2ad3f55914a146212effa9cfd8 in AlexChalk/ml_env on GitHub), but I’m struggling with the submission step.

Kaggle requires me to pair each prediction with the unique id of its image in a CSV file. Ideally, I’d tell fastai to track these ids as metadata, then output them along with its predictions when I call .get_preds, but I can’t see a way to do that.

What’s the most idiomatic way of solving this problem? And can I rely on fast.ai to respect the order of the data passed to test_dl when outputting predictions (i.e. if image #0101 is at data[0] and image #0202 is at data[1] passed to test_dl, will these be the first and second predictions output by get_preds)? Thanks!

alexchalk · July 24, 2024, 9:39pm

If I assume the answer to question 2 is yes, I can do something like this:

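import pandas as pd

# with_decoded=True returns (probabilities, targets, decoded class indices)
probs, _, idxs = learn.get_preds(dl=test_dataloader, with_decoded=True)
# Read the ids from test_data in the same order it was passed to test_dl
ids = [datum[2].decode('utf-8') for datum in test_data]
# Convert the prediction tensor to a NumPy array so pandas stores plain integers
submission = pd.DataFrame({'id': ids, 'label': idxs.numpy()})
submission.to_csv('submission.csv', header=True, index=False)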

Does that seem reasonable?

alexchalk · July 25, 2024, 2:40pm

(Tried submitting it and scored ~95.5%, so it seems like fast.ai does respect the data order).
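
For anyone who finds this later: since the whole approach rests on the ids and predictions lining up by position, here is a minimal sketch of the pairing step with a couple of sanity checks added. It uses only the standard library, the helper names (build_submission_rows, write_submission) are made up for illustration, and the ids and labels at the bottom are placeholder values rather than real competition data. It assumes the predictions come back in the same order as the ids, which is what my submission score suggests but which these checks cannot prove.

```python
import csv
import io


def build_submission_rows(ids, labels):
    """Pair each image id with the prediction at the same position."""
    ids, labels = list(ids), list(labels)
    # A length mismatch means ids and predictions cannot be aligned by position.
    if len(ids) != len(labels):
        raise ValueError(f"{len(ids)} ids but {len(labels)} predictions")
    # Duplicate ids would suggest the test set was read more than once.
    if len(set(ids)) != len(ids):
        raise ValueError("ids are not unique")
    # int() also accepts 0-d tensors and NumPy integers.
    return [(image_id, int(label)) for image_id, label in zip(ids, labels)]


def write_submission(rows, fileobj):
    """Write the rows as CSV with the header row id,label."""
    writer = csv.writer(fileobj)
    writer.writerow(["id", "label"])
    writer.writerows(rows)


# Placeholder ids and labels standing in for the test set and get_preds output.
example_rows = build_submission_rows(["0101", "0202"], [7, 3])
buffer = io.StringIO()
write_submission(example_rows, buffer)
```

In the real notebook, ids would be the list decoded from test_data and labels would be the decoded class indices from get_preds, and fileobj would be a file opened with open('submission.csv', 'w', newline='').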