Might be a noob question that has already been answered, but here goes.
I'm currently working through chapters 5 and 6 of fastbook and trying the techniques out on a Kaggle competition. Since Kaggle primarily provides the image names and targets in .csv files, I used the approach recommended in the book:
```python
from fastai.vision.all import *

dblock = DataBlock(blocks=(ImageBlock(), CategoryBlock()),
                   get_x=get_x,
                   get_y=get_y,
                   splitter=ColSplitter(),
                   item_tfms=Resize(460),
                   batch_tfms=[Resize(224), Normalize.from_stats(*imagenet_stats)])
dblock.summary(df)
dls = dblock.dataloaders(train_df)  # note: dataloaders (plural)
```
where `get_x` and `get_y` are defined as below:
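```python
import os

def get_x(r):
    # Full path to the image: configs is my own settings dict
    return os.path.join(configs["train_img_dir"], r["Image"])

def get_y(r):
    # The target label for this row
    return r["Id"]
```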
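For reference, `ColSplitter()` with no arguments splits on a boolean column named `is_valid`, so `train_df` needs one: rows where it is true go to the validation set and the rest to the training set. The rule is simple enough to sketch without fastai; `split_by_column` is a hypothetical helper that mimics that behaviour on a list of dicts (toy rows), not the library's own code:

```python
from typing import Dict, List, Sequence, Tuple

def split_by_column(rows: Sequence[Dict], col: str = "is_valid") -> Tuple[List[int], List[int]]:
    """Return (train_idx, valid_idx): rows whose `col` is truthy go to validation."""
    train_idx, valid_idx = [], []
    for i, row in enumerate(rows):
        # Mimics ColSplitter's rule: True -> validation, False -> training
        (valid_idx if row[col] else train_idx).append(i)
    return train_idx, valid_idx

rows = [  # toy data, not from the competition
    {"Image": "a.jpg", "Id": "w_1", "is_valid": False},
    {"Image": "b.jpg", "Id": "w_2", "is_valid": True},
    {"Image": "c.jpg", "Id": "w_1", "is_valid": False},
]
train_idx, valid_idx = split_by_column(rows)
print(train_idx, valid_idx)  # [0, 2] [1]
```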
This works great, and I'm able to train and validate the model just fine. Now comes the tricky part: to create a submission, I have to read the file names from a csv (`sample_submission.csv`) and then create predictions for them in that order.
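Reading the file names from the sample submission and writing predictions back in the same row order can be sketched with the standard `csv` module. The column names `Image` and `Id` are an assumption carried over from the training csv, and the lambda stands in for whatever actually produces each prediction:

```python
import csv
import io
from typing import Callable, List

def read_image_names(f) -> List[str]:
    """Return the "Image" column of a submission csv, in file order."""
    return [row["Image"] for row in csv.DictReader(f)]

def write_submission(f, names: List[str], predict_label: Callable[[str], str]) -> None:
    """Write an Image,Id csv with one prediction per name, keeping the input order."""
    writer = csv.writer(f)
    writer.writerow(["Image", "Id"])
    for name in names:
        writer.writerow([name, predict_label(name)])

# For real files, open them with newline="" as the csv docs recommend,
# e.g. open("sample_submission.csv", newline="").
sample = io.StringIO("Image,Id\nb.jpg,placeholder\na.jpg,placeholder\n")  # toy csv
names = read_image_names(sample)
out = io.StringIO()
write_submission(out, names, lambda name: "w_" + name[0])  # dummy predictions
print(out.getvalue())
```

Because the output rows are driven by `names`, the submission keeps the sample file's order regardless of how the predictions were computed.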
Here’s what I’ve tried so far:
When I tried to use

```python
test_dl = dls.test_dl(test_df)
# dl= must be a keyword: the first positional argument of get_preds is ds_idx
preds, _ = learner.get_preds(dl=test_dl)
```

I got an error saying there was no image matching the path, and I realized this was because `get_x` was referencing the train directory rather than the test directory.
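A quick way to confirm that kind of diagnosis before building any DataLoader is to resolve each test row with the getter and list the paths that don't exist on disk. A stdlib sketch; `find_missing` is a hypothetical helper, and the temporary directories stand in for the real train and test folders:

```python
import os
import tempfile
from typing import Callable, Iterable, List, Mapping

def find_missing(rows: Iterable[Mapping], get_x: Callable[[Mapping], str]) -> List[str]:
    """Return the paths produced by get_x that do not exist on disk."""
    return [p for p in (get_x(r) for r in rows) if not os.path.exists(p)]

with tempfile.TemporaryDirectory() as root:
    train_dir = os.path.join(root, "train")
    test_dir = os.path.join(root, "test")
    os.makedirs(train_dir)
    os.makedirs(test_dir)
    open(os.path.join(test_dir, "t1.jpg"), "wb").close()  # the image exists only in test/

    test_rows = [{"Image": "t1.jpg"}]
    get_x_train = lambda r: os.path.join(train_dir, r["Image"])  # like the original get_x
    get_x_test = lambda r: os.path.join(test_dir, r["Image"])

    missing_with_train = find_missing(test_rows, get_x_train)
    missing_with_test = find_missing(test_rows, get_x_test)
    print(len(missing_with_train), len(missing_with_test))  # 1 0
```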
I tried creating a new dataloader, `test_dls`, with a new `get_test_x` function that returns the correct (test) image paths, but that didn't work.
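Since the original `get_x` hard-codes `configs["train_img_dir"]`, any test-time variant has to swap the directory. One way to keep a single getter for both cases is to make the directory a parameter and bind it with `functools.partial`. `get_x_from` and the directory names are hypothetical, and the row is a plain dict here (a pandas row is indexed with `r["Image"]` the same way):

```python
import os
from functools import partial
from typing import Mapping

def get_x_from(img_dir: str, r: Mapping) -> str:
    """Join `img_dir` with the row's "Image" file name."""
    return os.path.join(img_dir, r["Image"])

# Bind the directory once; each result is a one-argument getter like get_x
get_train_x = partial(get_x_from, "data/train")  # hypothetical paths
get_test_x = partial(get_x_from, "data/test")

row = {"Image": "0001.jpg"}
print(get_train_x(row))
print(get_test_x(row))
```

Using `partial` over a module-level function, rather than a lambda or nested function, keeps the getter picklable, which should matter if the learner is later saved with `learn.export()`.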
I also tried `dls.test_dl(get_image_files(configs["test_img_dir"]))`. This worked, but I don't know which file order `get_preds` uses, so I can't line up the predictions with the submission csv.
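If the test items come from `get_image_files` in some arbitrary order, the predictions can still be lined up with the submission csv by keying them on file name. This sketch assumes (my understanding of fastai, but treat it as an assumption) that `get_preds` returns one prediction per test item, in the same order as the items passed to `test_dl`. `align_to_submission` is a hypothetical helper and the lists are toy data:

```python
import os
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")

def align_to_submission(files: Sequence[str], preds: Sequence[T], names: Sequence[str]) -> List[T]:
    """Reorder `preds` (aligned with `files`) to follow the submission's `names`."""
    if len(files) != len(preds):
        raise ValueError("need exactly one prediction per file")
    # os.path.basename also accepts pathlib.Path objects such as get_image_files returns
    by_name: Dict[str, T] = {os.path.basename(f): p for f, p in zip(files, preds)}
    # A KeyError here means a submission image was not among the test files
    return [by_name[n] for n in names]

files = ["test/c.jpg", "test/a.jpg", "test/b.jpg"]  # e.g. get_image_files order
preds = ["w_c", "w_a", "w_b"]                       # one prediction per file, same order
names = ["a.jpg", "b.jpg", "c.jpg"]                 # order in sample_submission.csv
print(align_to_submission(files, preds, names))     # ['w_a', 'w_b', 'w_c']
```

Alternatively, building the item list directly from the csv (joining the test directory with each `Image` value) and passing that list to `test_dl` should avoid the reordering step altogether, under the same ordering assumption.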
My current approach is to call `learner.predict(img)` for each image in the submission csv. This works, but it is painfully slow (15-20 minutes for all predictions).
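Part of why per-image `learner.predict` is slow is that, as far as I can tell, each call builds a one-item DataLoader and runs its own forward pass, whereas `get_preds` on a test DataLoader processes the images a batch at a time. The difference in the number of passes is easy to sketch; `batched` is a hypothetical helper that only chunks the file names, and 64 is an arbitrary batch size:

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

def batched(items: Sequence[T], bs: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `bs` items, preserving order."""
    if bs < 1:
        raise ValueError("bs must be >= 1")
    for start in range(0, len(items), bs):
        yield list(items[start:start + bs])

names = [f"img_{i}.jpg" for i in range(1000)]  # toy file names
batches = list(batched(names, 64))
print(len(names), "predict calls vs", len(batches), "batches")  # 1000 predict calls vs 16 batches
```

Since the chunks are consecutive, concatenating the per-batch outputs gives predictions in the original item order.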
The earlier threads on this forum related to this question show approaches that use older versions of fastai.
I would really appreciate any help or pointers here.