Load test labels from CSV

Hi, I’m using TTA for testing:
log_preds,y = learn.TTA(ds_type=DatasetType.Test)

and if I look at y, it's simply a list of 0s.

I have the true labels (y) for the test set in a csv and I was wondering how I can feed that to TTA.

This is what I’m currently using:

data = ImageDataBunch.from_folder('./tmp',ds_tfms = tfms, valid='val', test='test', size=sz,bs=bs)

You would want to create a new databunch that just has either a training or validation set with the labels, and gather the predictions on them.

This repo shows an example with tabular data, but it would work the same way for images.

Essentially this would be your code structure:

data = ImageDataBunch.from_folder('./tmp',ds_tfms = tfms, valid='test', size=sz,bs=bs)

learn.data.valid_dl = data.valid_dl

log_preds, y = learn.TTA(ds_type=DatasetType.Valid)

This should do what you want to accomplish. Let me know if you have issues.

Thanks for your reply!
In the tabular example they don’t seem to import a CSV. My ground truths are in CSV form.

When I run:
data = ImageDataBunch.from_folder('./tmp',ds_tfms = tfms, valid='test', size=sz,bs=bs)
learn.data.valid_dl = data.valid_dl
log_preds, y = learn.TTA(ds_type=DatasetType.Valid)

I get:
IndexError: index 0 is out of bounds for axis 0 with size 0

I thought .valid was used for the validation and not the test phase.

The other thing I tried was to split my images into their respective class folders (0,1,2,3,4).

That also doesn’t seem to work.

That’s just one example. It will work with anything built through the data block API. If your labels are in a CSV, why not just use the .from_csv functionality?


And think of the validation set as a test set we use to grade our model. Fast.AI will always (at least right now) treat the test set as unlabeled. We can get around this by overloading the validation set with our own labeled test set; then we can call get_preds(), validate(), etc. as we normally would. But unless we store the original validation dataloader first, we have lost it.
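Here is a minimal sketch of keeping the original validation dataloader around while the test set is swapped in. It only assumes that the learner exposes learn.data.valid_dl, as in the snippets in this thread. The helper name swapped_valid_dl is hypothetical (it is not part of fastai), and the SimpleNamespace objects are stand-ins so the sketch runs without fastai installed.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def swapped_valid_dl(learn, new_dl):
    """Temporarily replace learn.data.valid_dl, restoring the original on exit."""
    original = learn.data.valid_dl
    learn.data.valid_dl = new_dl
    try:
        yield learn
    finally:
        # Runs even if validation raises, so the original dataloader is never lost.
        learn.data.valid_dl = original


# Stand-ins for a real Learner and real dataloaders (illustration only).
learn = SimpleNamespace(data=SimpleNamespace(valid_dl="original validation dl"))
test_dl = "labeled test dl"

with swapped_valid_dl(learn, test_dl):
    inside = learn.data.valid_dl  # the labeled test set is in place here
after = learn.data.valid_dl       # the original validation set is back
```

With a real Learner you would call learn.validate() or learn.get_preds() inside the with block.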

How would you write it?

I tried the following and it didn’t work:
test_data = ImageDataBunch.from_csv(path="./tmp/test/", header="infer")

Would you add it to:

data = ImageDataBunch.from_folder('./tmp',ds_tfms = tfms, valid='val', test='test', size=sz,bs=bs)

or add a new data = after the training?

Let’s break it down. So instead of ImageDataBunch, let’s take a step back and go from an ImageList.from_df.

So we’ll start by using pandas to read our dataframe, df = pd.read_csv('mydf').

Now all a databunch is really doing is a series of steps:

  • Open our data
  • Split it
  • Label it
  • Separate them into batches and wrap into a databunch object.

So we can do the following:

il = ImageList.from_df(df, path, cols, folder).split_none()

Where path is the path to our data, cols is the column that holds the filenames, and folder is the folder (relative to path) that gets put in front of them. We also choose split_none() because we want everything in a single set.

ll = il.label_from_df(cols='mylabel')

Where ‘mylabel’ is whatever name your column has for it.
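To make the roles of path, folder, cols, and the label column concrete, here is a rough sketch in plain Python. The CSV layout, the column name name, and the helper build_items are all assumptions made up for illustration. The way the pieces are joined reflects my understanding of what ImageList.from_df does, not its actual code.

```python
import csv
import io

# Hypothetical CSV layout (assumption): one filename column, one label column.
CSV_TEXT = """name,mylabel
img_001.png,0
img_002.png,3
"""


def build_items(rows, path, folder, cols):
    """Join path, folder, and the value in column `cols` into one file path per row."""
    return [f"{path}/{folder}/{row[cols]}" for row in rows]


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# cols points at the filename column: each item is a path to an image file.
good = build_items(rows, ".", "tmp/test", cols="name")

# cols points at the label column: the label value is treated as a file name.
bad = build_items(rows, ".", "tmp/test", cols="mylabel")

# The labels come from their own column, independent of the file paths.
labels = [int(row["mylabel"]) for row in rows]
```

The point of the sketch is that cols should name the column containing filenames, while the label column is only passed to label_from_df.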

Finally we can do:

data = ll.databunch()

And now to get those predictions and the accuracy, we go back to the Learner we already created and trained:

learn.data.valid_dl = data.train_dl
learn.validate()  # or learn.get_preds()

and we can run either validate or get_preds on the validation set.
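Once the predictions come with real labels, accuracy is just the fraction of rows where the top-scoring class matches the label. A small sketch of that arithmetic follows, using plain Python lists as stand-ins for the tensors a prediction call would return. The function name accuracy and the numbers are made up for illustration.

```python
def accuracy(preds, targets):
    """Fraction of rows whose highest-scoring class matches the true label."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    # Index of the largest score in each row (a plain-Python argmax).
    predicted = [max(range(len(row)), key=row.__getitem__) for row in preds]
    correct = sum(int(p == t) for p, t in zip(predicted, targets))
    return correct / len(targets)


# Made-up scores for 3 images and 3 classes (illustration only).
preds = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.2, 0.3, 0.5]]
targets = [1, 0, 1]
score = accuracy(preds, targets)
```

If every target comes back as 0, as with an unlabeled test set, this number is meaningless, which is why the labels need to be attached first.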

You can also inspect any of the objects we generated above (il, ll, data) to check that the databunch was created correctly and that everything is labeled properly.

Let me know if you have any more questions.

Thanks for all the help!

If I don’t split my test folder into label subfolders (0,1,2,3,4), I get:
FileNotFoundError: [Errno 2] No such file or directory: './tmp/test/0'

However, if I do, I get:
IsADirectoryError: [Errno 21] Is a directory: './tmp/test/0'

Here’s my code:
il = ImageList.from_df(df, '', cols='label',folder='tmp/test').split_none()

How is your CSV laid out? (What do the rows and columns look like?)