Load test labels from CSV

alx · July 7, 2019, 9:11pm

Hi, I’m using TTA for testing:
log_preds,y = learn.TTA(ds_type=DatasetType.Test)

and if I do
print(y.tolist())

It's simply a list of 0s.

I have the true labels (y) for the test set in a csv and I was wondering how I can feed that to TTA.

This is what I’m currently using

data = ImageDataBunch.from_folder('./tmp',ds_tfms = tfms, valid='val', test='test', size=sz,bs=bs)

muellerzr · July 7, 2019, 9:24pm

You would want to create a new databunch that just has either a training or validation set with the labels, and gather the predictions on them.

github.com

muellerzr/FastAI-Test-Set-Generation/blob/master/Labeled_Test_Set.ipynb

This repo shows an example with tabular data, but it works the same way for images.

Essentially this would be your code structure:

data = ImageDataBunch.from_folder('./tmp',ds_tfms = tfms, valid='test', size=sz,bs=bs)

learn.data.valid_dl = data.valid_dl

log_preds, y = learn.TTA(ds_type=DatasetType.Valid)

This should do what you want to accomplish. Let me know if you have issues.

alx · July 7, 2019, 10:56pm

Thanks for your reply!
In the tabular example they don’t seem to import a csv. My ground truths are in a csv form.

When I run:
data = ImageDataBunch.from_folder('./tmp',ds_tfms = tfms, valid='test', size=sz,bs=bs)
learn.data.valid_dl = data.valid_dl
log_preds, y = learn.TTA(ds_type=DatasetType.Valid)

I get:
IndexError: index 0 is out of bounds for axis 0 with size 0

I thought .valid was used for the validation and not the test phase.

The other thing I tried was to split my images into their respective class folders (0,1,2,3,4).

That also doesn't seem to work.

muellerzr · July 7, 2019, 11:04pm

That's just one example. It will work with any part of the data block API. If your data is labeled through a CSV, why not just use the .from_csv functionality?

https://docs.fast.ai/vision.data.html#ImageDataBunch.from_csv

And think of the validation set as a test set we use to grade our model. Fast.AI will always (at least right now) treat the test set as unlabeled. We can get around this by replacing the validation set with our own labeled test set; then we can call get_preds(), validate(), etc. as we normally would. But unless we store the original validation dataloader first, we have lost it.
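The point about losing the validation dataloader can be handled by saving it before the swap. Here is a minimal sketch, not part of fastai: swapped_valid_dl is a hypothetical helper that assumes only that the learner exposes an assignable learn.data.valid_dl, as in the snippets in this thread. The stand-in objects at the bottom exist only so the sketch runs without fastai.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def swapped_valid_dl(learn, new_dl):
    """Temporarily replace learn.data.valid_dl, restoring the original on exit.

    Hypothetical helper (not a fastai API); it only assumes that
    `learn.data.valid_dl` is a plain assignable attribute.
    """
    original_dl = learn.data.valid_dl
    learn.data.valid_dl = new_dl
    try:
        yield learn
    finally:
        # Put the real validation dataloader back, even if an error occurred.
        learn.data.valid_dl = original_dl


# Stand-ins for a Learner and two dataloaders, used only for illustration.
demo_learn = SimpleNamespace(data=SimpleNamespace(valid_dl="original valid_dl"))
with swapped_valid_dl(demo_learn, "labeled test dl"):
    inside = demo_learn.data.valid_dl  # the labeled test set is active here
after = demo_learn.data.valid_dl       # the original is back afterwards
```

With a real learner, the body of the with block would be the call to learn.validate() or learn.get_preds().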

alx · July 8, 2019, 1:05am

How would you write it?

I tried the following and it didn't work:
test_data = ImageDataBunch.from_csv(path="./tmp/test/", header="infer")
data.add_test(test_data)

Would you add it to:

data = ImageDataBunch.from_folder('./tmp',ds_tfms = tfms, valid='val', test='test', size=sz,bs=bs)

or add a new data = after the training?

muellerzr · July 8, 2019, 1:15am

Let’s break it down. So instead of ImageDataBunch, let’s take a step back and go from an ImageList.from_df.

So we’ll start by using pandas to read our dataframe, df = pd.read_csv('mydf').

Now all a databunch is really doing is a series of steps:

Open our data
Split it
Label it
Separate them into batches and wrap into a databunch object.
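To make the four steps concrete, here is a conceptual sketch in plain Python. It is not how fastai implements them; the CSV contents, the column names name and label, and all function names are made up for illustration.

```python
import csv
import io

# Hypothetical CSV contents: one filename column and one label column.
CSV_TEXT = """name,label
img_001.png,0
img_002.png,3
img_003.png,1
img_004.png,4
img_005.png,2
"""


def open_items(csv_text, folder):
    """Step 1: open the data (here, just build the file paths)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return [(f"{folder}/{row['name']}", row) for row in rows]


def split_none(items):
    """Step 2: split. Everything goes into one set; the other stays empty."""
    return items, []


def label_items(items, label_col):
    """Step 3: attach the label from the chosen CSV column."""
    return [(path, int(row[label_col])) for path, row in items]


def batches(labeled, bs):
    """Step 4: group the labeled items into batches of size bs."""
    return [labeled[i:i + bs] for i in range(0, len(labeled), bs)]


train_items, valid_items = split_none(open_items(CSV_TEXT, "tmp/test"))
labeled = label_items(train_items, "label")
demo_batches = batches(labeled, bs=2)
```

Note that the filename column and the label column play different roles: the first is used to find the image, the second only to label it.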

So we can do the following:

il = ImageList.from_df(df, path, cols, folder).split_none()

Here path is the path to our base folder, cols is the column that holds the filenames, and folder is the subfolder that gets prepended to those filenames. We also choose split_none() because we want everything to stay in one set.

ll = il.label_from_df(cols='mylabel')

Here 'mylabel' is the name of the column that holds your labels.

Finally we can do:

data = ll.databunch()

And now to get those predictions and accuracy, we go to our Learner we already created originally and have already trained:

learn.data.valid_dl = data.train_dl
learn.validate()  # or learn.get_preds()

and we can run either validate or get_preds on the validation set.
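If you want to check the accuracy by hand from the output of get_preds, something like the following would do. It is only a sketch: it assumes the predictions have been converted with .tolist() into one list of class scores per image and the targets into integer class indices, and the numbers below are made up.

```python
def accuracy_from_lists(pred_rows, targets):
    """Fraction of rows whose highest-scoring class equals the target."""
    if len(pred_rows) != len(targets):
        raise ValueError("predictions and targets differ in length")
    if not targets:
        raise ValueError("no samples to score")
    correct = 0
    for scores, target in zip(pred_rows, targets):
        # Index of the largest score is the predicted class.
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += predicted == target
    return correct / len(targets)


# Made-up scores for four images and three classes.
demo_preds = [
    [0.7, 0.1, 0.2],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
    [0.6, 0.3, 0.1],
]
demo_targets = [0, 1, 2, 1]
demo_acc = accuracy_from_lists(demo_preds, demo_targets)
```

If the targets are all 0, as with the unlabeled test set earlier in the thread, this number is meaningless, which is a quick sign that the labels were not loaded.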

You can call either one of the items we generated above to ensure that everything is functioning properly in our databunch creation, make sure everything is labeled properly, etc.

Let me know if you have any more questions.

alx · July 8, 2019, 3:13am

Thanks for all the help!

If I don’t split my test folder into label sub folders (0,1,2,3,4) I get:
FileNotFoundError: [Errno 2] No such file or directory: './tmp/test/0'

However, if I do, I get:
IsADirectoryError: [Errno 21] Is a directory: './tmp/test/0'

Here’s my code:
il = ImageList.from_df(df, '', cols='label',folder='tmp/test').split_none()

muellerzr · July 8, 2019, 3:15am

How is your CSV document described? (What do the rows and columns look like)
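One quick way to check is to see which column actually contains filenames. The error path './tmp/test/0' ends in what looks like a label value, which suggests (this is a guess until the CSV is shown) that the label column is being passed as cols and so gets treated as the filename. The sketch below uses only the standard library; columns_pointing_to_files is a hypothetical helper, and the file and column names in the demo are made up.

```python
import csv
import tempfile
from pathlib import Path


def columns_pointing_to_files(csv_path, image_folder):
    """Map each CSV column to True if every value names a file in image_folder."""
    image_folder = Path(image_folder)
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        return {}
    return {
        col: all((image_folder / (row[col] or "")).is_file() for row in rows)
        for col in rows[0]
    }


# Demo on a throwaway folder with two empty image files and a small CSV.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "test"
    folder.mkdir()
    for name in ("a.png", "b.png"):
        (folder / name).touch()
    csv_path = Path(tmp) / "labels.csv"
    csv_path.write_text("name,label\na.png,0\nb.png,3\n")
    demo_result = columns_pointing_to_files(csv_path, folder)
```

The column that comes back True is the one to pass as cols to ImageList.from_df; the label column belongs in label_from_df.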

alx · July 8, 2019, 3:18am