Test dataloader with loss

dangraf · February 24, 2023, 3:59pm

I’m trying to load a model/learner and then re-run validation on an updated dataset.
I would like to get both the predictions and the loss, but I am unable to do so without redefining the dataloader.

This article gives an example of how to run predictions on new data:

test_files = [fn for fn in sorted((source/'val').glob('**/*')) if fn.is_file()]

test_dl = learn.dls.test_dl(test_files)

But I’m not able to pass the labels using the solution above.

Another approach is to just change the items in the validation dataloader.

learn.dls.valid.items = df

learn.dls.valid.new(items = df)

But the solutions above do not work for some reason. The number of predictions is the same as for the original dataloader, even though the items attribute contains more items.

Does anyone know how to create a new dataloader with new items using an existing dataloader?

Best regards/

muellerzr · February 24, 2023, 4:11pm

test_dl = learn.dls.test_dl(test_files, with_labels=True)

dangraf · February 24, 2023, 4:32pm

That was quick!
But I still can’t make it work.

How do I give the test-dl the labels?
I’ve tried

learn.dls.test_dl( (files, labels), with_labels=True)

and

learn.dls.test_dl(list(zip(files, labels)), with_labels=True)

But it does not work… it expects a tensor, image, pandas.series etc but not a tuple.

muellerzr · February 24, 2023, 4:50pm

You only pass the input (files). They need to be set up and labelled in exactly the same way as your training data.

dangraf · February 24, 2023, 4:55pm

During training I created a dataframe containing the filenames, labels and a validation flag.
But this dataframe is lost (I guess) when saving and loading the model.

So I need to create a dataframe (easy) for the new files and their corresponding labels, which are not currently listed, and then set it somehow.

That’s what I tried to do in my first post with no success.

I’m still lost, sorry

muellerzr · February 24, 2023, 5:01pm

Sorry. You need to do that, and then pass it to learn.get_preds(dl=test_dl, with_loss=True), as shown in the docs: Learner, Metrics, Callbacks – fastai

dangraf · February 24, 2023, 5:11pm

It seems there is a misunderstanding somewhere about the question I’m trying to get answered.

What I don’t know is how to update my existing dataloader(s) with new data so that they are set up exactly as during training.

The learner doesn’t have the labels for the new data. So I need to either give the test_dl new labels together with the new files somehow, OR create a new dataloader based on the existing dataloaders, with new items containing both the new filenames and the new labels.

Is it possible to do that?

muellerzr · February 24, 2023, 5:34pm

You need to pass the files in the same form as during training, so that the transforms can label them in the same way, as I’ve stated multiple times.

If you want to have this occur during training as a new validation dataloader you need to set learn.dls.valid = test_dl

To get labelled data, you need to pass in inputs in the same form your DataBlock received (or however you made the dataloaders). If you used get_image_files, for instance, each input would be a filename. fastai then applies the pipeline’s validation transforms to these inputs to create a new dataloader. You also need to pass with_labels=True when creating the test_dl:

test_dl = learn.dls.test_dl(df, with_labels=True)
preds, targs, losses = learn.get_preds(dl=test_dl, with_loss=True)

Please see these two references: Data core – fastai

You normally shouldn’t need or want to update the existing dataloaders. Instead, you make a new dataloader based on the old ones and how they are set up. When you call learn.export, the DataLoaders do not retain the data, only the transforms, which define how the data is processed.

dangraf · February 24, 2023, 9:57pm

Thanks,
and sorry for the confusion. This line solved it for me:

test_dl = learn.dls.test_dl(df, with_labels=True)

I interpreted your answer as something else… first create a dataloader using the dataframe for the training data and then create the test_dl using only the list of file-paths (as shown in the examples above) or as a pandas Series, stated in the error message.

anyway, thanks for the help