Tabular: validation set percentage

Hi again @muellerzr (and all!)
When you define the “data” variable, you are defining the training set and the validation set (by splitting), and passing the test set as a separate DataFrame. That’s understood.

Then I train and I get something like:

epoch  train_loss  valid_loss  accuracy  time
0      0.322909    0.357733    0.835833  00:03
1      0.327918    0.359473    0.834833  00:03
2      0.335293    0.361857    0.832667  00:03
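For reference, the training itself is just the usual pattern (a sketch; the layer sizes here are illustrative, not my exact values):

from fastai.tabular import *

# metrics=accuracy is what produces the accuracy column above; it is computed
# on the validation set after each epoch.
learn = tabular_learner(data, layers=[200, 100], metrics=accuracy)
learn.fit_one_cycle(3)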

My understanding is that the accuracy is computed on the validation set… so how is the test set being used in this case? (Since the whole dataset is labeled to begin with, the test set is effectively a second validation set…)

I hope this question makes sense :wink:
Thanks!

The test set is not used until you are finished training and you want to evaluate how you are doing. You do it at the very end. Does that make sense? :slight_smile: I’ll also say, the above only has an unlabeled test set. You need to do things differently if you want to grade the test set instead of just getting predictions.

Yes, I was looking at:

learn.get_preds(ds_type=DatasetType.Test)
[tensor([[0.5895, 0.4105],
         [0.9798, 0.0202],
         [0.7655, 0.2345],
         ...,
         [0.9958, 0.0042],
         [0.5941, 0.4059],
         [0.7750, 0.2250]]), tensor([0, 0, 0,  ..., 0, 0, 0])]

So I could write some code to parse the test dataset, turn the categories into 0s and 1s, and compare them against the model’s predictions…
I can see the use if this were a Kaggle challenge, where you have no labels in the test set and need to submit results.
But in my case I have a single dataset with everything labeled… I am not sure whether I should just forget about the test set, use bigger train/validation sets, and look at the accuracy after training.

Or maybe I am missing the smart way to use the test set.
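Something like this rough sketch is what I had in mind, assuming the true labels for the test rows are still available in df[dep_var] (note that the second tensor returned for the Test set is just dummy zeros, so the comparison has to be against the argmax of the probabilities):

import torch

# Predicted class per test row (the second returned tensor is only dummy labels)
preds, _ = learn.get_preds(ds_type=DatasetType.Test)
pred_classes = preds.argmax(dim=1)

# Map the raw labels of the test rows (df.iloc[-3000:-1], as above) to the
# class indices the DataBunch uses
classes = list(data.train_ds.y.classes)
true_classes = torch.tensor([classes.index(l) for l in df.iloc[-3000:-1][dep_var]])

test_accuracy = (pred_classes == true_classes).float().mean()
print(test_accuracy)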

No, absolutely! The test set is extremely important. Give me a moment and I’ll post some code on how to do the evaluation. Essentially we label the test TabularList and turn the valid_dl into that one.

See my post here:

Thanks a lot one more time. Will look at that post and will be waiting for that code! :slight_smile:

The code is in the post :wink:

Lol. Then I think I missed something. I will ask the baby questions first. This is the code (similar to before):

test = TabularList.from_df(df.iloc[-3000:-1].copy(), path=path, cat_names=cat_names, cont_names=cont_names)
data = (TabularList.from_df(df[0:-3001], path=path, cat_names=cat_names, cont_names=cont_names, procs=procs)
                           .split_by_idx(list(range(1000,7000)))
                           .label_from_df(cols=dep_var)
                           .add_test(test)
                           .databunch())

Why am I adding a test set (.add_test) to “data”?
Shouldn’t I be able to do something more direct, like learn.validate(learn.data.test_dl)?

Sorry, I meant the code here:

We cannot directly pass a data loader to validate. What will happen is it’ll run the default validation set no matter what. So what I describe in the link above is how to go about overriding that with the labeled test set so we can properly use it.

And we cannot do learn.data.test_dl because it’s unlabeled :slight_smile:
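In outline, a minimal sketch of the swap (the full code is in the linked post; data_test is just an illustrative name here, and ideally you’d reuse the exact preprocessing from the original DataBunch):

# Build a second DataBunch whose "validation" split is the labeled test rows
# (the last ~3,000 rows of df), labeled the same way as the training data.
data_test = (TabularList.from_df(df, path=path, cat_names=cat_names,
                                 cont_names=cont_names, procs=procs)
                        .split_by_idx(list(range(len(df) - 3000, len(df))))
                        .label_from_df(cols=dep_var)
                        .databunch())

# Point the learner's validation DataLoader at those labeled test rows;
# learn.validate() then reports the loss (plus accuracy, if it was set as a
# metric) on the test set instead of the original validation set.
learn.data.valid_dl = data_test.valid_dl
print(learn.validate())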

But what is the purpose of the .add_test directive?
I can see you use the test dataset afterwards (as you indicate, making it the validation set), but I don’t see why it is needed when “data” is created…

Because all three are tied together for the Learner. This is mostly used for Kaggle competitions where we have separate train and test CSVs. When you finish training you just call learn.get_preds(ds_type=DatasetType.Test) and it will give you the predictions for the test set.
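Roughly, that Kaggle-style flow is the sketch below (test_df, ‘id’ and ‘target’ are placeholder names for whatever the competition actually uses):

import pandas as pd

# Unlabeled test set: get probabilities, turn them into class names,
# and write a submission file.
preds, _ = learn.get_preds(ds_type=DatasetType.Test)
classes = list(data.train_ds.y.classes)
pred_labels = [classes[i] for i in preds.argmax(dim=1).tolist()]

submission = pd.DataFrame({'id': test_df['id'], 'target': pred_labels})
submission.to_csv('submission.csv', index=False)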

However, when the test set is labeled we don’t add a test. It’s not needed, as shown in the link above.

Ok, I think I get what I am doing… or almost :wink:

As I think is expected, the accuracy on the test set is very similar to the accuracy of the last training epoch…

So long as you are switching the dataloaders, yes, that’s what I’ve noticed too :slight_smile: You can also verify this by calling learn.data; the validation set should now be your test set.

I was checking some of your GitHub examples :wink:
Is there one that shows a simple tabular analysis sequence (dataset creation, training, test accuracy)?

Or maybe from someone else… I want to check whether I could be doing anything differently from what I am doing today…

No, as I wasn’t sure anyone wanted one. Give me about 15-20 minutes! :slight_smile:

I put together a simple tabular one for Credit Card Fraud. I feel it sits in between Lesson 4 and Rossmann.

I am excited to look through some @muellerzr code tonight. I am always looking for better ways to make a validation set.

@mindtrinket @Bliss here is a notebook that shows an example of what to do and what not to do, based on the Lesson 4 Tabular notebook:

This setup also allows us to run ClassificationInterpretation on the test set, to analyze what we are getting wrong and by how much.
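For example, once learn.data.valid_dl points at the labeled test set, something along these lines works:

# Interpretation now runs against the labeled test rows
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
interp.most_confused(min_val=2)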

Are df.iloc[start:end] and .split_by_idx(list(range(start, end))) referring to the very same rows? Shouldn’t validation and test use distinct rows?

I go into a different way of doing it in the notebook above, but yes, that is what the Lesson 4 Tabular example does: it reuses the same rows for the validation set and the test set. In the snippet earlier in this thread they are distinct: rows 1000–6999 are the validation set, the last ~3,000 rows are the test set, and everything else is the train set.
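To make that concrete, a small index-arithmetic sketch (assuming df has well over 10,000 rows, so the ranges below don’t collide):

n = len(df)

# test:  df.iloc[-3000:-1]                              -> original rows n-3000 .. n-2
# valid: split_by_idx(range(1000, 7000)) on df[0:-3001] -> original rows 1000 .. 6999
# train: everything else in df[0:-3001]
test_rows  = set(range(n - 3000, n - 1))
valid_rows = set(range(1000, 7000))
print(test_rows & valid_rows)   # empty set: validation and test rows are distinct

# In the Lesson 4 notebook the same start/end are reused for both slices, which
# is why the question arises: there the validation and test rows coincide.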

I have the same question as you. Did you ever figure it out?