Tabular: validation set percentage

And we cannot do learn.data.test_dl because it’s unlabeled :slight_smile:

But what is the purpose of the .add_test directive?
I can see you use the test dataset afterwards (as you indicate, making it the validation set), but I don’t see why it is needed when “data” is created…

Because all three are tied together for Learner. This is mostly used for Kaggle competitions, where we have separate train and test CSVs. When you finish training you just call learn.get_preds(DatasetType.Test) and it will give you the predictions for the test set.

However, for a labeled test set, we don’t add a test set this way. It’s not needed, as shown in the link above.

Ok, I think I get what I am doing… or almost :wink:

As I think is expected, the accuracy of the test set is very similar to the accuracy of the last epoch of the training…

As long as you are switching the dataloaders, yes, that’s what I’ve noticed too :slight_smile: You can also verify this by inspecting learn.data: the validation set should now be your test set.

I was checking some of your GitHub examples :wink:
Is there one that is a simple tabular analysis sequence (dataset creation, training, test accuracy)?

Or maybe one from someone else… I want to check whether I could be doing anything differently from what I am doing today…

No, as I wasn’t sure whether anyone wanted one. Give me about 15-20 minutes! :slight_smile:

I put together a simple tabular one for Credit Card Fraud. I feel it is in between Lesson 4 and Rossmann.

I am excited to look through some @muellerzr code tonight. I am always looking for better ways to make a validation set.

@mindtrinket @Bliss here is a notebook that shows an example of what to do and not to do via the Lesson 4 Tabular Notebook example:

This setup also allows us to run ClassificationInterpretation on the test set to analyze what we were missing and by how much.

Are df.iloc[start:end] and .split_by_idx(list(range(start, end))) referring to the very same rows? Shouldn’t validation and test use distinct rows?

I go into a different way of doing it in the notebook above, but that’s the same as what is done in the Tabular example. Everything above it is the train set, below is the validation set, and in the middle is the test set.
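
For anyone who wants to check the overlap question directly, here is a minimal sketch in plain Python (no fastai or pandas needed). It assumes both df.iloc[start:end] and .split_by_idx(list(range(start, end))) are applied to the same DataFrame and that both select rows by position; the row count, bounds and variable names are hypothetical, chosen only for illustration.

```python
# Hypothetical sizes and bounds, for illustration only.
n_rows = 1000
start, end = 800, 1000

# Positions that a positional, half-open slice [start:end] would select.
all_positions = list(range(n_rows))
iloc_positions = all_positions[start:end]

# Positions that would be passed as list(range(start, end)).
split_positions = list(range(start, end))

# Under the assumptions above, both select exactly the same rows.
same_rows = iloc_positions == split_positions

# One possible way (an assumption, not the notebook's method) to keep
# validation and test rows distinct: use non-overlapping ranges.
valid_idx = list(range(600, 800))
test_idx = list(range(800, 1000))
train_idx = [i for i in range(n_rows) if i < 600]
overlap = set(valid_idx) & set(test_idx)

print(same_rows)
print(len(train_idx), len(valid_idx), len(test_idx), len(overlap))
```

So if the same start and end are used for both calls on the same frame, validation and test do share rows; picking disjoint ranges like the ones above avoids that.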

I have the same question as you. Did you find an answer?