Tabular: validation set percentage

(Zachary Mueller) #21

And we cannot do that because it’s unlabeled :slight_smile:



But what is the purpose of the .add_test directive?
I can see you use the test dataset afterwards (as you indicate, making it the validation set), but I don’t see why it is needed when “data” is created…


(Zachary Mueller) #23

Because all three are tied together for Learner. This is mostly used for Kaggle competitions, where we have separate train and test CSVs. When you finish training, you just call learn.get_preds(DatasetType.Test) and it will give you the predictions for the test set.

However, for a labeled test set, we don’t add a test set. It’s not needed, as shown in the link above.



Ok, I think I get what I am doing… or almost :wink:

As I think is expected, the accuracy of the test set is very similar to the accuracy of the last epoch of the training…


(Zachary Mueller) #25

So long as you are switching the dataloaders, yes, that’s what I’ve noticed too :slight_smile: You can also verify this by calling … The validation set should now be your test set.



I was checking some of your GitHub examples :wink:
Is there one that is a simple tabular analysis sequence (dataset creation, training, test accuracy)?

Or maybe one from someone else… I want to check whether I could be doing anything other than what I am doing today…


(Zachary Mueller) #27

No as I wasn’t sure if it was wanted by anyone. Give me about 15-20 minutes! :slight_smile:


(James Dietle) #28

I put together a simple tabular one for Credit Card Fraud. I feel it is in between Lesson 4 and Rossmann.

I am excited to look through some @muellerzr code tonight. I am always looking for better ways to make a validation set.


(Zachary Mueller) #29

@mindtrinket @Bliss here is a notebook that shows an example of what to do and not to do via the Lesson 4 Tabular Notebook example:

This setup also allows us to run ClassificationInterpretation on the test set to analyze what we were missing and by how much


(Naoki Peter) #30

Are df.iloc[start:end] and .split_by_idx(list(range(start, end))) referring to the very same rows? Shouldn’t validation and test use distinct rows?
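
To make the question concrete, here is a minimal pure-Python sketch (no fastai or pandas needed) comparing the positions picked by a positional slice start:end with those in list(range(start, end)). The values of n_rows, start, and end are made up for illustration, and a plain list stands in for the DataFrame's row positions.

```python
# Hypothetical sizes, chosen only for illustration.
n_rows, start, end = 1000, 800, 900

# Stand-in for the positional row numbers of a DataFrame.
row_positions = list(range(n_rows))

# Positions picked by a positional slice start:end (end is exclusive).
slice_rows = row_positions[start:end]

# Positions that would be passed as the validation indices.
idx_rows = list(range(start, end))

same_rows = slice_rows == idx_rows
overlap = set(slice_rows) & set(idx_rows)
print(same_rows, len(overlap))
```

Under these assumptions both expressions pick exactly the same positions, so a test set and a validation set built this way from the same DataFrame would share their rows.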


(Zachary Mueller) #31

I go into a different way of doing it in the notebook above, but that’s the same as what is done in the Tabular example. Everything above it is the training set, below it is the validation set, and in the middle is the test set.
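
As a rough sketch of that layout (not the notebook's actual code), the row positions can be cut into three non-overlapping ranges: the rows before the test block for training, the middle block for testing, and the rows after it for validation. The helper name three_way_split and the sizes used here are hypothetical.

```python
def three_way_split(n_rows, test_start, test_end):
    """Hypothetical helper: split row positions into three disjoint ranges."""
    if not 0 <= test_start <= test_end <= n_rows:
        raise ValueError("expected 0 <= test_start <= test_end <= n_rows")
    train_idx = list(range(0, test_start))         # rows above the test block
    test_idx = list(range(test_start, test_end))   # the middle block
    valid_idx = list(range(test_end, n_rows))      # rows below the test block
    return train_idx, test_idx, valid_idx


# Made-up sizes for illustration only.
train_idx, test_idx, valid_idx = three_way_split(1000, 700, 800)

# No row position appears in more than one set.
assert not set(train_idx) & set(test_idx)
assert not set(test_idx) & set(valid_idx)
assert not set(train_idx) & set(valid_idx)
```

The validation positions could then be handed to something like .split_by_idx(valid_idx), while the test rows are selected by position separately, so the two never share rows.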



I have the same question as you. Have you found anything?