Validation and test sets?

HaoTieu · August 1, 2022, 2:41pm

Do I need both a validation set and a test set when my dataset is small (312 samples)?
I've been googling a lot but still can't find a clear answer.

amalia · August 1, 2022, 3:22pm

Hello! Chapter 1 of the fastai book discusses your question in detail in the section "Validation Sets and Test Sets".

Here is a quote from that section of the book:

The discipline of the test set helps us keep ourselves intellectually honest. That doesn’t mean we always need a separate test set—if you have very little data, you may need to just have a validation set—but generally it’s best to use one if at all possible.

But I guess using your judgment, plus some experiments, will help you decide what is best for this particular project.
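To see why "very little data" matters, here is a minimal sketch of a three-way split of 312 samples. The 60/20/20 proportions, the fixed seed, and the helper name `train_valid_test_split` are illustrative assumptions, not anything from the book; fastai has its own splitters, but plain Python keeps the arithmetic visible.

```python
import random

def train_valid_test_split(n_samples, valid_frac=0.2, test_frac=0.2, seed=42):
    """Hypothetical helper: shuffle indices 0..n_samples-1 and cut them into three disjoint lists."""
    idxs = list(range(n_samples))
    random.Random(seed).shuffle(idxs)  # fixed seed so the split is reproducible
    n_test = round(n_samples * test_frac)
    n_valid = round(n_samples * valid_frac)
    test = idxs[:n_test]
    valid = idxs[n_test:n_test + n_valid]
    train = idxs[n_test + n_valid:]  # whatever is left goes to training
    return train, valid, test

train, valid, test = train_valid_test_split(312)
print(len(train), len(valid), len(test))  # prints: 188 62 62
```

With only 62 test samples, each test example moves accuracy by about 1.6 percentage points, and setting them aside also leaves just 188 samples for training. That trade-off is why the book says a separate test set is not always possible with very little data.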

HaoTieu · August 1, 2022, 3:48pm

I'm trying to implement the results of a paper I read.
Can you explain why both the validation and test sets are included in the optimisation routine and cross-validation in the paper's chart?

bencoman · August 1, 2022, 4:46pm

Hi @HaoTieu,

I see @amalia has been very generous with her time in answering you and pointing you to where you will likely find the information you are seeking. I notice that you replied only 20 minutes after @amalia, that your chart is not part of Chapter 1, and that you don't mention having read the chapter, so I will ask you directly: did you read Chapter 1 as suggested?

Looking forward to your questions from Chapter 1.
Also please read this.

Btw, another really great resource is the 2022 fastai course videos.

HaoTieu · August 1, 2022, 5:12pm

I completed Chapter 1. As far as I know, the test data is a separate set used to confirm the results of the final model, but in the image the test data is also included in the dashed box. This is what I'm confused about.
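One common way to combine cross-validation with a test set is to split off the test set first, run k-fold cross-validation only on the remaining data for model selection, and touch the test set once at the very end. Whether this matches the paper's figure is an assumption, since the image isn't reproduced here. The helper name `holdout_then_kfold`, the 20% test fraction, and k=5 are illustrative choices.

```python
import random

def holdout_then_kfold(n_samples, test_frac=0.2, k=5, seed=0):
    """Hypothetical helper: hold out a test set once, then build k train/valid folds from the rest."""
    idxs = list(range(n_samples))
    random.Random(seed).shuffle(idxs)
    n_test = round(n_samples * test_frac)
    test, rest = idxs[:n_test], idxs[n_test:]  # test indices never enter any fold
    folds = []
    for i in range(k):
        valid = rest[i::k]  # every k-th element, starting at offset i, is this fold's validation set
        valid_set = set(valid)
        train = [j for j in rest if j not in valid_set]
        folds.append((train, valid))
    return test, folds

test, folds = holdout_then_kfold(312)
for train, valid in folds:
    print(len(train), len(valid))  # 200 50 for each of the 5 folds
print("test:", len(test))          # test: 62
```

Each sample outside the test set is used for validation exactly once across the folds, so the cross-validation score uses all 250 non-test samples, while the 62 test samples stay untouched until the final check.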

bencoman · August 1, 2022, 5:58pm

The short answer: if you train and evaluate your model against the validation set too many times (for example, while tuning hyperparameters), you can overfit the validation set, so a separate test set is a guard against that. Jeremy explains this at 45:11 in Lesson 4.
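A toy simulation can make this concrete. It is a sketch under deliberately simple assumptions: every "model" guesses binary labels at random (so its true accuracy is 50%), and we keep whichever of 200 such models scores best on a 62-sample validation set. The sample sizes, model count, and function name are arbitrary choices for illustration.

```python
import random

def simulate_validation_overfitting(n_valid=62, n_test=62, n_models=200, seed=1):
    rng = random.Random(seed)
    # Random binary labels for the validation and test sets
    y_valid = [rng.randint(0, 1) for _ in range(n_valid)]
    y_test = [rng.randint(0, 1) for _ in range(n_test)]

    def accuracy(preds, labels):
        return sum(p == y for p, y in zip(preds, labels)) / len(labels)

    best_valid, best_test = -1.0, None
    for _ in range(n_models):
        # A no-skill "model": its predictions are coin flips on both sets
        valid_preds = [rng.randint(0, 1) for _ in range(n_valid)]
        test_preds = [rng.randint(0, 1) for _ in range(n_test)]
        v = accuracy(valid_preds, y_valid)
        if v > best_valid:  # select the model purely by its validation score
            best_valid, best_test = v, accuracy(test_preds, y_test)
    return best_valid, best_test

best_valid, best_test = simulate_validation_overfitting()
print(f"best validation accuracy: {best_valid:.2f}")
print(f"same model on test set:   {best_test:.2f}")
```

Because the winner is picked by its validation score, that score is typically well above 50% through selection alone, while the same model's test score stays around 50%. The untouched test set reveals what the repeatedly used validation set hides.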