No test dataset

I built a multi-class model to classify biological images. All worked fine. I split my data into 80% training and 20% validation (kept them in labelled folders) and built the model. I got good accuracy, a confusion matrix, everything.
I am a little confused: won't I need a test dataset to get the actual accuracy and confusion matrix, or is a validation dataset enough? I understand the model has effectively seen the validation set during training, since at the end of each epoch the hyperparameters get adjusted based on the accuracy on the validation dataset. So, in that way, the model is looking at the validation dataset, right?
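For reference, the kind of split in question could be sketched like this. This is a minimal sketch using scikit-learn's `train_test_split`, with random arrays standing in for the real image features and labels; the 80/10/10 proportions are just one common choice for carving out a separate test set:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((100, 8))          # stand-in for image features
y = rng.integers(0, 3, size=100)  # 3 classes, multi-class setup

# First carve out a held-out test set (10% of the data), then split
# the remainder into train (80% of total) and validation (10% of total).
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.10, stratify=y, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=1/9, stratify=y_rest, random_state=0)
```

The test set is then only touched once, for the final reported metrics.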

I need some clarification on whether using and reporting the metrics on the validation set is good enough, and why.

Hi, the model uses the validation set just for the metrics, not for training, so it is safe to assume that your metrics are on data not seen during training.

A test set very often doesn't have labels at all; it is used, for example, in competitions, where you apply the model to that set, create predictions, and submit them.
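A minimal sketch of that competition workflow, assuming a scikit-learn-style classifier and made-up toy data (the `submission.csv` name and its columns are placeholders, not any particular competition's format):

```python
import csv
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Toy stand-ins: a labelled training set and an unlabelled test set
# (in a real competition the test labels are withheld by the organisers).
X_train = rng.normal(size=(60, 4))
y_train = rng.integers(0, 3, size=60)
X_test = rng.normal(size=(10, 4))  # no y_test available

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
preds = model.predict(X_test)

# Write a submission file: one predicted class per test sample.
with open("submission.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "predicted_class"])
    for i, p in enumerate(preds):
        writer.writerow([i, int(p)])
```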


Thanks for the explanation. This leads to another question. I read that the validation set is used to fine-tune the hyperparameters. How does it do that? Does it run predictions on the validation set, take the validation loss into account, and adjust the parameters accordingly?

In general I would use the validation set to adjust the model, i.e. how much weight decay, how many epochs, how much dropout, and of course changes to the model architecture. Seeing the results on the validation set, you can check how well your model generalizes, and you can fine-tune it to achieve better results.
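For example, such a hyperparameter sweep might look like this. A sketch with scikit-learn and synthetic data; the candidate regularisation values are arbitrary, and `C` here plays the role a weight-decay setting would play in a deep-learning framework:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # toy binary target
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Try several regularisation strengths; keep whichever scores best on
# the validation set. The model never trains on X_val -- the validation
# score only guides the choice of hyperparameter.
best_C, best_acc = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    acc = model.score(X_val, y_val)
    if acc > best_acc:
        best_C, best_acc = C, acc
```

Because the validation score drives these choices, it becomes slightly optimistic, which is exactly why a final untouched test set gives a cleaner estimate.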
