Reporting test result for cross-validation with Neural Network

I have a small dataset, so I am using cross-validation to report the test result and get a better estimate of classification performance. I am also required to use a neural network for this.

Because neural networks have their own quirks, e.g. the need to find good hyper-parameters, I am using nested cross-validation. I divide my dataset into 10 folds for the outer cross-validation. Then I divide the 9 folds that are used for training into 10 folds again. Of those 10 inner folds, I use 9 to train with different hyper-parameters (in my case, the number of hidden units and the dropout rate) and the remaining fold to measure the accuracy for each hyper-parameter setting (much like a validation set in the deep learning literature).
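The nested scheme described above can be sketched in plain Python. This is only a minimal illustration, not my actual code: `train_and_score` is a hypothetical callable standing in for "train the network on these indices and return accuracy on those indices", the grid values are made up, and the sketch assumes the inner loop rotates through all 10 inner folds rather than using a single validation fold.

```python
import random
from itertools import product

def k_fold_indices(indices, k, seed):
    """Shuffle `indices` and split them into k nearly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def nested_cv(n_samples, grid, train_and_score, k_outer=10, k_inner=10, seed=0):
    """Return one (best_params, test_score) pair per outer fold.

    train_and_score(train_idx, eval_idx, params) -> accuracy is a
    hypothetical callable standing in for the neural network.
    """
    results = []
    outer_folds = k_fold_indices(range(n_samples), k_outer, seed)
    for i, test_idx in enumerate(outer_folds):
        # The outer folds that are not the current test fold.
        train_idx = [j for f, fold in enumerate(outer_folds) if f != i for j in fold]
        inner_folds = k_fold_indices(train_idx, k_inner, seed + 1 + i)

        def inner_score(params):
            # Mean validation accuracy over the inner folds for one setting.
            scores = []
            for v, val_idx in enumerate(inner_folds):
                fit_idx = [j for f, fold in enumerate(inner_folds) if f != v for j in fold]
                scores.append(train_and_score(fit_idx, val_idx, params))
            return sum(scores) / len(scores)

        best = max(grid, key=inner_score)
        # Retrain on all outer training folds; the test fold is used only here.
        results.append((best, train_and_score(train_idx, test_idx, best)))
    return results

# Hypothetical grid over the two hyper-parameters mentioned above.
grid = [{"hidden_units": h, "dropout": d} for h, d in product([16, 32, 64], [0.2, 0.5])]
```

In this sketch the test fold never influences which hyper-parameters are chosen; it is only scored once, after the setting has been fixed.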

Then I train my model again on all 9 training folds of the outer split with the best hyper-parameters I found, because otherwise I would lose the part of those 9 folds that was held out as a validation set. Now, when I report the test result, I train on the training data for a fixed maximum number of epochs, and at the epoch where the network does best on the test set, I stop training, save that model for future use, and report that result. My question is about this last part. Am I doing something wrong in reporting this result? To be clear, I am not tuning any hyper-parameters at this stage. I am only stopping training at the point where the network reaches its best test result. I think this is a really subtle problem, if it is a problem at all, and that is why I am confused.
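To make the concern concrete, here is a small simulation sketch. It is purely illustrative: the accuracy curves are made-up numbers (a shared trend plus independent noise), not output from my network. It compares reporting the best test accuracy over epochs, as described above, with choosing the stopping epoch on a separate validation curve and then reading the test accuracy at that epoch.

```python
import random

def simulate_curves(n_epochs, seed):
    """Hypothetical per-epoch accuracies: a shared trend plus independent noise."""
    rng = random.Random(seed)
    trend = [0.6 + 0.2 * (1 - 0.9 ** e) for e in range(n_epochs)]
    val = [t + rng.gauss(0, 0.03) for t in trend]
    test = [t + rng.gauss(0, 0.03) for t in trend]
    return val, test

def stop_on_test(test):
    """Report the best test accuracy over epochs (the procedure described above)."""
    return max(test)

def stop_on_validation(val, test):
    """Pick the epoch on validation accuracy, then read the test accuracy there."""
    best_epoch = max(range(len(val)), key=val.__getitem__)
    return test[best_epoch]

runs = [simulate_curves(50, seed) for seed in range(200)]
peeked = sum(stop_on_test(t) for _, t in runs) / len(runs)
held_out = sum(stop_on_validation(v, t) for v, t in runs) / len(runs)
print(f"stop on test: {peeked:.3f}, stop on validation: {held_out:.3f}")
```

By construction the first number can never be lower than the second, since the maximum over epochs is at least the value at any single epoch. In that sense, the stopping epoch acts like one more hyper-parameter chosen on the test set.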

I repeat this whole procedure 5-6 times with different seeds, which give different divisions of the data, and I report only the mean over all of these runs.

I am facing the same issue in my dissertation. There are two things to bear in mind: (1) the learning model and (2) the hyperparameters. What you can do is run a simple 10-fold cross-validation with different hyperparameters for one model and report the average accuracy, or run a t-test on the per-fold results. Then you can do the same for another model. In the end, you can compare their results using the average accuracy or the t-test.
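As a rough sketch of that comparison, the snippet below computes the mean accuracy of two models and a paired t statistic over their per-fold accuracies. The accuracy values are made up for illustration, and the helper name `paired_t_statistic` is my own. It assumes both models were evaluated on the same 10 folds.

```python
import math
import statistics

def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for per-fold accuracies measured on the same folds."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1  # statistic and degrees of freedom

# Made-up per-fold accuracies for two hypothetical models on the same 10 folds.
model_a = [0.81, 0.79, 0.84, 0.80, 0.78, 0.83, 0.82, 0.80, 0.79, 0.85]
model_b = [0.78, 0.77, 0.80, 0.79, 0.75, 0.80, 0.81, 0.76, 0.78, 0.80]

t, dof = paired_t_statistic(model_a, model_b)
print(f"mean A={statistics.mean(model_a):.3f}, "
      f"mean B={statistics.mean(model_b):.3f}, t={t:.2f}, dof={dof}")
```

With 9 degrees of freedom, the two-sided 5% critical value is about 2.262. Keep in mind that fold accuracies from a single cross-validation share most of their training data, so the test is only approximate.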

This paper will definitely help you with this issue.
Refaeilzadeh P., Tang L., Liu H. (2016) Cross-Validation. In: Liu L., Özsu M. (eds) Encyclopedia of Database Systems. Springer, New York, NY.
I hope that helps.

