Overfitting based on comparing training and CV accuracy?

jmrichardson · April 18, 2019, 7:50pm

I have been taking the course: https://github.com/fastai/fastai/blob/master/courses/ml1/lesson1-rf.ipynb

It seems to suggest that high training accuracy combined with low CV accuracy is a sign of overfitting. However, I am getting conflicting information suggesting that training accuracy is largely useless and that you need to compare the CV score with the test set score instead. I have also read that RFs “by design” will have an almost perfect training score and lower CV scores.

Here’s an article that also states that maximizing the CV accuracy, without regard to training accuracy, avoids overfitting:

https://jakevdp.github.io/PythonDataScienceHandbook/05.03-hyperparameters-and-model-validation.html
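To make the comparison concrete, here is a minimal sketch of how the scores in question could be computed side by side with scikit-learn. It is not the notebook's code: the synthetic dataset, the label-noise level, the split sizes, and the hyperparameters are all assumptions chosen for illustration, and it uses a classifier rather than the regressor from the lesson.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic data (assumption): flip_y=0.2 randomly reassigns about 20% of
# the labels, so no model can score perfectly on unseen data.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5,
    flip_y=0.2, random_state=0,
)

# Hold out a test set that is never used for fitting or tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fully grown trees (the default); oob_score=True also computes the
# out-of-bag accuracy on the training rows each tree did not see.
rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)

# 5-fold cross-validated accuracy on the training portion only.
cv_acc = cross_val_score(rf, X_train, y_train, cv=5).mean()

# Fit once on the whole training portion and score it three ways.
rf.fit(X_train, y_train)
train_acc = rf.score(X_train, y_train)  # scored on rows the forest was fit on
oob_acc = rf.oob_score_                 # scored on out-of-bag rows only
test_acc = rf.score(X_test, y_test)     # scored on the held-out test set

print(f"train: {train_acc:.3f}")
print(f"CV:    {cv_acc:.3f}")
print(f"OOB:   {oob_acc:.3f}")
print(f"test:  {test_acc:.3f}")
```

With unrestricted tree depth, the training score here should come out well above the CV, OOB, and test scores, which is the gap the two sources above seem to interpret differently.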

Can someone please explain why Jeremy is comparing training and CV accuracy rather than comparing CV and test accuracy?

Thanks

rgarcia · May 8, 2019, 6:25pm

What do you mean by “CV” in this question?