Lesson 7 - Confusion about calculation of the accuracy of the validation set - should it be 22 or 5?

Hello Experts,

I have a doubt about what the Professor states in his explanation of how big the validation set should be.
He says that each of the 3 sets (training, validation, test) should have a minimum of 22 samples of each class. Then, further, he states the following:

“So one approach to figuring out is my validation set big enough is train your model 5 times with exactly the same hyper parameters each time and look at the validation set accuracy each time and there is a mean and a standard deviation of 5 numbers you could use or a maximum and a minimum you can use. But to save yourself some time, you can figure out straight away that okay, I have a .99 accuracy as to whether I get the cat correct or not correct. So therefore the standard deviation is equal to 0.99 * 0.01 and then I can get the standard error of that”

From the professor’s explanation, I understand that to calculate the mean/std of the validation set accuracy scores, we need a minimum of 22 observations. However, here the professor states that just having 5 validation scores is enough to calculate the mean and std of the validation set accuracy.

Could someone kindly clarify here?

Regards,
Kiran Hegde

@kiranh, can you please give a link to the exact place in the video to which you refer? Which video is the quote from, and what is the timestamp of the starting point of the quote?

Your question is a good one.

The short answer is that you can always calculate the standard deviation \sigma of a bunch of data points, but the more data points you have, the less noisy the result will be.

We can quantify how noisy the standard deviation estimate is by computing its standard deviation!

For normally distributed data, the standard deviation of the standard deviation is s\equiv\frac{\sigma}{\sqrt{2n}}, where n is the number of data points. So the fractional error in \sigma is \frac{s}{\sigma}= \frac{1}{\sqrt{2n}}

So the fractional error in the estimate of \sigma is 32% with n=5, 15% with n=22, and 10% with n=50.
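A quick sanity check of these numbers (a minimal sketch; the \frac{1}{\sqrt{2n}} formula assumes normally distributed data):

```python
import math

# Fractional error in the estimated standard deviation, 1 / sqrt(2n),
# for normally distributed data.
for n in (5, 22, 50):
    frac_err = 1 / math.sqrt(2 * n)
    print(f"n = {n:2d}: fractional error in sigma ~ {frac_err:.0%}")
# n =  5: ~32%,  n = 22: ~15%,  n = 50: ~10%
```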

Hello @jcatanza, sincere apologies for the delayed response. The time stamp in the video is [5:42].

Is the standard error the same as calculating the standard deviation of the standard deviation, as you have stated above?

Regards,
Kiran Hegde

Hello @kiranh,

No, the standard error is the uncertainty in the estimate of the mean of a set of samples.

Formally, standard error is the standard deviation of the estimate of the mean. The more samples you have, the better your estimate of the mean.

standard error = \frac{\sigma}{\sqrt N}

This is sometimes called “beating down the noise” by averaging. But it’s expensive: to reduce the noise by a factor of 5 requires increasing the number of samples by a factor of 25.
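Here is a minimal simulation sketch of that trade-off (the population standard deviation of 10 and the sample sizes are made up for illustration): going from N = 100 to N = 2500 samples, a 25x increase, shrinks the standard error by a factor of 5.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 10.0  # hypothetical population standard deviation

for n in (100, 2500):  # 25x more samples -> 5x smaller standard error
    # Draw many samples of size n and look at the spread of their means.
    means = rng.normal(0.0, sigma, size=(5_000, n)).mean(axis=1)
    print(f"N = {n:4d}: theory {sigma / np.sqrt(n):.2f}, simulated {means.std():.2f}")
```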

Hi @jcatanza,

Could you please explain, if possible, how the professor arrives at a standard deviation of 0.99*0.01? It was mentioned in the video that the accuracy of a validation set is like the mean of a distribution, but I could not figure out this basic calculation!
Also, how is the validation set accuracy considered to be the mean of a distribution? Do we mean to say that the accuracy is like saying “on average” we get x% correct answers? What distribution would a validation set follow (and why) in order to predict the mean and standard deviation of a bunch of validation set accuracy values?


Hi @avishwan, thank you for that good question.

So this occurs somewhere around here in the video. I think part of the confusion is that there’s an error in the formula for the standard deviation of the binomial distribution: the formula in the spreadsheet is for the variance, so you have to take the square root to get the standard deviation.

But let’s back up to square one. The context here is that @Jeremy is explaining how to choose your sample size in order to reach the desired accuracy. Suppose we have a hypothetical data set of N=10000 patients of which we expect 1% to have cancer. Here p = 0.01. Each patient’s cancer status can be considered a Bernoulli trial with p = 0.01. The expected number of patients with cancer is given by the mean of the Binomial distribution N*p = 10000*0.01 = 100. The uncertainty in the expected number of patients with cancer is then the standard deviation of the Binomial distribution \sigma = \sqrt{N*p*(1-p)} = \sqrt{10000*0.01*0.99}\approx 10. This means that the uncertainty in the expected number of patients with cancer in this sample is +/-10, roughly speaking.
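Plugging the numbers in (just a quick check of the arithmetic above):

```python
import math

N, p = 10_000, 0.01                   # 10,000 patients, 1% expected cancer rate
mean = N * p                          # expected number of patients with cancer
sigma = math.sqrt(N * p * (1 - p))    # standard deviation of the binomial count

print(mean)   # 100.0
print(sigma)  # ~9.95, i.e. roughly +/- 10 patients
```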

Now suppose we ask the following question: how many studies with 10,000 patients do I need in order to be able to estimate the mean number of patients with cancer (out of 10,000) with an accuracy of +/-1 patient?

The error in the mean is given by the standard error, expressed as \sigma_{mean} = \frac{\sigma}{\sqrt{M}}

We want \sigma_{mean} = 1, or \frac{10}{\sqrt{M}} = 1, so the answer is M = 100 studies.
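And a small simulation sketch of that result (the 20,000 repetitions are arbitrary, chosen just to make the estimate stable): averaging the counts over M = 100 studies pins down the mean to roughly \pm 1 patient.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, M = 10_000, 0.01, 100           # M studies of N patients each

# Repeat the "M studies" experiment many times; each study yields a binomial count.
counts = rng.binomial(N, p, size=(20_000, M))
study_means = counts.mean(axis=1)     # mean count across the M studies in each repeat

print(study_means.std())              # ~1.0: the mean is measured to about +/- 1 patient
```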


Hi @jcatanza

Thanks for the explanation and the use of an example to demonstrate your calculation. That was helpful!

I still have a doubt based on my original question, and I apologize if it sounds silly! My understanding is that the standard error, as the professor mentioned it in the video, is:

standard deviation of the “values” / sqrt(number_of_values)

and in this context, the “value” is the validation accuracy of one trained model, and the mean would be the mean of the validation accuracies obtained from, say, 5 such trained models.
So if I have 5 models m1, m2, m3, m4, m5 with accuracies v1, v2, v3, v4, v5, then the standard error is defined as

standard deviation of 5 validation accuracies / sqrt(5).

In your example, the accuracy is for one model (1% here), and within that single model you have calculated the mean, standard deviation, and standard error.
However, it felt like @Jeremy was explaining the calculation of the standard error based on multiple such accuracy experiments/models. Did I totally go off track and misunderstand the concept, or should I be seeing this from a different view?
Are these values calculated from a single accuracy calculation, as you have shown?
Thanks again!

Yes @avishwan, you have it right. Standard error is defined as the error in the measurement of the mean over an ensemble of experiments. In the example I gave, a single experiment is a sample of 10,000 patients, and I showed that you need to do M\ge100 experiments in order to measure the mean to an accuracy of \pm1 patient.
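For the validation-accuracy case you describe, the calculation looks like this (a minimal sketch; the five accuracy values are made up):

```python
import numpy as np

# Hypothetical validation accuracies v1..v5 from 5 runs with identical hyperparameters.
accs = np.array([0.991, 0.988, 0.993, 0.990, 0.989])

mean = accs.mean()
std = accs.std(ddof=1)                # sample standard deviation of the 5 accuracies
std_err = std / np.sqrt(len(accs))    # standard error of the mean accuracy

print(f"mean = {mean:.4f}, std = {std:.4f}, standard error = {std_err:.4f}")
```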