I have a dataset of 100k images that I aim to classify into 120 categories. Andrew Ng gives some useful advice on splitting into train, dev and test sets:
The size of the dev and test set should be big enough for the dev and test results to be representative of the performance of the model. If the dev set has 100 examples, the dev accuracy can vary a lot depending on the chosen dev set. For bigger datasets (>1M examples), the dev and test set can have around 10,000 examples each for instance (only 1% of the total data).
There are a couple options I am considering:
(train / dev / test)
(A) 90k / 5k / 5k
(B) 80k / 10k / 10k
Which do you think makes more sense, and why? More generally, is there a way to check whether your validation set is too small so that you can enlarge it later on (say, if accuracy varies too much across evaluations of the same model)?
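One way to sanity-check a proposed dev-set size is to estimate how much the measured accuracy would fluctuate purely from sampling. A minimal sketch, assuming each prediction is an independent Bernoulli trial and a hypothetical dev accuracy of 85% (both are illustrative assumptions, not figures from the question):

```python
import numpy as np

def accuracy_se(p, n):
    """Standard error of an accuracy estimate computed on n examples,
    treating each prediction as an independent Bernoulli(p) trial."""
    return np.sqrt(p * (1 - p) / n)

# Compare the two candidate dev-set sizes from options (A) and (B):
for n in (5_000, 10_000):
    se = accuracy_se(0.85, n)
    print(f"n={n}: accuracy ~ 85% +/- {1.96 * se:.2%} (95% CI)")

# Bootstrap the same quantity from per-example 0/1 correctness flags
# (here simulated; in practice use your model's actual dev predictions):
rng = np.random.default_rng(0)
correct = rng.random(5_000) < 0.85
boot = [correct[rng.integers(0, len(correct), len(correct))].mean()
        for _ in range(1_000)]
print(f"bootstrap std of dev accuracy at n=5000: {np.std(boot):.4f}")
```

Under these assumptions, the 95% interval is roughly ±1% at 5k examples and ±0.7% at 10k, so the choice between (A) and (B) comes down to whether a ~1% measurement wobble is small relative to the accuracy differences you need to detect between models.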