Validation loss without a validation set

I’m working on the dog breed Kaggle competition, which has a test folder, a train folder, and a CSV that lists the training set images and their actual dog breeds. I ran the code below with an empty validation folder and still got a validation loss. How was this validation loss calculated? Thanks.

PATH = "data/dogbreed/"
sz = 224
arch = resnet34
data = ImageClassifierData.from_csv(PATH, folder='train', csv_fname=f'{PATH}labels2.csv', tfms=tfms_from_model(arch, sz))
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(0.01, 3)

epoch trn_loss val_loss accuracy
0 2.153056 1.079305 0.768591
1 1.107497 0.700231 0.811644
2 0.788863 0.603783 0.823875

The results seem pretty similar after I randomly moved 20% of the training set to the valid folder.

epoch trn_loss val_loss accuracy
0 2.123359 1.062867 0.770548
1 1.111311 0.699187 0.820939
2 0.810759 0.59895 0.828278

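ImageClassifierData.from_csv does not require a valid folder; it takes the validation set from a split of the CSV. val_idxs defaults to None, in which case 20% of the CSV rows are held out as the validation set. You can control which rows and how many by passing your own val_idxs, for example from the get_cv_idxs function.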

    def from_csv(cls, path, folder, csv_fname, bs=64, tfms=(None,None),
           val_idxs=None, suffix='', test_name=None, continuous=False, skip_header=True, num_workers=8):
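To make the 20% split concrete, here is a minimal sketch of how a set of validation indices can be drawn from the CSV rows. This is not the fastai source; sample_val_idxs, its parameters, and the row count are hypothetical stand-ins for what get_cv_idxs is described as doing above.

```python
import numpy as np

def sample_val_idxs(n, val_pct=0.2, seed=42):
    """Hypothetical helper: pick a random val_pct share of n row indices."""
    rng = np.random.RandomState(seed)   # fixed seed so the split is repeatable
    n_val = int(val_pct * n)            # number of rows held out for validation
    return rng.permutation(n)[:n_val]   # first n_val entries of a shuffled 0..n-1

n_rows = 1000                           # assumed row count, for illustration only
val_idxs = sample_val_idxs(n_rows)
print(len(val_idxs), "of", n_rows, "rows held out for validation")
```

Because the indices come from the CSV rows and not from the folder layout, an empty valid folder makes no difference to whether a validation loss is reported.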

Just to clarify what @sjdlloyd said: I think you are mixing up the validation set and the test set. The validation set is data you already have labels for, and it is taken from your train folder. Which rows go into the validation set is defined by val_idxs.
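As a rough illustration of what "defined by val_idxs" means, the sketch below splits a list of labelled files into validation and training parts by index. The helper split_by_val_idxs, the file names, and the labels are all made up for this example and are not taken from the library.

```python
import numpy as np

def split_by_val_idxs(val_idxs, items):
    """Hypothetical helper: return (validation items, training items)."""
    items = np.asarray(items)
    mask = np.zeros(len(items), dtype=bool)
    mask[np.asarray(val_idxs)] = True   # mark the rows held out for validation
    return items[mask], items[~mask]    # everything else stays in training

# Made-up file names and labels standing in for the rows of the labels CSV
fnames = np.array([f"train/img_{i}.jpg" for i in range(10)])
labels = np.array(["beagle", "pug"] * 5)

val_idxs = [1, 4]
val_fnames, trn_fnames = split_by_val_idxs(val_idxs, fnames)
val_labels, trn_labels = split_by_val_idxs(val_idxs, labels)
print(val_fnames, val_labels)
```

Both parts keep their labels, which is what lets a validation loss be computed; the two parts never share a row.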

The test set is the data you want to apply your model to (to submit to Kaggle, for example). It doesn’t have labels, which is why you don’t see a .csv file for the test set. So it doesn’t matter how you move data from your train folder to the valid folder; I think you will get almost the same val_loss.

Thank you, that makes sense.