I’m trying to build an image classifier to distinguish bicycles from wheelchairs.
I collected images through Google image search and built a dataset of around 3,400 bicycle images and 2,700 wheelchair images.
I’m trying to get the best possible performance given the limited training data. Here is what I’ve done so far:
1- I ran YOLO on the dataset and saved each detection as a new (cropped) image, to increase the dataset size.
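A minimal sketch of the cropping step, assuming the detector output has already been converted into (x1, y1, x2, y2, confidence, class_name) tuples in pixel coordinates and that each image is held as a nested list of pixel rows. The function name, the tuple layout, and the min_conf/min_size thresholds are illustrative assumptions, not part of any YOLO API; with a NumPy array you would slice image[y1:y2, x1:x2] instead.

```python
def crop_detections(image, detections, target_classes, min_conf=0.5, min_size=32):
    """Return crops of `image` (a list of pixel rows) for wanted detections.

    detections: iterable of (x1, y1, x2, y2, confidence, class_name) tuples.
    This tuple layout is an assumption; adapt it to your YOLO wrapper's output.
    """
    height, width = len(image), len(image[0])
    crops = []
    for x1, y1, x2, y2, conf, cls in detections:
        if cls not in target_classes or conf < min_conf:
            continue  # skip unrelated objects and low-confidence boxes
        # Clamp the box to the image bounds and convert to integer indices.
        x1, y1 = max(0, int(x1)), max(0, int(y1))
        x2, y2 = min(width, int(x2)), min(height, int(y2))
        if x2 - x1 < min_size or y2 - y1 < min_size:
            continue  # very small crops carry little information after resizing
        crops.append([row[x1:x2] for row in image[y1:y2]])
    return crops
```

Filtering on confidence and size keeps false positives and tiny boxes from being added to the dataset as mislabeled or uninformative samples.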
2- I assigned 2,500 images per class to the training set and used the rest for validation.
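Because each YOLO crop is cut from a photo that is also in the dataset, a crop and its source photo should land in the same split; otherwise the validation set contains near-copies of training images and overstates accuracy. A minimal sketch of a per-class split that keeps such groups together, assuming every sample is recorded as a hypothetical (path, label, source_id) tuple:

```python
import random
from collections import defaultdict


def split_by_source(samples, n_train_per_class, seed=0):
    """Split (path, label, source_id) samples into train and validation lists,
    keeping every crop in the same split as the photo it was cut from."""
    groups = defaultdict(list)  # (label, source_id) -> all samples from that photo
    for path, label, source_id in samples:
        groups[(label, source_id)].append((path, label, source_id))
    keys = list(groups)
    random.Random(seed).shuffle(keys)  # seeded so the split is reproducible
    train, val = [], []
    n_train = defaultdict(int)  # training images assigned so far, per class
    for key in keys:
        label = key[0]
        if n_train[label] < n_train_per_class:
            train.extend(groups[key])
            n_train[label] += len(groups[key])
        else:
            val.extend(groups[key])
    return train, val
```

Because whole groups are assigned at once, the per-class training count can exceed n_train_per_class by less than the size of one group.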
Q1: Since the validation set is not balanced, validation accuracy is no longer a good measure, right? What should I use instead?
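With an unbalanced validation set, common substitutes for plain accuracy are the confusion matrix, per-class recall and precision, balanced accuracy (the mean of the per-class recalls), and macro-averaged F1. A dependency-free sketch (the function name is hypothetical; scikit-learn provides equivalents such as balanced_accuracy_score and f1_score with average="macro"):

```python
def imbalance_aware_metrics(y_true, y_pred, classes):
    """Confusion matrix, per-class recall/precision, balanced accuracy, macro-F1."""
    # confusion[t][p] counts samples of true class t that were predicted as p
    confusion = {t: {p: 0 for p in classes} for t in classes}
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1
    recall, precision, f1 = {}, {}, {}
    for c in classes:
        tp = confusion[c][c]
        actual = sum(confusion[c].values())               # row sum: true members of c
        predicted = sum(confusion[t][c] for t in classes)  # column sum: predicted as c
        recall[c] = tp / actual if actual else 0.0
        precision[c] = tp / predicted if predicted else 0.0
        denom = precision[c] + recall[c]
        f1[c] = 2 * precision[c] * recall[c] / denom if denom else 0.0
    return {
        "confusion": confusion,
        "recall": recall,
        "precision": precision,
        "balanced_accuracy": sum(recall.values()) / len(classes),
        "macro_f1": sum(f1.values()) / len(classes),
    }
```

For example, a model that always predicts "bicycle" on a hypothetical 90/10 split scores 90% accuracy but only 0.5 balanced accuracy, which is exactly the failure plain accuracy hides.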
Q2: Do I not need a separate test set? How can I measure generalization performance?
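One common answer to Q2 is to keep a third, held-out test set that is evaluated only once, after all model and hyperparameter choices are frozen. When data is scarce, an alternative is stratified k-fold cross-validation over the pooled training and validation data, reporting the mean and spread of the fold scores. A minimal sketch of the fold assignment (the function name is hypothetical; scikit-learn's StratifiedKFold does the same, and StratifiedGroupKFold additionally keeps a photo and its crops in one fold):

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; each fold preserves the class ratio."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    fold_of = [0] * len(labels)
    rng = random.Random(seed)
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            fold_of[i] = j % k  # deal each class round-robin across the folds
    for fold in range(k):
        val_idx = [i for i in range(len(labels)) if fold_of[i] == fold]
        train_idx = [i for i in range(len(labels)) if fold_of[i] != fold]
        yield train_idx, val_idx
```

Each image is used for validation exactly once, so every sample contributes to the final estimate.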
3- I applied data augmentation to the training set only and left the validation set untouched.
Q3: The videos I’ve watched say this step is for preventing overfitting, but for me it serves both to increase the dataset size and to prevent overfitting. Is that right?
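A minimal sketch of training-only augmentation, assuming images are nested lists of pixel rows and using two standard transforms (random horizontal flip and random crop). Because the transforms are re-sampled every epoch, the model sees many variants of each photo; these variants are strongly correlated with the original, so augmentation is usually viewed as a regularizer more than as a source of truly new samples. The validation path uses a deterministic center crop only. Function names are illustrative.

```python
import random


def augment(image, rng, crop_h, crop_w):
    """Random horizontal flip plus random crop, for training images only.

    Call it afresh each epoch so every pass sees a different variant.
    """
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]  # mirror left/right
    top = rng.randint(0, len(image) - crop_h)  # randint is inclusive on both ends
    left = rng.randint(0, len(image[0]) - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def center_crop(image, crop_h, crop_w):
    """Deterministic counterpart used for validation and test images."""
    top = (len(image) - crop_h) // 2
    left = (len(image[0]) - crop_w) // 2
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]
```

Keeping evaluation deterministic means validation scores change only when the model changes, not when the random transforms do.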
Currently, my baseline is AlexNet trained directly on the images without data augmentation, which reaches a classification accuracy of 90%.
Then I fine-tuned VGG16 for bicycle vs. wheelchair, and the current performance is around 94%.
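To judge whether a gap such as 90% vs. 94% (or a future 98%) is larger than sampling noise, it can help to attach an interval to each accuracy. A minimal sketch of the Wilson score interval for a binomial proportion; the 95% level (z = 1.96) is an assumption, and the formula treats validation images as independent, which crops of the same photo are not, so the true uncertainty is somewhat larger.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Approximate Wilson score interval for an accuracy of correct/total."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half
```

Call it with the number of correct validation predictions and the validation set size; for a small minority class, the interval around its per-class recall will be noticeably wide.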
Q4: Is fine-tuning an ImageNet classifier enough for this task? I don’t think wheelchair is one of the ImageNet classes, so I was thinking that after training the last layer, I should also fine-tune the last conv layers with a very small learning rate.
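The two-stage plan in Q4 (train the new head first, then unfreeze the last conv layers with a much smaller learning rate) can be written as per-stage optimizer parameter groups. A sketch assuming hypothetical block names for VGG16's five conv blocks and its replaced classifier; the list-of-dicts shape with "params" and "lr" keys mirrors how PyTorch optimizers accept per-group learning rates, but here "params" is only a name placeholder, and the learning rates are illustrative defaults, not tuned values.

```python
# Hypothetical names standing in for VGG16's five conv blocks and the new
# 2-class head; with a real model, "params" would hold that block's parameters.
VGG16_BLOCKS = ["conv1", "conv2", "conv3", "conv4", "conv5", "classifier"]


def finetune_param_groups(stage, head_lr=1e-3, backbone_lr=1e-5, n_unfrozen=1):
    """Optimizer parameter groups for two-stage fine-tuning.

    Stage 1: only the new classifier head is trained.
    Stage 2: the last `n_unfrozen` conv blocks are trained as well, with a much
    smaller learning rate so the ImageNet features are only nudged.
    Blocks that are not listed stay frozen (left out of the optimizer).
    """
    groups = [{"params": "classifier", "lr": head_lr}]
    if stage == 2:
        conv_blocks = [b for b in VGG16_BLOCKS if b.startswith("conv")]
        for block in conv_blocks[len(conv_blocks) - n_unfrozen:]:
            groups.append({"params": block, "lr": backbone_lr})
    return groups
```

In stage 2 it is also common to lower head_lr, so the head and the newly unfrozen block adapt slowly together.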
I need to achieve performance above 98%. What do you suggest?
Thanks in advance.