Using multiple different datasets for training

Assume you're working on a problem that is "similar" to another one that has a larger dataset available. For example: detecting distracted drivers from outside the car, when there's a smaller dataset (compared to ImageNet) named "StateFarm" that has a lot of pictures from inside the car. In this case we have two options:

1- train on StateFarm, then fine-tune on our new dataset from outside the car, which is no different from fine-tuning after starting from ImageNet weights

2- merge the two datasets together and try to force the network to learn the common features, using the validation set to check that it does
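A minimal sketch of option 2 using PyTorch's `ConcatDataset` (the dataset classes and sizes here are hypothetical placeholders, not the real StateFarm data):

```python
import torch
from torch.utils.data import Dataset, ConcatDataset, DataLoader

class ToyDriverDataset(Dataset):
    """Placeholder standing in for StateFarm or the outside-car dataset."""
    def __init__(self, n_samples):
        self.n_samples = n_samples

    def __len__(self):
        return self.n_samples

    def __getitem__(self, i):
        x = torch.randn(3, 224, 224)  # fake image tensor
        y = i % 10                    # fake class label (10 classes)
        return x, y

inside_car = ToyDriverDataset(100)   # stand-in for StateFarm
outside_car = ToyDriverDataset(40)   # stand-in for the new, smaller dataset

# One merged dataset; a single DataLoader shuffles across both sources,
# so each batch can mix inside- and outside-car images.
merged = ConcatDataset([inside_car, outside_car])
loader = DataLoader(merged, batch_size=8, shuffle=True)
```

Note that this only works cleanly if the two datasets share a label scheme; otherwise the labels need to be remapped to a common set first.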

Is the 2nd way worth trying, or is it not a good option?

How can the StateFarm dataset be used if I'm working from outside the car?


This is a well-studied problem called transfer learning. In Option 1, you employ a model trained on ImageNet to detect distracted drivers: you essentially freeze all the initial layers (which are rich in features learned from ImageNet) and change the final layer from ImageNet's outputs to the 10 relevant classes for StateFarm – details here. Learning is thus transferred while keeping GPU training hours minimal.

Merging two datasets (ImageNet and StateFarm) is not something I have come across. IMO, consider using a model trained on a better-suited dataset (ImageNet encompasses many classes, and on a cursory look I don't think there is much overlap between driver behavior and the ImageNet classes). But it still makes for a great baseline.


I'd definitely try both, and see if either helps. It's so easy to train a model that it's often best to try each idea you have and see what works!

I’d be interested to hear what you find.


I meant merging StateFarm with my own dataset, not with ImageNet. Thanks for your reply!
