I read through the Stanford CS231n course’s Transfer Learning page (http://cs231n.github.io/transfer-learning/) and have a few questions:
- The recommendation on which layers to retrain seems to depend on the size of the new dataset and on its similarity to the dataset the pre-trained model was trained on. I am curious: what is considered a big dataset here?
- The page mentions that FC layers can also be made agnostic to image size. I did not quite follow how.
- Most state-of-the-art results seem to come from pre-trained networks, so it looks extremely important to keep up with the latest pre-trained networks trained on different kinds of data. What’s the best way to follow this?
It’s easy to train on random image sizes with a batch size of 1. Input shape will be (None, None, 3) for RGB.
Just use a fully convolutional network and end it with a global average pooling layer to keep your output shape constant. (You can then add additional layers.)
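A minimal NumPy sketch of the idea, with random untrained weights and made-up layer sizes purely for illustration: two "valid" 3x3 convolutions followed by global average pooling collapse any HxW input to a fixed-length vector, so a dense head after the pooling always sees the same number of inputs. In Keras the analogous pieces would be an `Input(shape=(None, None, 3))`, `Conv2D` layers and `GlobalAveragePooling2D`; the NumPy version just makes the shapes explicit.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20

rng = np.random.default_rng(0)

def conv2d_valid(x, kernel):
    """'Valid' 2-D convolution (cross-correlation), stride 1, no padding.
    x: (H, W, C_in), kernel: (kh, kw, C_in, C_out) -> (H-kh+1, W-kw+1, C_out)."""
    kh, kw = kernel.shape[:2]
    # windows[h, w, c, i, j] == x[h + i, w + j, c]
    windows = sliding_window_view(x, (kh, kw), axis=(0, 1))
    return np.einsum("hwcij,ijco->hwo", windows, kernel)

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical weights for a tiny fully convolutional net (random, untrained).
k1 = rng.standard_normal((3, 3, 3, 8)) * 0.1    # 3x3 conv, RGB -> 8 channels
k2 = rng.standard_normal((3, 3, 8, 16)) * 0.1   # 3x3 conv, 8 -> 16 channels
w_out = rng.standard_normal((16, 10)) * 0.1     # dense head, 16 -> 10 classes

def forward(image):
    """image: (H, W, 3) with any H, W >= 5. Returns a (10,) score vector."""
    h = relu(conv2d_valid(image, k1))
    h = relu(conv2d_valid(h, k2))
    pooled = h.mean(axis=(0, 1))  # global average pooling: (H', W', 16) -> (16,)
    return pooled @ w_out         # the dense layer always gets 16 inputs

# Batch size 1: each image is processed on its own, so sizes can differ.
for shape in [(32, 32, 3), (64, 48, 3), (17, 101, 3)]:
    out = forward(rng.standard_normal(shape))
    print(shape, "->", out.shape)
```

Every call prints a `(10,)` output; only the spatial size of the intermediate feature maps changes with the input.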
Thanks for your response, David!
But an FCN is not a fully connected layer. By FC, I mean a Dense layer in Keras.
Sorry, thought you meant fully convolutional!
As far as I know, the number of inputs and outputs has to be fixed for a dense layer (since you’re training n_input * n_output weights).
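As I understand the CS231n page, the way around this is that an FC layer can be rewritten as a convolution: the n_input * n_output weights for a fixed HxWxC input are the same numbers as one HxW conv filter with C input channels, applied with no padding. The NumPy sketch below (hypothetical 6x6x32 feature map, 10 outputs, random weights) checks that the two agree at the original size, then slides the same weights over a larger feature map to get a grid of score vectors, which can be averaged into one. It illustrates the conversion idea only, not how any particular framework implements it.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20

rng = np.random.default_rng(0)

# Hypothetical sizes: the dense layer was trained on a 6x6x32 feature map.
FH, FW, C, N_OUT = 6, 6, 32, 10
W_dense = rng.standard_normal((FH * FW * C, N_OUT)) * 0.01  # n_input * n_output weights
b_dense = np.zeros(N_OUT)

def dense(feature_map):
    # Flatten (row-major) and multiply: only works for exactly FH*FW*C values.
    return feature_map.reshape(-1) @ W_dense + b_dense

def dense_as_conv(feature_map):
    # Same weights viewed as an FHxFW filter over C channels with N_OUT outputs,
    # slid over the map with stride 1 and no padding.
    kernel = W_dense.reshape(FH, FW, C, N_OUT)
    windows = sliding_window_view(feature_map, (FH, FW), axis=(0, 1))  # (H', W', C, FH, FW)
    return np.einsum("hwcij,ijco->hwo", windows, kernel) + b_dense

fm_small = rng.standard_normal((FH, FW, C))
print(np.allclose(dense(fm_small), dense_as_conv(fm_small)[0, 0]))  # True

fm_large = rng.standard_normal((10, 12, C))  # e.g. from a larger input image
scores = dense_as_conv(fm_large)             # one score vector per 6x6 window
print(scores.shape)                          # (5, 7, 10)
print(scores.mean(axis=(0, 1)).shape)        # (10,)
# dense(fm_large) would fail: 10*12*32 inputs no longer match W_dense's 6*6*32 rows.
```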
Here’s my attempt at answering:
A bigger dataset would include many previously unseen classes. The similarity of the new dataset to the dataset the pre-trained model was trained on matters because a very different dataset needs more retraining: imagine applying a cats vs. dogs model to a dataset of fish; this would require extensive retraining.
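For reference, here is my paraphrase of the four size/similarity scenarios on the linked CS231n page, written as a small Python lookup table. The wording is mine, and whether a dataset counts as "large" or "similar" is left as a boolean judgment call, since that is exactly the open question here.

```python
# Paraphrase of the four scenarios on the CS231n transfer-learning page.
# "Large" and "similar" are judgment calls; no numeric threshold is assumed.
STRATEGIES = {
    (False, True): "small, similar: train a linear classifier on the features "
                   "from the layer before the classifier (CNN codes)",
    (True, True): "large, similar: fine-tune through the full network",
    (False, False): "small, different: train a linear classifier (e.g. an SVM) "
                    "on activations from an earlier layer",
    (True, False): "large, different: could train from scratch, but initializing "
                   "from pretrained weights and fine-tuning the entire network usually helps",
}

def transfer_strategy(dataset_is_large: bool, similar_to_pretraining_data: bool) -> str:
    """Look up the suggested strategy for a (size, similarity) combination."""
    return STRATEGIES[(dataset_is_large, similar_to_pretraining_data)]

print(transfer_strategy(dataset_is_large=False, similar_to_pretraining_data=True))
```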
My confidence in this answer is weak. However, I believe the “spatial size” they are referring to is the scale of the objects in the images themselves, not the image dimensions. If they are referring to the image dimensions, then I’m not sure what they mean and would be interested to hear Jeremy’s answer.
For “academic” datasets such as MNIST, CIFAR-10, and ImageNet, there are competitions that post the latest winning models. If you follow these competitions and competitors, you should be able to keep up; keep in mind, though, that these models aren’t VGG and can be quite domain-specific. I don’t see much weight sharing from industry, but my perspective is quite limited compared to Jeremy’s.
Hope this helps.