For example, the last layer of the ImageNet model classifies the input image into 1000 possible classes. For our cat/dog classifier, we start with the pre-trained ImageNet model, “chop off its head” – i.e. remove the final layer – and replace it with a new final layer (or head) that is a two-way classifier. Note that we retain the body – i.e. the layers before the head – which have the pre-trained weights from ImageNet.
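To make the head-swap concrete, here is a minimal, framework-agnostic sketch in NumPy (all names and sizes are invented for illustration): a pretrained “body” is kept as-is, the old 1000-way head is discarded, and a freshly initialized two-way head is attached.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pretrained "body": one hidden layer mapping 8 features -> 4.
# In the real cat/dog example this would be the ImageNet conv layers.
body_W = rng.normal(size=(8, 4))          # pretrained weights (kept as-is)

# Original "head": 4 -> 1000 (the ImageNet classes). We discard it...
old_head_W = rng.normal(size=(4, 1000))

# ...and replace it with a new 4 -> 2 head (cat vs dog), randomly initialized.
new_head_W = rng.normal(size=(4, 2)) * 0.01

def model(x):
    hidden = np.maximum(0, x @ body_W)    # body with ReLU, pretrained weights
    return hidden @ new_head_W            # new two-way head

logits = model(rng.normal(size=(5, 8)))
print(logits.shape)  # (5, 2): two-way classifier output
```

Only `new_head_W` (and, during fine-tuning, eventually `body_W`) would be trained on the cat/dog data.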
Does anyone know of any research where some convolutional layers’ weights were manually initialized (as opposed to randomly initialized via Xavier or Kaiming init)? Things like Sobel operators, to give the network a head start on learning useful things (e.g., edges) about the natural world.
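As a sketch of the idea (not taken from any particular paper), here is how a first conv filter could be seeded with a Sobel kernel instead of random init. NumPy stands in for a deep learning framework; in practice you would copy these arrays into the layer’s weight tensor.

```python
import numpy as np

# Sobel kernels: strong response to vertical (x) and horizontal (y) edges.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def conv2d_valid(image, kernel):
    """Naive 'valid' 2-D convolution (really cross-correlation, as in
    deep learning frameworks)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A toy image with a vertical edge: left half 0, right half 1.
img = np.zeros((5, 6))
img[:, 3:] = 1.0

gx = conv2d_valid(img, SOBEL_X)  # responds strongly to the vertical edge
gy = conv2d_valid(img, SOBEL_Y)  # flat rows, so ~0 everywhere
```

A layer initialized this way detects edges from step zero, rather than having to discover them from random weights.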
One possibility that I read about to jump-start a model in a new domain (provided you have enough data) is using its convolution layers as the encoder part in an encoder-decoder configuration.
You can train this encoder-decoder on unlabeled data, so you don’t need ground truth.
That way the convolution layers learn how to extract meaningful features from the (unlabeled) data so that the decoder can reconstruct the input image.
You can then detach the encoder part, attach it to a dense part (if the model needs it) and train with labeled data starting from there.
Makes a lot of sense to me, but it also sounds pretty labor-intensive.
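A minimal sketch of that encoder-decoder pre-training idea, using tiny linear layers in NumPy as stand-ins for convolutional ones (all shapes and hyperparameters are invented):

```python
import numpy as np

rng = np.random.default_rng(1)

# "Unlabeled" data: no ground truth needed, the target is the input itself.
X = rng.normal(size=(256, 16))
W_enc = rng.normal(size=(16, 4)) * 0.1    # encoder weights
W_dec = rng.normal(size=(4, 16)) * 0.1    # decoder weights

def reconstruction_mse():
    return float(np.mean((X @ W_enc @ W_dec - X) ** 2))

mse_before = reconstruction_mse()
lr = 0.01
for _ in range(500):
    Z = X @ W_enc                 # encode into a 4-d bottleneck
    err = Z @ W_dec - X           # reconstruction error
    # Gradient steps on mean squared reconstruction error.
    W_dec -= lr * Z.T @ err / len(X)
    W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)
mse_after = reconstruction_mse()

# "Detach" the pre-trained encoder: its output is now a learned feature
# extractor to attach a labeled-data head onto.
features = X @ W_enc
```

The same pattern applies with conv layers as the encoder and upsampling/deconv layers as the decoder.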
Thanks! But today everyone in NLU has moved to Transformer models, and the behavior there could be different. So I am curious specifically about them.
Non-beginner question: are there any resources for combining models in production? I.e., I want to label what a person is doing in a picture, combining NLP and CV.
Can we visualize fastai models with TensorBoard? https://pytorch.org/tutorials/intermediate/tensorboard_tutorial.html
Yes, there is a callback for that.
Another interesting paper on this topic: http://cips-cl.org/static/anthology/CCL-2019/CCL-19-141.pdf
One possibility that I read about to jump-start a model in a new domain (provided you have enough data) is using its convolution layers as the encoder part in an encoder-decoder configuration.
Self-supervised learning like you described seems to be popular in computer vision, and you don’t need an entire encoder-decoder setup! For example, you can apply 90-, 180-, and 270-degree rotations to an image, and then train a convnet to classify the correct rotation. This “pretext” training seems to be really helpful for jump-starting a convnet (e.g., RotNet).
Jeremy also had a blog post with lots of great pretext examples!
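For illustration, the rotation pretext data generation could be sketched like this (a NumPy toy; the convnet that would classify the rotations is omitted):

```python
import numpy as np

rng = np.random.default_rng(2)

def make_rotation_batch(images):
    """For each image, pick a random rotation and return (rotated, label),
    where label in {0, 1, 2, 3} encodes 0/90/180/270 degrees.
    The labels are free: no human annotation needed."""
    rotated, labels = [], []
    for img in images:
        k = int(rng.integers(0, 4))       # number of 90-degree turns
        rotated.append(np.rot90(img, k))
        labels.append(k)
    return np.stack(rotated), np.array(labels)

imgs = rng.normal(size=(8, 32, 32))       # a batch of toy grayscale images
x, y = make_rotation_batch(imgs)
print(x.shape, y.shape)  # (8, 32, 32) (8,)
```

A convnet trained to predict `y` from `x` learns orientation-sensitive features that transfer to the real downstream task.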
As we talked about running the first cat-dog example of 01_intro, I noticed that when I trained the model, the second step of fine_tune (here updating the whole model) seems to overfit: error_rate increases and validation loss increases, while training loss drops significantly. Is that overfitting an oversight, or expected behavior?
I have a requirement to compare inference metrics (output metrics such as accuracy, F1 score, etc.) of various models on a certain task (e.g., text classification) and pick the best model. How do you do it? Has anybody tried using statistical significance tests for this? Thanks.
Maybe we should build this into fastai at some point, at least the easy incarnation involving rotating the images.
Generally I’d imagine you compare their results on a held-out test set. If you wanted to use a significance test you could; just make sure to do multiple runs with your models if you can (i.e., 3 or 5 times).
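A sketch of that comparison, computing a paired t-statistic over several runs by hand (the accuracy numbers are made up; in practice `scipy.stats.ttest_rel` gives the p-value directly):

```python
import numpy as np

# Made-up test-set accuracies from 5 runs of each model on the same splits.
acc_model_a = np.array([0.91, 0.93, 0.92, 0.94, 0.92])
acc_model_b = np.array([0.89, 0.90, 0.91, 0.90, 0.89])

def paired_t_statistic(a, b):
    """t = mean(d) / (std(d) / sqrt(n)) for paired differences d = a - b."""
    d = a - b
    n = len(d)
    return float(d.mean() / (d.std(ddof=1) / np.sqrt(n)))

t = paired_t_statistic(acc_model_a, acc_model_b)
# Compare |t| against a t-distribution with n - 1 = 4 degrees of freedom
# to get a p-value (scipy.stats.ttest_rel does both steps at once).
print(round(t, 2))
```

Pairing the runs (same seeds/splits for both models) removes run-to-run variance that a plain comparison of means would absorb.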
Can you share the intuition behind watching the metric vs. the loss on the validation set during training? I thought a metric like accuracy is much more volatile, especially if the validation set is small, so choosing checkpoints based on minimizing validation loss seemed like a good idea to me.
One thing to consider: let’s for simplicity consider a classification setting. The value of a cross-entropy loss depends not only on whether the image is classified correctly, but also on the confidence that the model has in the prediction. So your loss can increase if the model is getting more things wrong, OR if the model is becoming less confident about some predictions.
Intuitively, the second thing might not necessarily be bad: if the model was overconfident for some reason earlier, it’s ok if it becomes less confident now (and so the loss increases) as long as the prediction is still correct. If you think in these terms, you see how you might get a loss that’s increasing and an accuracy that is improving.
For example, the model might now be learning how to classify well some data points that it was getting wrong earlier (which would decrease the loss by a certain amount A), and in order to do so it might need to become less confident about other examples that it was already getting right (which would increase the loss by B). If B > A then you will get a net increase in the loss, but also an improved accuracy.
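A tiny numeric illustration of this trade-off (probabilities invented): between checkpoint 1 and checkpoint 2 the accuracy improves, yet the cross-entropy loss increases, because the model became much less confident on examples it already got right.

```python
import numpy as np

def cross_entropy(p_true_class):
    """Mean cross-entropy given each sample's probability for its true class."""
    return float(-np.mean(np.log(p_true_class)))

def accuracy(p_true_class):
    """Binary setting: a sample is correct if the true class gets p > 0.5."""
    return float(np.mean(p_true_class > 0.5))

# Probability assigned to the TRUE class for 3 samples.
ckpt1 = np.array([0.99, 0.99, 0.45])  # 2/3 correct, very confident overall
ckpt2 = np.array([0.51, 0.51, 0.51])  # 3/3 correct, but only barely

print(round(accuracy(ckpt1), 2), round(cross_entropy(ckpt1), 3))  # 0.67 0.273
print(round(accuracy(ckpt2), 2), round(cross_entropy(ckpt2), 3))  # 1.0 0.673
```

So loss went up while accuracy went up: exactly the B > A situation described above.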
Hey, can anyone point me in the right direction with this:
In chapter one of fastbook, there is this statement:
The importance of pretrained models is generally not recognized or discussed in most courses, books, or software library features, and is rarely considered in academic papers. As we write this at the start of 2020, things are just starting to change, but it’s likely to take a while.
My interest is in just how things are changing. Are there any papers tackling this that you can point us to, or any interesting ideas you can share in this regard?
An example of this is the ULMFiT paper.