It needs to be the same kind of model, so it won’t work across tasks like CV vs text if they don’t use the same kinds of models.
Yes, I see that the Deconvolution and Neuron Deconvolution functionality in Captum implements the same paper Jeremy is discussing now. Would be helpful if there were a fastai callback for this.
This is being looked at by fastai users: Captum model interpretability library
If you use transfer learning for text classification and you trained your model using web data, is there a possibility of copyright infringement by training with such data? Same question for images.
Not a data scientist here, but if we add more layers, would it make the model better because it can recognize more in the images, or would it actually overfit or get worse just because it has more layers? I guess what I’m asking is: is there a good rule of thumb for selecting the architecture, resnet50 vs resnet34?
That is a complicated question and we don’t know the answer. Neural nets can certainly pick up more than you might think, so you should try to train models on the data you want to publish before making it public, and see whether you can recover sensitive information or not.
Great question! It really depends on the data you have. For small datasets, 34 will probably be better, but if you have more, 50 will likely get better results.
The right answer is: try it! It’s not as if it takes a long time to train a model with transfer learning.
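The “just try it” workflow can be sketched with a toy example (this is our own toy in plain Python, not fastai code; the two models of different capacity here merely stand in for resnet34 vs resnet50): fit each candidate on the training set and keep whichever scores better on a held-out validation set.

```python
import random

random.seed(42)

# Toy data: y is roughly linear in x, with a bit of noise.
def make_data(n):
    xs = [random.uniform(0, 1) for _ in range(n)]
    return [(x, 2 * x + 1 + random.gauss(0, 0.1)) for x in xs]

def fit_constant(train):
    # Low-capacity model: always predict the mean of y.
    m = sum(y for _, y in train) / len(train)
    return lambda x: m

def fit_line(train):
    # Higher-capacity model: closed-form least squares for y = a*x + b.
    n = len(train)
    sx = sum(x for x, _ in train); sy = sum(y for _, y in train)
    sxx = sum(x * x for x, _ in train); sxy = sum(x * y for x, y in train)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return lambda x: a * x + b

def val_mse(model, valid):
    return sum((model(x) - y) ** 2 for x, y in valid) / len(valid)

data = make_data(200)
train, valid = data[:150], data[150:]
scores = {name: val_mse(fit(train), valid)
          for name, fit in [("small", fit_constant), ("big", fit_line)]}
best = min(scores, key=scores.get)
print(best)  # pick whichever architecture wins on validation data
```

The same pattern carries over directly: train resnet34 and resnet50 on your data and compare the validation metric, rather than guessing in advance.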
Are there cases where we would use another pre-trained model than imagenet? Like for sound for example?
We don’t have pretrained models for sound AFAIK, but for text, pretrained models have been everywhere since… last year.
I believe here’s the post mentioned: Splunk and Tensorflow for Security: Catching the Fraudster with Behavior Biometrics
Edit: I’ll add these to the lecture wiki during break/after the lecture
Where in the book are the terms?
Not sure about audio but different domains do have different pretrained models. For example in many natural language processing problems models are pretrained on a dataset called WikiText (a large collection of Wikipedia articles).
Yes, if you have a model pretrained on data more similar to your dataset, you should use that one.
E.g. using a model pretrained on x-rays would be a better starting point if you are doing something with x-rays, compared to ImageNet.
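Why a closer pretrained model helps can be seen even in a one-parameter toy (again our own sketch, not fastai or real x-ray models): after the same small number of fine-tuning steps, the model whose starting weight came from a “similar” task ends up with lower loss than one starting from an unrelated task.

```python
# Fine-tune y = w * x for a few gradient-descent steps on the target task,
# starting from two different "pretrained" weights.

def finetune(w, data, lr=0.1, steps=5):
    for _ in range(steps):
        # gradient of mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def mse(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

# Target task: y = 2.0 * x
task = [(x / 10, 2.0 * x / 10) for x in range(1, 11)]

w_similar = finetune(2.3, task)   # "pretrained" on a nearby task
w_far = finetune(-5.0, task)      # "pretrained" on an unrelated task

print(mse(w_similar, task), mse(w_far, task))
```

With the same budget of steps, the nearby starting point lands much closer to the target, which is the intuition behind preferring x-ray pretraining over ImageNet for x-ray work.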
Best way to learn! Code experimentation!
From Dinesh C.: During fine-tuning, should we focus solely on the metric, or should we compare training loss vs validation loss to understand underfitting/overfitting?
Are filters independent? By that I mean if filters are pretrained might they become less good in detecting features of previous images when fine tuned?
Chapter 1 questionnaire solutions I compiled:
I always have trouble understanding the difference between parameters and hyperparameters. If I am feeding an image of a dog as an input and changing the hyperparameter of batch size in the model, what would be an example of a parameter in this example?
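The distinction shows up clearly even in a tiny model (a toy in plain Python, not the image model from the question): hyperparameters like learning rate, epochs, and batch size are numbers *you* choose before training; parameters are the weights the training loop itself adjusts — in a CNN, the millions of filter weights are the parameters.

```python
# Toy model y = w * x + b fitted by gradient descent.

lr, epochs = 0.01, 500   # hyperparameters: chosen by us before training
                         # (batch size would be another one; here we use
                         # the full dataset each step)
w, b = 0.0, 0.0          # parameters: the training loop learns these

data = [(x, 3 * x + 1) for x in range(-5, 6)]  # true w = 3, b = 1

for _ in range(epochs):
    gw = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
    gb = sum(2 * (w * x + b - y) for x, y in data) / len(data)
    w, b = w - lr * gw, b - lr * gb

print(round(w, 2), round(b, 2))  # → 3.0 1.0
```

Note that nothing in the loop ever touches `lr` or `epochs`; only `w` and `b` change, which is exactly what makes them parameters.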
Yes, they won’t be as good on the general problem they were trained on, because you fine-tuned them to another task.