It needs to be the same kind of model, so it won’t work across tasks like CV vs text if they don’t use the same kinds of models.
Yes, I see that the Deconvolution and Neuron Deconvolution functionality in Captum implements the same paper Jeremy is discussing now. Would be helpful if there were a fastai callback for this.
This is being looked at by fastai users: Captum model interpretability library
If you use transfer learning for text classification and you trained your model using web data, is there a possibility of copyright infringement by training with such data? Same question for images.
Not a data scientist here, but if we add more layers, would it make the model better because it can recognize more in the images, or would it actually overfit or get worse just because it has more layers? I guess what I’m asking is: is there a good rule of thumb for selecting the architecture, resnet50 vs resnet34?
That is a complicated question and we don’t know the answer. Neural nets can certainly pick up more than you might think, so you should try to train models on the data you want to publish before making it public, and see whether you can recover sensitive information or not.
Great question! It really depends on the data you have. For small datasets, 34 will probably be better, but if you have more, 50 will likely get better results.
The right answer is: try it! It’s not as if it takes a long time to train a model with transfer learning.
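The “just try it” workflow can be sketched with a toy example (this is our own toy in plain Python, not fastai code; the two models of different capacity here merely stand in for resnet34 vs resnet50): fit each candidate on the training set and keep whichever scores better on a held-out validation set.

```python
import random

random.seed(42)

# Toy data: y is roughly linear in x, with a bit of noise.
def make_data(n):
    xs = [random.uniform(0, 1) for _ in range(n)]
    return [(x, 2 * x + 1 + random.gauss(0, 0.1)) for x in xs]

def fit_constant(train):
    # Low-capacity model: always predict the mean of y.
    m = sum(y for _, y in train) / len(train)
    return lambda x: m

def fit_line(train):
    # Higher-capacity model: closed-form least squares for y = a*x + b.
    n = len(train)
    sx = sum(x for x, _ in train); sy = sum(y for _, y in train)
    sxx = sum(x * x for x, _ in train); sxy = sum(x * y for x, y in train)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return lambda x: a * x + b

def val_mse(model, valid):
    return sum((model(x) - y) ** 2 for x, y in valid) / len(valid)

data = make_data(200)
train, valid = data[:150], data[150:]
scores = {name: val_mse(fit(train), valid)
          for name, fit in [("small", fit_constant), ("big", fit_line)]}
best = min(scores, key=scores.get)
print(best)  # pick whichever architecture wins on validation data
```

The same pattern carries over directly: train resnet34 and resnet50 on your data and compare the validation metric, rather than guessing in advance.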
Are there cases where we would use another pre-trained model than imagenet? Like for sound for example?
We don’t have pretrained models for sound AFAIK, but for text, pretrained models have been everywhere since… last year.
I believe here’s the post mentioned: Splunk and Tensorflow for Security: Catching the Fraudster with Behavior Biometrics
Edit: I’ll add these to the lecture wiki during break/after the lecture
Where in the book are the terms?
Not sure about audio but different domains do have different pretrained models. For example in many natural language processing problems models are pretrained on a dataset called WikiText (a large collection of Wikipedia articles).
Yes, if you have a model pretrained on data more similar to your dataset, you should use that one.
E.g. using a model pretrained on x-rays would be a better starting point if you are doing something with x-rays, compared to ImageNet.
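Why a closer pretrained model helps can be seen even in a one-parameter toy (again our own sketch, not fastai or real x-ray models): after the same small number of fine-tuning steps, the model whose starting weight came from a “similar” task ends up with lower loss than one starting from an unrelated task.

```python
# Fine-tune y = w * x for a few gradient-descent steps on the target task,
# starting from two different "pretrained" weights.

def finetune(w, data, lr=0.1, steps=5):
    for _ in range(steps):
        # gradient of mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def mse(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

# Target task: y = 2.0 * x
task = [(x / 10, 2.0 * x / 10) for x in range(1, 11)]

w_similar = finetune(2.3, task)   # "pretrained" on a nearby task
w_far = finetune(-5.0, task)      # "pretrained" on an unrelated task

print(mse(w_similar, task), mse(w_far, task))
```

With the same budget of steps, the nearby starting point lands much closer to the target, which is the intuition behind preferring x-ray pretraining over ImageNet for x-ray work.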
Best way to learn! Code experimentation!
From Dinesh C.: During fine-tuning, should we focus solely on the metric, or should we compare training loss vs validation loss to understand underfitting/overfitting?
Are filters independent? By that I mean if filters are pretrained might they become less good in detecting features of previous images when fine tuned?
Chapter 1 questionnaire solutions I compiled:
I always have trouble understanding the difference between parameters and hyperparameters. If I am feeding an image of a dog as an input and changing the hyperparameter of batch size in the model, what would be an example of a parameter in this example?
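The distinction shows up clearly even in a tiny model (a toy in plain Python, not the image model from the question): hyperparameters like learning rate, epochs, and batch size are numbers *you* choose before training; parameters are the weights the training loop itself adjusts — in a CNN, the millions of filter weights are the parameters.

```python
# Toy model y = w * x + b fitted by gradient descent.

lr, epochs = 0.01, 500   # hyperparameters: chosen by us before training
                         # (batch size would be another one; here we use
                         # the full dataset each step)
w, b = 0.0, 0.0          # parameters: the training loop learns these

data = [(x, 3 * x + 1) for x in range(-5, 6)]  # true w = 3, b = 1

for _ in range(epochs):
    gw = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
    gb = sum(2 * (w * x + b - y) for x, y in data) / len(data)
    w, b = w - lr * gw, b - lr * gb

print(round(w, 2), round(b, 2))  # → 3.0 1.0
```

Note that nothing in the loop ever touches `lr` or `epochs`; only `w` and `b` change, which is exactly what makes them parameters.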
Yes, they won’t be as good on the general problem they were trained on, because you fine-tuned them to another task.