Loss and error are often synonyms (except if you mean error rate, which is 1 − accuracy). A metric is something like accuracy: it gives a number for how well your model is doing, and it is what you are ultimately trying to optimize for.
A loss function is something that evaluates how badly your model is doing. It needs to satisfy a few requirements (like being smooth), and we will see how it is used to train your model.
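To make the distinction concrete, here is a minimal sketch in plain PyTorch (made-up numbers, nothing from the lesson): accuracy is the step-like quantity you report as a metric, while cross-entropy is the smooth quantity the optimizer can actually follow.

```python
import torch
import torch.nn.functional as F

# Hypothetical logits for a batch of 4 examples and 3 classes, plus true labels.
logits = torch.tensor([[ 2.0, 0.5, -1.0],
                       [ 0.1, 1.5,  0.3],
                       [ 1.2, 0.2,  0.4],
                       [-0.5, 0.0,  2.1]])
targets = torch.tensor([0, 1, 2, 2])

# Metric: accuracy. Easy to interpret, but it is a step function of the weights,
# so its gradient is zero almost everywhere and SGD can't use it directly.
accuracy = (logits.argmax(dim=1) == targets).float().mean()

# Loss: cross-entropy. Less interpretable, but smooth, so a small change in the
# weights gives a small change in the loss that the optimizer can follow.
loss = F.cross_entropy(logits, targets)

print(f"accuracy (metric): {accuracy:.2f}   cross-entropy (loss): {loss:.4f}")
```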
Question from DLenthousiast: We found out that fine_tune(1) first does head-only training (with the body frozen), and then a full-network retrain.
Why is this a good thing to do? Why not train only the head? And why head-only first and then the whole network, rather than one epoch on the whole network followed by one epoch on the head only?
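For reference, here is a simplified sketch of what fine_tune does, based on the frozen-then-unfrozen behaviour described above (this is not the actual fastai source; the real method also picks and scales the learning rates). The usual intuition is that a freshly initialized head would otherwise push large, noisy gradients back into the pretrained body, so you let the head settle first.

```python
# Simplified sketch of Learner.fine_tune(epochs) in fastai -- an approximation,
# not the real implementation (which also handles base_lr and discriminative lrs).
def fine_tune_sketch(learn, epochs, freeze_epochs=1):
    learn.freeze()                      # body frozen: only the random head is trained
    learn.fit_one_cycle(freeze_epochs)  # let the head stop being random
    learn.unfreeze()                    # now every layer can be updated
    learn.fit_one_cycle(epochs)         # full-network retrain
```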
Do people average parameters from the same architecture but trained on k-folded train/test sets? If we did k-fold cross-validation and distributed the computations across a few machines, could we then average the models and end up with something like a bagged deep learning model?
Is it useful to do transfer learning in several steps? For example, start with ImageNet, then a big dataset related to our problem, and finally fine-tune on our dataset of interest?
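As a hypothetical sketch of that staged approach in fastai (`dls_related` and `dls_target` are assumed DataLoaders you would build yourself; swapping the DataLoaders like this only works directly if both stages share the same classes, otherwise the head has to be recreated):

```python
from fastai.vision.all import *

# Stage 1: start from an ImageNet-pretrained body and adapt to a big related dataset.
learn = cnn_learner(dls_related, resnet34, metrics=accuracy)
learn.fine_tune(5)

# Stage 2: swap in the small dataset of interest and fine-tune again.
learn.dls = dls_target
learn.fine_tune(3)
```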
It has a (big) part that has been pretrained, which we call the body of the model. It also has a part that is random and specific to your problem, which we call the head.
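In fastai the split is easy to see: a cnn_learner model is a Sequential of (body, head), where the body holds the pretrained ImageNet weights and the head is freshly initialized for your classes (here `dls` is an assumed DataLoaders for your task).

```python
from fastai.vision.all import *

learn = cnn_learner(dls, resnet18, metrics=error_rate)
body, head = learn.model[0], learn.model[1]
print(body)  # pretrained convolutional stack
print(head)  # new (randomly initialized) pooling + linear layers sized to your classes
```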
Did transfer learning exist before deep learning, e.g. in classical ML? Does anything about the architecture of neural nets make them more suitable for transfer learning?
Here’s what I had in my notes previously, updated with what Jeremy said; I think it covers a couple of people’s questions. In previous lessons he mentioned judging the fit from the training/validation loss, but the update is that the loss isn’t as good a guide as the metrics.
Classical ML can’t use pretrained models. The way deep learning models are built makes transfer learning easier, because they are… well… deep. So the “deep” part can more easily be pretrained.
That’s not easy to do afaik. Due to the random initialization of the network as well as all the other randomness involved (data augmentation, batch composition…), you are not at all guaranteed that the weights will correspond one-to-one between the different instances of the same architecture. So instead of averaging the weights, you can literally use every model you trained in an ensemble, for example with bagging (“majority vote”). I remember reading a few papers using this technique. However, in practice it is very expensive: multiple training runs, and also multiple networks to run inference on.
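Concretely, ensembling means combining predictions rather than weights. A minimal sketch in plain PyTorch, where `models` is an assumed list of the k trained fold models and `x` a batch of inputs:

```python
import torch

def ensemble_predict(models, x):
    # Soft voting: average the predicted probabilities of every fold's model.
    with torch.no_grad():
        probs = torch.stack([m(x).softmax(dim=1) for m in models])  # (k, batch, n_classes)
    return probs.mean(dim=0).argmax(dim=1)  # or take a hard majority vote over per-model argmaxes
```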
Re transfer learning (using results from one task for another): can you actually take an architecture trained in a different domain, say text / language modeling, and use it for another, say image classification / object detection? The intuition being that if you know how objects are related in a text corpus (through language modeling), you could be better at object detection too. Or is this too wild?
Suppose we have some private data (containing customer details), we build a model on this data, and we plan to do transfer learning. Should we consider the learned weights private? Is there a possibility of the data being exposed if weights learned on private data are used in the public domain?
When we look at what is being recognized at these low layers, it is us human beings doing the recognizing. Is it not true that often we won’t know what the machine is actually ‘recognizing’? Aren’t we just cherry-picking the layers that we can recognize?
From what I saw, Captum provides a cool UI which shows which parts of the image are used to classify, say, a cat image as a cat instead of a dog… Would love to integrate it with fastai.
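The UI in question is presumably Captum Insights; the underlying attribution call looks roughly like this sketch (a fastai Learner’s model is a plain PyTorch module, so it can be passed straight to Captum; `learn` and `img_tensor` are assumed to already exist):

```python
import torch
from captum.attr import IntegratedGradients

model = learn.model.eval()
x = img_tensor.unsqueeze(0)                 # (1, 3, H, W): a batch with one image

pred_class = model(x).argmax(dim=1).item()  # the class we want to explain, e.g. "cat"
ig = IntegratedGradients(model)
attributions = ig.attribute(x, target=pred_class)  # per-pixel contribution to that class's score
# Captum Insights (captum.insights) wraps visualizations like this in an interactive web UI.
```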