@lesscomfortable I’m a little confused about how fitting works. Suppose I train the model for 3 epochs first and then for 2 more epochs — is that equal to running 5 epochs in one go?
# Meeting Minutes 04-01-2020 (Thanks @msivanes for the inputs)
## Notes
New Participants
Walkthrough of a notebook on Yelp Reviews by @msivanes, exploring how fine-tuning helps in handling Out of Vocabulary (OOV) words in a language model. Words that do not appear in the wikitext corpus and are very specific to our domain are initialized with random weights and are learned as part of fine-tuning. This was based on learnings from the notebook [1] created by @pabloc. For more discussion see [2].
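As a rough sketch of that fine-tuning step with the fastai v1 text API (the CSV name, column name, and hyperparameters are illustrative assumptions, not taken from the notebook):

```python
from fastai.text import TextLMDataBunch, language_model_learner, AWD_LSTM

# Assumed layout: a CSV of Yelp reviews with a 'text' column.
data_lm = TextLMDataBunch.from_csv('.', 'yelp_reviews.csv', text_cols='text')

# Load the AWD_LSTM pretrained on wikitext; domain-specific tokens that were
# never seen during pretraining get randomly initialized embeddings.
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)

# Fine-tuning updates those random embeddings along with the rest of the model.
learn.fit_one_cycle(1, 1e-2)
```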
## Advice
Use a smaller sample of the dataset before diving into the full dataset. This allows for faster training & quicker iteration (see the sketch below).
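One simple way to do this (a hedged sketch; the file name and sample fraction are assumptions) is to sample the rows before building the DataBunch:

```python
import pandas as pd

df = pd.read_csv('train.csv')                     # assumed: labels/filenames live in a CSV
small_df = df.sample(frac=0.1, random_state=42)   # iterate on ~10% of the data first

# Build the DataBunch / Learner from small_df, get the pipeline working quickly,
# then switch back to the full df for the real training run.
```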
## Questions
How do we override the 60,000 vocab limit while creating the language model?
When we freeze a model for fine-tuning, do the layers become untrainable or the layer-groups?
## Off topic
@gagan is trying to create a language model for the Assamese language (one of the low-resource languages).
We decided to rotate the presentation of the lessons among us. As you know, our meetings are informal, so this is basically like explaining the material to friends and boosting your presentation skills in a supportive environment. For a learner, it is one of the best ways to actively engage with the material and actually learn better by explaining it. So choose the lesson that you want to understand better yourself. The lesson’s recap should be short (~15-20 mins), covering the main concepts in simple language. So grab a chance and please write which lesson you would like to present. Of course, all newcomers are welcome.
I started the lessons a couple of days ago and just came across this thread! Would love to join the next session and learn from everyone. Thank you for this initiative!
Thanks for the feedback. If there’s an automated way to do that, let the group know.
It’s much easier for us as community members to create a personal calendar event with the information shared in the wiki to make that happen. @shahnoza is volunteering the time to host this and is kind enough to provide Zoom for this study group. I try not to ask the host to do more work than needed.
Thanks for the feedback. Currently, @shahnoza does remind the group about the meetups. However, based on the feedback, I have now added an automated reminder to the Slack group. We should be getting a reminder on Fridays. As @msivanes pointed out, it would be easier for members to create a personal calendar event.
Classifier for pen vs pencil, followed by questions. @gagan actually timed it from data collection to inference: the total time taken was 23 minutes, demonstrating that fastai is really FAST AI. (@gagan++) Colab
Conceptual Framework of Supervised Learning (Gradient, Parameters, Loss, Model, Observations, Targets) by @msivanes for lesson 2 (SGD).
Car Classifier along with showing EarlyStopping & SaveBestModels callback during training by @tendo. Colab
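For reference, a minimal sketch of using those tracker callbacks with the fastai v1 API (the architecture, monitored metric, patience, and epoch count are illustrative assumptions; `data` is an existing ImageDataBunch):

```python
from fastai.vision import cnn_learner, models, accuracy
from fastai.callbacks import EarlyStoppingCallback, SaveModelCallback

learn = cnn_learner(data, models.resnet34, metrics=accuracy)

# Stop training when validation loss stops improving, and keep the best checkpoint.
learn.fit_one_cycle(10, callbacks=[
    EarlyStoppingCallback(learn, monitor='valid_loss', patience=3),
    SaveModelCallback(learn, every='improvement', monitor='valid_loss', name='best'),
])
```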
## Advice
Stacked transfer learning: first fine-tune on smaller 224px images, then fine-tune again on the same data at its actual (larger) image size, as sketched below.
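A rough sketch of that progressive-resizing idea with the fastai v1 vision API (the folder path, image sizes, batch sizes, and epoch counts are assumptions for illustration; the path is assumed to contain train/valid subfolders):

```python
from fastai.vision import (ImageDataBunch, cnn_learner, models, accuracy,
                           get_transforms, imagenet_stats)

# Stage 1: fine-tune on small (224px) images.
data_small = ImageDataBunch.from_folder('data/cars', size=224, bs=64,
                                        ds_tfms=get_transforms()).normalize(imagenet_stats)
learn = cnn_learner(data_small, models.resnet34, metrics=accuracy)
learn.fit_one_cycle(4)
learn.unfreeze()
learn.fit_one_cycle(2)

# Stage 2: swap in the same dataset at a larger size and fine-tune again.
data_large = ImageDataBunch.from_folder('data/cars', size=448, bs=32,
                                        ds_tfms=get_transforms()).normalize(imagenet_stats)
learn.data = data_large
learn.freeze()            # retrain the head first at the new resolution
learn.fit_one_cycle(2)
learn.unfreeze()
learn.fit_one_cycle(2)
```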
## Discussion
Class imbalance: is it still a problem when we use transfer learning? It might matter less because of the fine-tuning. The best thing to do is try it out, as Jeremy said.
num_workers: the number of CPU worker processes used to speed up data loading. If you get an out-of-memory error, reduce num_workers to a smaller number or reduce the batch size (bs).
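A small illustration of where those knobs live in the fastai v1 API (the path and the exact values are assumptions):

```python
from fastai.vision import ImageDataBunch

# Fewer loader processes and a smaller batch size ease memory pressure.
data = ImageDataBunch.from_folder('data/pets', size=224, bs=16, num_workers=2)
```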
@AjayStark
The top post (wiki) has all the information that you need to participate in the study group & in the discussions. Let us know if you face any difficulties with anything specific.
Source: Natural Language Processing with PyTorch by Delip Rao et al.
Predictions: y_hat = model(x); here we are using our own model.
Loss function: loss_func(y_hat, y). In addition to that, we add the weight-decay term w2*wd to it.
Gradients: p.sub_(learning_rate * p.grad) performs an in-place subtraction of the product of the learning rate and the gradient from each parameter. Since our model has multiple parameter tensors (weights and biases), we loop through them using PyTorch's model.parameters().
Extras:
Weight Decay:
a) w2: looping over every parameter, we calculate the sum of squared weights, w2: for p in model.parameters(): w2 += (p**2).sum()
b) wd: a constant (1e-5)
We multiply w2 by wd and add the result to the regular loss_func output.
Combined
We calculate the loss for each minibatch by calling update(x, y, lr) on it: losses = [update(x,y,lr) for x,y in data.train_dl]
.item() turns the loss tensor into a plain Python number so we can plot the losses and inspect them visually.
    import torch

    # model, loss_func, and data are defined earlier in the notebook
    def update(x, y, learning_rate):
        wd = 1e-5
        # prediction
        y_hat = model(x)
        # sum of squared weights (for weight decay)
        w2 = 0.
        for p in model.parameters():
            w2 = w2 + (p**2).sum()
        # regular loss plus the weight-decay term
        loss = loss_func(y_hat, y) + w2 * wd
        # compute the gradients of the loss w.r.t. the model parameters
        loss.backward()
        # instruct PyTorch not to record these operations for the next gradient calculation
        with torch.no_grad():
            for p in model.parameters():
                # gradient descent step: p <- p - learning_rate * gradient
                p.sub_(learning_rate * p.grad)
                # reset the gradients for the next minibatch
                p.grad.zero_()
        return loss.item()
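Tying this back to the list comprehension above, a short usage sketch (assuming model, loss_func, lr, and data come from the lesson's notebook):

```python
import matplotlib.pyplot as plt

# One pass over the training DataLoader, collecting the per-minibatch losses.
losses = [update(x, y, lr) for x, y in data.train_dl]

# Because update() returns loss.item() (a plain Python float), we can plot directly.
plt.plot(losses)
plt.xlabel('minibatch')
plt.ylabel('loss')
plt.show()
```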