Why do we need torch.cuda.synchronize()? Is it some kind of lock to synchronize CUDA threads or something?
For timing, it's needed because some of the operations are asynchronous.
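To illustrate the point above: CUDA kernels are launched asynchronously, so without a synchronize you time only the kernel launch, not the work itself. A minimal sketch (the matrix size is an arbitrary illustrative choice; on a CPU-only machine the synchronize is skipped and the timing is already correct):

```python
import time
import torch

a = torch.randn(1024, 1024)
b = torch.randn(1024, 1024)
if torch.cuda.is_available():
    a, b = a.cuda(), b.cuda()

start = time.perf_counter()
c = a @ b  # on GPU this call returns before the matmul has finished
if torch.cuda.is_available():
    # Block until all queued CUDA work is done, so the timer is meaningful
    torch.cuda.synchronize()
elapsed = time.perf_counter() - start
```

Without the `synchronize()`, `elapsed` on a GPU can be misleadingly tiny.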
Ahh. Thanks!!
Maybe we can use LLVM to convert Python to C++ instead of the PyTorch JIT…
…joking
why would you drop weights or embeddings? What are the advantages?
Try without and see
It's another form of regularization. For the impact it has, you can look at the ablation table in the paper.
I should know better: try first, then ask.
Don't we have to implement gradient clipping before we can use the PyTorch version?
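For reference, gradient clipping is already built into PyTorch, so nothing needs to be hand-rolled. A minimal sketch (the model, data, and `max_norm=1.0` are illustrative choices, not from the lesson):

```python
import torch
from torch import nn

model = nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(4, 10)
loss = model(x).pow(2).mean()
loss.backward()

# Rescales all gradients in place so their global L2 norm is at most max_norm;
# returns the total norm measured before clipping.
total_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()
```

The call goes between `backward()` and `step()`, since it modifies the `.grad` tensors that the optimizer is about to use.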
Heh, he should have, but it's late.
did you do any gradual unfreezing of layers when you fit the imdb data using the wiki model?
Yes, it is in the notebooks.
What's the best way to start learning Swift?
They just answered that question. There is a Swift tour online, but if you have an iPad or a Mac, download it in Playgrounds to read it interactively.
Is Swift on older Mac systems not feasible?
Feel better soon Jeremy and Rachel! (And welcome Chris!)
Thank you fastai team for this lesson! Jeremy, we could see your health made it harder than usual, so thank you for pushing through!
Great class as always! Thank you for all that are making this possible.
Couldn't agree more. The ability to integrate machine learning with applications seamlessly is one of the biggest opportunities of S4TF and the reason I am interested in learning Swift/S4TF.
Want to know about Mixup? Look here: https://www.inference.vc/mixup-data-dependent-data-augmentation/
From the link:
We take pairs of datapoints (x1, y1) and (x2, y2), then choose a random mixing proportion λ from a Beta distribution, and create an artificial training example (λx1 + (1−λ)x2, λy1 + (1−λ)y2). We train the network by minimizing the loss on mixed-up datapoints. This is all.
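The quoted recipe can be sketched in plain Python; the stdlib's `random.betavariate` draws λ from a Beta distribution (the `alpha=0.4` value here is just an illustrative choice of Beta parameter, not from the post):

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.4):
    # Mixing proportion lambda ~ Beta(alpha, alpha)
    lam = random.betavariate(alpha, alpha)
    # Artificial example: (lam*x1 + (1-lam)*x2, lam*y1 + (1-lam)*y2)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y

random.seed(0)
# Mix two one-hot labelled points; the mixed label is a soft label
x, y = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
```

Training then minimizes the usual loss on `(x, y)` instead of the raw pairs.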
This explains Mixup data augmentation better.