Why do we need
torch.cuda.synchronize()? Is it some kind of lock to synchronize CUDA threads or something?
For timing, it’s needed because some of the operations are asynchronous: without synchronizing, you measure when the work was launched, not when it finished.
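For example, a minimal timing sketch (`time_forward`, `model`, and `inp` are placeholder names, not library APIs):

```python
import time
import torch

def time_forward(model, inp):
    # Wait for any pending GPU work before starting the clock.
    torch.cuda.synchronize()
    start = time.perf_counter()
    out = model(inp)
    # Kernels launch asynchronously, so wait for the forward pass
    # itself to finish before stopping the clock.
    torch.cuda.synchronize()
    return out, time.perf_counter() - start
```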
Maybe we could use LLVM to compile Python to C++ instead of the PyTorch JIT…
Why would you drop weights or embeddings? What are the advantages?
Try without and see
It’s another form of regularization. For the impact it has, you can look at the ablation table in the paper.
I should know better: try first, then ask.
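For anyone curious, here is a minimal sketch of embedding dropout in plain PyTorch (the function name and details are illustrative, not fastai’s exact implementation). Zeroing whole rows of the embedding matrix means a dropped word stays dropped everywhere it appears in the batch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def embedding_dropout(embed: nn.Embedding, words: torch.Tensor, p: float = 0.1):
    # Drop entire rows of the embedding matrix, so each dropped word
    # disappears consistently across the whole batch; rescale the rest.
    if not embed.training or p == 0.0:
        return embed(words)
    mask = embed.weight.new_empty((embed.num_embeddings, 1)).bernoulli_(1 - p) / (1 - p)
    return F.embedding(words, embed.weight * mask)
```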
Don’t we have to implement gradient clipping before we can use the PyTorch version?
Eh eh, he should have, but it’s late.
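In the meantime, a minimal sketch of PyTorch’s built-in clipping inside a training step (`training_step` and its arguments are placeholders):

```python
import torch
from torch import nn

def training_step(model: nn.Module, opt: torch.optim.Optimizer,
                  xb: torch.Tensor, yb: torch.Tensor, clip: float = 0.1):
    # Forward and backward as usual, then clip the global gradient
    # norm in place before the optimizer step.
    loss = nn.functional.cross_entropy(model(xb), yb)
    loss.backward()
    nn.utils.clip_grad_norm_(model.parameters(), max_norm=clip)
    opt.step()
    opt.zero_grad()
    return loss.item()
```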
Did you do any gradual unfreezing of layers when you fit the IMDb data using the wiki model?
Yes, it is in the notebooks.
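As a rough sketch, gradual unfreezing just toggles `requires_grad` group by group (`layer_groups` here is an assumed list of `nn.Module`s ordered from input to output, not a fastai API):

```python
from torch import nn

def freeze_to(layer_groups: list, n: int):
    # Freeze all parameter groups before index n; leave the rest trainable.
    for i, group in enumerate(layer_groups):
        for p in group.parameters():
            p.requires_grad = i >= n

# Typical schedule: train only the head first, then unfreeze one more
# group per phase until the whole model is trainable.
```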
What’s the best way to start learning Swift?
They just answered that question. There is a Swift tour online, but if you have an iPad or a Mac, download it in Playgrounds to read it interactively.
Is Swift not feasible on older Mac systems?
Feel better soon Jeremy and Rachel! (And welcome Chris!)
Thank you, fastai team, for this lesson! Jeremy, we could see your health made it harder than usual, so thank you for pushing through!
Great class as always! Thank you to all who are making this possible.
Couldn’t agree more. The ability to integrate machine learning with applications seamlessly is one of the biggest opportunities of S4TF, and the reason I am interested in learning Swift/S4TF.
Want to know about Mixup? Look here: https://www.inference.vc/mixup-data-dependent-data-augmentation/
We take pairs of datapoints (x₁, y₁) and (x₂, y₂), then choose a random mixing proportion λ from a Beta distribution, and create an artificial training example (λx₁ + (1−λ)x₂, λy₁ + (1−λ)y₂). We train the network by minimizing the loss on the mixed-up datapoints. This is all.
This is a better explanation of Mixup data augmentation.
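A minimal sketch of that recipe in PyTorch (`mixup_batch` is an illustrative name, not a library function):

```python
import torch

def mixup_batch(x: torch.Tensor, y: torch.Tensor, alpha: float = 0.4):
    # Draw the mixing proportion lambda from Beta(alpha, alpha).
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    # Pair each example with a random partner from the same batch.
    perm = torch.randperm(x.size(0))
    x_mixed = lam * x + (1 - lam) * x[perm]
    # Train on x_mixed; mix the two losses (or one-hot targets) with the same lambda:
    #   loss = lam * criterion(out, y) + (1 - lam) * criterion(out, y[perm])
    return x_mixed, y, y[perm], lam
```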