Lesson 12 (2019) discussion and wiki

Why do we need torch.cuda.synchronize()? Is it some kind of lock to synchronize CUDA threads or something?

2 Likes

For timing it’s the safest thing to do, because some CUDA operations are asynchronous: the Python call returns before the GPU has actually finished the work.

2 Likes
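To make that concrete, here is a minimal timing sketch. The `timed` helper is a hypothetical name, not part of PyTorch; the point is the two `torch.cuda.synchronize()` calls that bracket the timed region so the timer measures the actual GPU work, not just kernel launches.

```python
import time
import torch

def timed(fn, *args):
    # CUDA kernels launch asynchronously; without a sync the timer can
    # stop before the GPU has actually finished the computation.
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    out = fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for all queued kernels to finish
    return out, time.perf_counter() - start

x = torch.randn(256, 256)
out, secs = timed(torch.mm, x, x)
```

On CPU-only machines the sync calls are skipped and the helper degenerates to a plain timer, which is why the `is_available()` guards are there.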

Ahh. Thanks!!

Maybe we can use LLVM to convert Python to C++ instead of the PyTorch JIT…

…joking :wink:

3 Likes

Why would you drop weights or embeddings? What are the advantages?

2 Likes

Try without and see :wink:

1 Like

It’s another form of regularization. For the impact it has, you can look at the ablation table in the paper.

1 Like
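For intuition, here is a minimal sketch of embedding dropout in the AWD-LSTM style the question is about: instead of zeroing individual activations, whole rows of the embedding matrix (i.e. whole words) are dropped, with the survivors rescaled to keep the expected value unchanged. The helper name `embedding_dropout` is my own, not a fastai/PyTorch API.

```python
import torch

def embedding_dropout(emb_weight, p=0.1):
    # Zero out entire rows (whole words) of the embedding matrix and
    # scale the remaining rows by 1/(1-p) to preserve the expectation.
    mask = (torch.rand(emb_weight.size(0), 1) > p).float() / (1 - p)
    return emb_weight * mask
```

The effect is that a dropped word contributes nothing anywhere in the batch for that forward pass, which regularizes the model more aggressively than standard dropout on the embedding output.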

I should know better: try first, then ask :slight_smile:

1 Like

Don’t we have to implement gradient clipping ourselves before we can use the PyTorch version?

Eh eh, he should have, but it’s late :wink:

2 Likes
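For reference, PyTorch's built-in version is `torch.nn.utils.clip_grad_norm_`, which rescales all gradients in place so their global norm does not exceed `max_norm`. A minimal sketch (toy model and loss are made up for illustration):

```python
import torch
from torch import nn

model = nn.Linear(10, 1)
loss = model(torch.randn(4, 10)).pow(2).mean()
loss.backward()

# Clip the global gradient norm to 1.0; call this between
# loss.backward() and optimizer.step().
total_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```

`total_norm` is the gradient norm *before* clipping, which is handy to log for diagnosing exploding gradients.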

Did you do any gradual unfreezing of layers when you fit the IMDb data using the wiki model?

1 Like

Yes, it is in the notebooks.

1 Like
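The idea behind gradual unfreezing can be sketched in plain PyTorch: train only the head of a pretrained model first, then progressively expose earlier layer groups. The toy model and the `freeze_to` helper below are my own illustration, not the fastai API (fastai wraps the same idea in `Learner.freeze_to`).

```python
from torch import nn

# A toy "body + head" model; in practice the body is the pretrained
# language model and the head is the new classifier.
model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(),
                      nn.Linear(8, 8), nn.ReLU(),
                      nn.Linear(8, 2))

def freeze_to(model, n):
    # Freeze all layers before index n; leave the rest trainable.
    for i, layer in enumerate(model):
        for p in layer.parameters():
            p.requires_grad = i >= n

freeze_to(model, 4)   # train only the head first
# ...fit a few epochs, then expose one more layer group:
freeze_to(model, 2)
# ...fit again, then unfreeze everything:
freeze_to(model, 0)
```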

What’s the best way to start learning Swift?

They just answered that question. There is a Swift tour available online, but if you have an iPad or a Mac, download it in Playgrounds to read it interactively.

2 Likes

Is Swift not feasible on older Mac systems?

Feel better soon Jeremy and Rachel! (And welcome Chris!)

4 Likes

Thank you fastai team for this lesson! Jeremy, we could see that your health made it harder than usual for you, so thank you for pushing through it!

5 Likes

Great class as always! Thank you to all who are making this possible.

2 Likes

Couldn’t agree more. The ability to integrate machine learning with applications seamlessly is one of the biggest opportunities of S4TF, and the reason I am interested in learning Swift/S4TF.

2 Likes

Want to know about mixup? Look here: https://www.inference.vc/mixup-data-dependent-data-augmentation/
From the link:
We take pairs of datapoints (x1, y1) and (x2, y2), then choose a random mixing proportion λ from a Beta distribution, and create an artificial training example (λx1 + (1−λ)x2, λy1 + (1−λ)y2). We train the network by minimizing the loss on the mixed-up datapoints. That is all.

This post explains mixup data augmentation really well.

4 Likes
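That recipe is short enough to sketch directly. A minimal NumPy version (the `mixup` helper name and `alpha=0.4` default are my own choices for illustration; labels are assumed one-hot):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.4):
    # Mixing proportion lam ~ Beta(alpha, alpha); small alpha pushes lam
    # toward 0 or 1, keeping most mixed examples close to an original.
    lam = np.random.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

x1, x2 = np.zeros(4), np.ones(4)
y1, y2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
x_mixed, y_mixed = mixup(x1, y1, x2, y2)
```

Note that the same λ mixes both the inputs and the one-hot targets, so the soft label always matches the blend of the two images.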