Lesson 12 (2019) discussion and wiki

Why do we need torch.cuda.synchronize()? Is it a kind of lock to synchronize CUDA threads or something?

2 Likes

For timing, it's needed because some of the operations are asynchronous.
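A minimal sketch of why that matters when timing GPU code (assuming a CUDA device is available):

```python
import time
import torch

# CUDA kernels are launched asynchronously: the Python call returns
# immediately, so a CPU-side timer can stop before the GPU has finished.
x = torch.randn(4096, 4096, device="cuda")

torch.cuda.synchronize()              # wait for any pending work first
start = time.perf_counter()
y = x @ x                             # queued on the GPU, returns right away
torch.cuda.synchronize()              # block until the matmul really finishes
print(f"matmul took {time.perf_counter() - start:.4f}s")
```

Without the second synchronize you would mostly be measuring kernel-launch overhead.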

2 Likes

Ahh. Thanks!!

Maybe we can use LLVM to convert Python to C++ instead of the PyTorch JIT…

…joking :wink:

3 Likes

why would you drop weights or embeddings? What are the advantages?

2 Likes

Try without and see :wink:

1 Like

It's another form of regularization. For the impact it has, you can look at the ablation table in the paper.
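For the embedding half, a rough sketch of how that kind of dropout works, zeroing whole rows of the embedding matrix so entire words disappear, as described in the AWD-LSTM paper (an illustration, not fastai's exact implementation):

```python
import torch
import torch.nn.functional as F

def embedding_dropout(emb: torch.nn.Embedding, words: torch.Tensor, p: float = 0.1):
    """Drop whole embedding rows (entire words) and rescale the survivors."""
    if not emb.training or p == 0.0:
        return emb(words)
    mask = emb.weight.new_empty((emb.weight.size(0), 1)).bernoulli_(1 - p) / (1 - p)
    return F.embedding(words, emb.weight * mask, emb.padding_idx)

emb = torch.nn.Embedding(100, 8)
out = embedding_dropout(emb, torch.randint(0, 100, (4, 6)), p=0.1)
```

Dropping full rows rather than individual activations means a word is either present or absent for the whole batch, which is the point of this form of regularization.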

1 Like

I should know better and try first, then ask :slight_smile:

1 Like

Don't we have to implement gradient clipping before we can use the PyTorch version?

Eh eh, he should have, but it's late :wink:
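For reference, the library call is torch.nn.utils.clip_grad_norm_; here's a from-scratch sketch of the same global-norm clipping (assuming .backward() has already populated the gradients):

```python
import torch

def clip_grad_norm_scratch(parameters, max_norm: float):
    # Compute the global L2 norm over all gradients, then scale them
    # down in place if it exceeds max_norm. A sketch of what
    # torch.nn.utils.clip_grad_norm_ does, not the library code itself.
    grads = [p.grad for p in parameters if p.grad is not None]
    if not grads:
        return torch.tensor(0.0)
    total_norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
    if total_norm > max_norm:
        for g in grads:
            g.mul_(max_norm / (total_norm + 1e-6))
    return total_norm

# The built-in equivalent:
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.1)
```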

2 Likes

Did you do any gradual unfreezing of layers when you fit the IMDb data using the wiki model?

1 Like

Yes, it is in the notebooks.
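The pattern, in plain PyTorch (a sketch with a toy stand-in model just to show the parameter groups; fastai wraps this as learn.freeze_to):

```python
import torch.nn as nn

# Hypothetical toy model standing in for the ULMFiT classifier;
# only the grouping of parameters matters here, not the forward pass.
model = nn.Sequential(
    nn.Embedding(1000, 64),             # group 0: embeddings
    nn.LSTM(64, 64, batch_first=True),  # group 1: RNN stack
    nn.Linear(64, 2),                   # group 2: classifier head
)
groups = list(model)

for p in model.parameters():
    p.requires_grad = False             # start fully frozen

# Gradual unfreezing: enable training from the head backwards,
# fitting for an epoch or two after each step (fit loop omitted).
for group in reversed(groups):
    for p in group.parameters():
        p.requires_grad = True
    # ... fit here before unfreezing the next group ...
```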

1 Like

What's the best way to start learning Swift?

They just answered that question. The Swift tour is available online, but if you have an iPad or a Mac, download it in Swift Playgrounds to read it interactively.

2 Likes

Is Swift not feasible on older Mac systems?

Feel better soon Jeremy and Rachel! (And welcome Chris!)

4 Likes

Thank you fastai team for this lesson! Jeremy, we could see your health made it harder than usual, so thank you for pushing through!

5 Likes

Great class as always! Thank you for all that are making this possible.

2 Likes

Couldn't agree more. The ability to integrate machine learning with applications seamlessly is one of the biggest opportunities of S4TF, and it's the reason I am interested in learning Swift/S4TF.

2 Likes

Want to know about mixup? Look here: https://www.inference.vc/mixup-data-dependent-data-augmentation/
From the link:
We take pairs of datapoints (x₁, y₁) and (x₂, y₂), then choose a random mixing proportion λ from a Beta distribution, and create an artificial training example (λx₁ + (1 − λ)x₂, λy₁ + (1 − λ)y₂). We train the network by minimizing the loss on the mixed-up datapoints. This is all.

This is a better explanation of mixup data augmentation.
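A sketch of that recipe in PyTorch (assuming soft or one-hot targets so the labels can be mixed directly; with integer class labels you would mix the two losses instead, and this is not fastai's callback):

```python
import torch

def mixup_batch(x: torch.Tensor, y: torch.Tensor, alpha: float = 0.4):
    # Mix each example with a shuffled partner from the same batch:
    # x' = lam * x1 + (1 - lam) * x2, and the same for the targets.
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(x.size(0))
    x_mixed = lam * x + (1 - lam) * x[perm]
    y_mixed = lam * y + (1 - lam) * y[perm]
    return x_mixed, y_mixed
```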

4 Likes