Lesson 11 discussion and wiki

I’m seeing a @jit decorator while doing augmentation. Isn’t that from numba library?

Great class as always! Looking forward to playing with the augmentation code. Thanks!

BTW, DALI has JPEG decode and image augmentation written in CUDA.

No, it’s from PyTorch.


Forces the model to try to learn contextual clues associated with tenches?

I’d love to hear more in the future about augmenting for regression problems too.

EDIT: Since the rest of my original post strikes me now as ‘off-topic’ for Lesson 11, I’ve moved it to a separate thread: https://forums.fast.ai/t/data-augmentation-for-regression/43518

I’d certainly like to help. Just need to find the time and manage other priorities as always…

So you are saying that doing the transformations on a batch is more efficient use of the GPU?

But DALI could still be helpful just for the JPEG decode - or did your experience tell you otherwise?

Great question! In an RNN, you have a single set of weights that has to make all timesteps do what you want. So what people do is initialize the recurrent weights (typically named something like w_hh in PyTorch) with orthogonal matrices, which preserve the norm of the activations under the linear layers. But that doesn’t account for the effect of nonlinearities and gates. So one thing you might do is start with an orthogonal matrix and then search for a scaling of the weights that works better than eigenvalues of 1 for preserving the norm of the activations across an entire RNN timestep.
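A minimal sketch of the idea above (illustrative, not fastai's exact code): orthogonally initialize the recurrent matrix w_hh of a plain RNN with PyTorch's built-in initializer. An orthogonal matrix preserves vector norms, and the `gain` argument is the scaling knob one could tune to compensate for the nonlinearity:

```python
import torch
import torch.nn as nn

# Orthogonally initialize the hidden-to-hidden weights of a plain RNN.
# gain=1.0 gives a pure orthogonal matrix; the post suggests searching
# for a gain that better preserves activation norms once the
# nonlinearity is taken into account.
torch.manual_seed(0)
rnn = nn.RNN(input_size=32, hidden_size=64)
nn.init.orthogonal_(rnn.weight_hh_l0, gain=1.0)

# Check the norm-preserving property of the linear part alone:
w = rnn.weight_hh_l0.detach()
h = torch.randn(64)
print(((w @ h).norm() / h.norm()).item())  # ~1.0 for an orthogonal matrix
```

The norm check only covers the linear layer; as the post says, the nonlinearity and gates change the picture, which is why the gain would need tuning empirically.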


Noisy labels effectively tend to “regularize” the model.


Now, after thinking about it some more, I think the LSUV loop is necessary after all!

It’s because of the nonlinearity: it makes the effect of dividing the linear layer’s weights by the std on the post-nonlinearity means/stds unpredictable.

It seems reasonable that iterating several times will push the activation stats closer to our goal, but it’s not guaranteed for every distribution of data…
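A minimal LSUV-style loop, sketched for a single Linear+ReLU block and a stand-in batch `x` (names and sizes are illustrative, not fastai's exact code). Because the nonlinearity makes the post-activation std unpredictable after each rescaling, the loop measures and rescales repeatedly instead of dividing just once:

```python
import torch
import torch.nn as nn

# Iteratively rescale the weights until the post-nonlinearity std is
# close to 1. A single division would suffice for a purely linear
# layer, but the ReLU makes the resulting std unpredictable, hence
# the loop.
torch.manual_seed(0)
layer = nn.Sequential(nn.Linear(50, 50), nn.ReLU())
x = torch.randn(512, 50)

with torch.no_grad():
    for _ in range(10):
        std = layer(x).std()
        if abs(std - 1.0) < 1e-3:
            break
        layer[0].weight /= std  # rescale, then re-measure next pass

print(layer(x).std().item())  # close to 1.0 after a few iterations
```

This only normalizes the std, as in the original LSUV paper; in practice a handful of iterations is enough here, matching the observation above that one iteration happened to suffice in that case.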


Yes you’re right. And it just so happens that in that case, one iteration was enough.

One question re: images / transforms: I seem to remember Jeremy saying in the 2018 course (fastai 0.7 version) that OpenCV beats PIL performance-wise almost all the time, which is why fastai 0.7 used it instead of PIL. Today Jeremy was again talking about the importance of performance in transformations etc., so I was wondering what the reason was for the switch from OpenCV to PIL for fastai v1 (and the course)? (I seem to remember PIL supports more image formats? But the performance issue would still remain!?)

I found in Part 1, when I was trying to classify sports action photos, that higher resolution was key to improving accuracy - I ended up using 448 x 448. I’m guessing that as the action could be pretty varied, it needed reasonable resolution to capture the differentiating features. Maybe shorts, socks, cap sleeves or other things that could identify a sport. Off topic, but I see some ‘other’ questions above - and I did end up using an ‘other’ class of basically my random photos (mostly landscapes and city shots), which kept any non-sports photo with some grass in it from being identified as cricket.

When are we going to be taught how to feed rectangular images into CNNs? I can’t wait for it!

So one thing probably is that PyTorch’s torchvision just uses PIL, and PIL can be a bit less complex to install. Regarding the performance, there are three things to note:

  • Is preprocessing the training bottleneck? Preprocessing happens in the background with PyTorch dataloaders, so unless your model is waiting for the next batch, preprocessing is probably fast enough already. Homography / rotation does cost CPU cycles; cropping not really.
  • There is a SIMD drop-in PIL replacement that very likely closes much of the gap.
  • If you really want fast preprocessing, you’d probably look at the GPU. Now that Jeremy has rehabilitated my laziness of just using nearest neighbour, I should really put up my homography transform CUDA kernel (it’s really trivial to do, so if you’ve always wanted to implement a custom CUDA kernel, I can recommend it as a first project). :slight_smile:
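To make the last point concrete, here is a sketch of batched GPU-side augmentation (illustrative names, not any particular library's API): rotating a whole batch at once with PyTorch's affine_grid/grid_sample instead of doing per-image CPU work in PIL or OpenCV.

```python
import math
import torch
import torch.nn.functional as F

def rotate_batch(imgs, degrees):
    """imgs: (N, C, H, W) tensor; runs on whatever device imgs is on."""
    n = imgs.size(0)
    # one random angle per image, in radians
    theta = torch.empty(n, device=imgs.device).uniform_(-degrees, degrees)
    theta = theta * math.pi / 180
    cos, sin = theta.cos(), theta.sin()
    # build one 2x3 affine (rotation) matrix per image
    mats = torch.zeros(n, 2, 3, device=imgs.device, dtype=imgs.dtype)
    mats[:, 0, 0], mats[:, 0, 1] = cos, -sin
    mats[:, 1, 0], mats[:, 1, 1] = sin, cos
    grid = F.affine_grid(mats, list(imgs.shape), align_corners=False)
    return F.grid_sample(imgs, grid, align_corners=False)

batch = torch.randn(8, 3, 64, 64)  # stand-in for a decoded batch
out = rotate_batch(batch, degrees=10.0)
print(out.shape)  # torch.Size([8, 3, 64, 64])
```

Since the whole batch is transformed with two kernel launches, this scales much better on the GPU than looping over images on the CPU (grid_sample uses bilinear interpolation by default, not the nearest-neighbour shortcut mentioned above).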

Best regards

Thomas


The Mixup paper similarly smelled tenchy to me - it seems to produce a scrambled image by “convex combinations of pairs of examples and their labels” - so I guess this is a similar regularization.
