Lesson 11 discussion and wiki

Again, this is not fastai. This is a developmental tool for teaching. It’s not documented, it’s not a library and it’s not refined yet :wink:

That phenomenon smells fishy (tenchy?) to me. I can’t come up with an intuition for why having more “mislabeled” data would be better than having a way to guarantee the object of interest is actually present in the labeled image. I guess the “happy outdoor fisherman ~= tench” factor explains some of it. I’ll google around, but does anyone know of any material that explains why noisy labels might actively help? (Not that I don’t believe it, I totally believe it, I just don’t get why.)


Did you look at Nvidia’s DALI library?

This PyTorch-based augmentation is a very interesting topic. I used to apply OpenCV perspective and affine transformations to do these things, which require NumPy arrays and run on the CPU.
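For comparison, here is roughly what a batched GPU affine transform could look like in plain PyTorch (just a sketch of the idea, not the lesson notebook’s code; `batch_rotate` and `max_deg` are names I made up):

```python
import math
import torch
import torch.nn.functional as F

def batch_rotate(x, max_deg=10.0):
    # x: (bs, c, h, w) float tensor, already on the GPU.
    bs = x.size(0)
    # One random angle per image, drawn in a single call on the same device.
    theta = (torch.rand(bs, device=x.device) * 2 - 1) * math.radians(max_deg)
    cos, sin = theta.cos(), theta.sin()
    # Build the (bs, 2, 3) affine matrices for a pure rotation.
    mats = torch.zeros(bs, 2, 3, device=x.device)
    mats[:, 0, 0], mats[:, 0, 1] = cos, -sin
    mats[:, 1, 0], mats[:, 1, 1] = sin, cos
    grid = F.affine_grid(mats, x.size(), align_corners=False)
    return F.grid_sample(x, grid, align_corners=False)
```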

If it’s done at the batch level, why not do it at the epoch level? I suspect that what takes time is iteratively generating random numbers that could instead be generated all at once.
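To illustrate what I mean by generating them all at once (a toy sketch, the numbers are made up):

```python
import torch

bs, n_batches = 64, 1000
# Per batch: one call draws an angle for every image at once.
angles = torch.rand(bs)
# Per epoch: you could just as well draw everything up front.
all_angles = torch.rand(n_batches, bs)
```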

I don’t think noisy labels would actively help. But I suspect (Data Augmentation + Chance of Noisy Labels) > (Perfect Labels without Augmentation).

The degree of noise likely matters here too.

A recent Kaggle competition for audio recognition is dealing with exactly this problem. This paper discusses label noise (not from data augmentation): https://storage.googleapis.com/kaggle-forum-message-attachments/365414/9991/Jeong_COCAI_task2.pdf


We did; it doesn’t really do the batch transforms we do, since DALI only batches after all the transforms.

whoa, a glimpse into the future… no more 100% cpu while training or saving an augmented dataset…


I’m seeing a @jit decorator in the augmentation code. Isn’t that from the numba library?

Great class as always! Looking forward to playing with the augmentation code. Thanks!

BTW, DALI has JPEG decoding and image augmentation written in CUDA.

No, it’s from PyTorch.
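For example, torch.jit.script can be used as a decorator (a minimal sketch, not the actual notebook code; the function is made up):

```python
import torch

@torch.jit.script
def scale_and_shift(x: torch.Tensor, scale: float, shift: float) -> torch.Tensor:
    # Compiled to TorchScript by PyTorch's JIT, not by numba.
    return x * scale + shift

y = scale_and_shift(torch.randn(8, 3, 64, 64), 2.0, 0.5)
```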


Forces the model to try to learn contextual clues associated with tenches?

I’d love to hear more in the future about augmenting for regression problems too.

EDIT: Since the rest of my original post strikes me now as ‘off-topic’ for Lesson 11, I’ve moved it to a separate thread: https://forums.fast.ai/t/data-augmentation-for-regression/43518

I’d certainly like to help. Just need to find the time and manage other priorities as always…

So you are saying that doing the transformations on a batch is a more efficient use of the GPU?

But DALI could still be helpful just for the JPEG decode - or did your experience tell you otherwise?

Great question! In an RNN, you have a single set of weights that has to make every timestep do what you want. So what people do is initialize the recurrent weights (typically called something like w_hh in PyTorch) with orthogonal matrices, which preserve the norm of the activations under the linear layer. But that doesn’t account for the effect of the nonlinearities and gates. So one thing you might do is start from an orthogonal matrix and then find a scaling of the weights that preserves the norm of the activations over an entire RNN timestep better than eigenvalues of 1 do.
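With PyTorch’s built-in orthogonal init, that could look something like this (just a sketch; the gain value is a placeholder you would tune):

```python
import torch.nn as nn

rnn = nn.RNN(input_size=32, hidden_size=64)

for name, p in rnn.named_parameters():
    # weight_hh_l0 is the recurrent (hidden-to-hidden) weight matrix.
    if name.startswith("weight_hh"):
        # gain is the extra scaling discussed above; 1.0 is just the
        # plain orthogonal starting point you'd then adjust from.
        nn.init.orthogonal_(p, gain=1.0)
```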


Noisy labels effectively tend to “regularize” the model.
