Lesson 11 discussion and wiki

Again, this is not fastai. This is a developmental tool for teaching. It’s not documented, it’s not a library and it’s not refined yet :wink:

That phenomenon smells fishy (tenchy?) to me. I can’t come up with an intuition for why having more “mislabeled” data would be better than having a way to guarantee the object of interest is actually present in the labeled image. I guess the “happy outdoor fisherman ~= tench” factor explains some of it. I’ll google around, but does anyone know of any material that explains why noisy labels might actively help? (Not that I don’t believe it, I totally believe it, I just don’t get why.)


Did you look at Nvidia’s DALI library?

This PyTorch-based augmentation is a very interesting topic. I used to apply OpenCV perspective and affine transformations to do these things, which require NumPy arrays and run on the CPU.
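For comparison, here is roughly what a batched GPU affine transform could look like in plain PyTorch (just a sketch of the idea, not the lesson notebook’s code; `batch_rotate` and `max_deg` are names I made up):

```python
import math
import torch
import torch.nn.functional as F

def batch_rotate(x, max_deg=10.0):
    # x: (bs, c, h, w) float tensor, already on the GPU.
    bs = x.size(0)
    # One random angle per image, drawn in a single call on the same device.
    theta = (torch.rand(bs, device=x.device) * 2 - 1) * math.radians(max_deg)
    cos, sin = theta.cos(), theta.sin()
    # Build the (bs, 2, 3) affine matrices for a pure rotation.
    mats = torch.zeros(bs, 2, 3, device=x.device)
    mats[:, 0, 0], mats[:, 0, 1] = cos, -sin
    mats[:, 1, 0], mats[:, 1, 1] = sin, cos
    grid = F.affine_grid(mats, x.size(), align_corners=False)
    return F.grid_sample(x, grid, align_corners=False)
```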

If it’s done at the batch level, why not do it at the epoch level? I suspect that what takes time is iteratively generating random numbers that could instead be generated all at once.
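To illustrate what I mean by generating them all at once (a toy sketch, the numbers are made up):

```python
import torch

bs, n_batches = 64, 1000
# Per batch: one call draws an angle for every image at once.
angles = torch.rand(bs)
# Per epoch: you could just as well draw everything up front.
all_angles = torch.rand(n_batches, bs)
```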

I don’t think noisy labels would actively help. But I suspect (Data Augmentation + Chance of Noisy Labels) > (Perfect Labels without Augmentation).

The degree of noise likely matters here too.

A recent Kaggle competition for audio recognition is dealing with exactly this problem. This paper discusses label noise (not from data augmentation): https://storage.googleapis.com/kaggle-forum-message-attachments/365414/9991/Jeong_COCAI_task2.pdf


We did; it doesn’t really do the batch transforms we do, since DALI only batches after all the transforms.

whoa, a glimpse into the future… no more 100% cpu while training or saving an augmented dataset…


I’m seeing a @jit decorator in the augmentation code. Isn’t that from the numba library?

Great class as always! Looking forward to playing with the augmentation code. Thanks!

BTW, DALI has JPEG decoding and image augmentation written in CUDA.

No, it’s from PyTorch.
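For example, torch.jit.script can be used as a decorator (a minimal sketch, not the actual notebook code; the function is made up):

```python
import torch

@torch.jit.script
def scale_and_shift(x: torch.Tensor, scale: float, shift: float) -> torch.Tensor:
    # Compiled to TorchScript by PyTorch's JIT, not by numba.
    return x * scale + shift

y = scale_and_shift(torch.randn(8, 3, 64, 64), 2.0, 0.5)
```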


Forces the model to try to learn contextual clues associated with tenches?

I’d love to hear more in the future about augmenting for regression problems too.

EDIT: Since the rest of my original post strikes me now as ‘off-topic’ for Lesson 11, I’ve moved it to a separate thread: https://forums.fast.ai/t/data-augmentation-for-regression/43518

I’d certainly like to help. Just need to find the time and manage other priorities as always…

So you are saying that doing the transformations on a batch is a more efficient use of the GPU?

But DALI could still be helpful just for the JPEG decode - or did your experience tell you otherwise?

Great question! In an RNN, you have a single set of weights that has to make every timestep do what you want. So what people do is initialize the recurrent weights (typically called something like w_hh in PyTorch) with orthogonal matrices, which preserve the norm of the activations under the linear layer. But that doesn’t account for the effect of the nonlinearities and gates. So one thing you might do is start from an orthogonal matrix and then find a scaling of the weights that preserves the norm of the activations over an entire RNN timestep better than eigenvalues of 1 do.
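With PyTorch’s built-in orthogonal init, that could look something like this (just a sketch; the gain value is a placeholder you would tune):

```python
import torch.nn as nn

rnn = nn.RNN(input_size=32, hidden_size=64)

for name, p in rnn.named_parameters():
    # weight_hh_l0 is the recurrent (hidden-to-hidden) weight matrix.
    if name.startswith("weight_hh"):
        # gain is the extra scaling discussed above; 1.0 is just the
        # plain orthogonal starting point you'd then adjust from.
        nn.init.orthogonal_(p, gain=1.0)
```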


Noisy labels effectively tend to “regularize” the model.
