Lesson 2 In-Class Discussion

I’m thinking it’s because you want to be on the safe side: a point a little to the left gives you a slightly slower but safer learning rate, and you’ll still eventually get there?

Are there augmentations that work well for non-image data, e.g. time series or text?

9 Likes

Learning rates affect the rate at which optimizers (like SGD, Adam, RMSProp, etc.) converge to the bottom of the error surface, and optimizers are agnostic to the architecture of your neural network. That means they won’t differentiate between a classic NN, a CNN, or an RNN; they only know how to reduce the error using backpropagation.

So yes, the learning rate finder can be used for all NN architectures.

6 Likes
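To make that concrete, here’s a minimal, framework-agnostic sketch of the LR range test idea (the `lr_range_test` name and the toy quadratic loss are mine, just for illustration — this is not fastai’s actual `lr_find` code): grow the learning rate exponentially over a few steps and record the loss at each one.

```python
import numpy as np

def lr_range_test(grad_fn, w0, lr_min=1e-5, lr_max=10.0, steps=100):
    """Take SGD steps while growing the learning rate exponentially,
    recording the loss at each step (the idea behind the LR finder)."""
    lrs = np.geomspace(lr_min, lr_max, steps)
    w = w0
    losses = []
    for lr in lrs:
        loss, grad = grad_fn(w)   # evaluate loss and gradient at current w
        losses.append(loss)
        w = w - lr * grad         # one SGD step at this learning rate
    return lrs, np.array(losses)

# Toy convex loss L(w) = (w - 3)^2, so grad = 2(w - 3)
grad_fn = lambda w: ((w - 3.0) ** 2, 2.0 * (w - 3.0))
lrs, losses = lr_range_test(grad_fn, w0=0.0)
# The loss falls while the LR is reasonable, then blows up once it is too large
```

Plotting `losses` against `lrs` on a log scale gives you the same kind of curve the finder shows, whatever the architecture.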

In NLP there’s a paper where they replace named entities with other named entities to augment the data.

1 Like

@jeremy does the code in ConvLearner include those conv layers (for which we need to code the filters), max pooling, normalization and so on?

Do you have a link or the title of the paper?

1 Like

@jeremy (or anyone who knows) On the topic of learning rates… In this example we’re training in just a few epochs, but I can imagine that we’d sometimes encounter training over hundreds or even thousands of epochs. If we have that situation how often in your experience does this method work? Do we need to do this at a number of points over time in order to compute the different learning rates necessary across the different epochs?

What is the name of the reference paper?

It does. You can refer to this post: precompute=True to see what is deleted and what is added by ConvLearner.

1 Like

I’d love to know how people use data augmentation for time series? Is it just adding noise or something else?

8 Likes
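Adding noise (jittering) is indeed one common approach; random scaling and window slicing are others. A minimal numpy sketch (the function names are mine, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def jitter(series, sigma=0.03):
    """Add small Gaussian noise to each time step."""
    return series + rng.normal(0.0, sigma, size=series.shape)

def scale(series, sigma=0.1):
    """Multiply the whole series by a random factor near 1."""
    return series * rng.normal(1.0, sigma)

def window_slice(series, ratio=0.9):
    """Crop a random contiguous window, keeping `ratio` of the length."""
    n = len(series)
    keep = int(n * ratio)
    start = rng.integers(0, n - keep + 1)
    return series[start:start + keep]

x = np.sin(np.linspace(0, 10, 200))
augmented = [jitter(x), scale(x), window_slice(x)]
```

Which of these preserve the label depends on the task, so they need the same care as choosing image transforms.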

The cyclical learning rate paper discusses setting a range (min and max) and oscillating between those values. Is there a reason why in this case we only pick one value?
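For reference, the triangular schedule from that paper (Smith, “Cyclical Learning Rates for Training Neural Networks”) can be sketched in a few lines; this is my own minimal version, not library code:

```python
import math

def triangular_lr(step, stepsize, lr_min, lr_max):
    """Triangular cyclical LR: ramp linearly from lr_min up to lr_max
    and back down, over a full cycle of 2 * stepsize steps."""
    cycle = math.floor(1 + step / (2 * stepsize))
    x = abs(step / stepsize - 2 * cycle + 1)   # 1 at cycle edges, 0 at the peak
    return lr_min + (lr_max - lr_min) * max(0.0, 1 - x)

# One full cycle with stepsize=4: 0.001 -> 0.01 -> 0.001
sched = [triangular_lr(s, 4, 0.001, 0.01) for s in range(9)]
```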

Wouldn’t adding noise defeat the purpose of increasing our accuracy?

Jeremy will explain this part later.

ok, thanks.

I think it’s: Adversarial Examples for Evaluating Reading Comprehension Systems
I can’t remember if this is exactly the paper.

But the topic was finding adversarial examples in NLP; hope it helps.

2 Likes

@anandsaha alright. Thanks!

How do you do data augmentation with NLP?

4 Likes
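One very simple trick people try is randomly dropping words (alongside synonym replacement and back-translation). A toy sketch, purely illustrative:

```python
import random

def word_dropout(sentence, p=0.1, rng=None):
    """Randomly drop each word with probability p; if everything would
    be dropped, fall back to the original sentence."""
    rng = rng or random.Random(0)
    words = sentence.split()
    kept = [w for w in words if rng.random() > p]
    return " ".join(kept or words)

word_dropout("the cat sat on the mat", p=0.2)
```

As with time series, the risk is that the augmented sentence no longer means the same thing, which is why NLP augmentation is harder than flipping images.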

It’s because you want the learning rate with the fastest loss improvement. So if you were to take the derivative of that plot, the peak might be a good place, and it would be roughly in the middle. However, you want as high a learning rate as possible, because that means you “move faster” towards the minima. So it’s a trade-off between a point where the loss is changing fast and a point where the steps are big. I think that’s why Jeremy chooses a point on that curve where the gradient is still good but the learning rate is still fairly high.

4 Likes
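That trade-off can be automated crudely: pick the learning rate where the loss is dropping fastest against log(lr), ignoring the divergent tail. A hypothetical sketch on a synthetic finder curve (none of this is fastai’s actual heuristic):

```python
import numpy as np

def suggest_lr(lrs, losses, skip_end=5):
    """Pick the LR where the loss falls steepest vs. log(lr),
    skipping the last few points where training diverges."""
    lrs = np.asarray(lrs)[:-skip_end]
    losses = np.asarray(losses)[:-skip_end]
    slopes = np.gradient(losses, np.log(lrs))  # d(loss)/d(log lr)
    return lrs[np.argmin(slopes)]

# Synthetic finder curve: flat, then a steep drop near lr=1e-3, then a blow-up
lrs = np.geomspace(1e-5, 1.0, 50)
losses = 1.0 + 1.0 / (1.0 + np.exp((np.log10(lrs) + 3.0) * 4.0)) + 3.0 * lrs
```

On this fake curve the steepest drop sits near 1e-3; in practice you might still back off a bit from that point, for the safety-margin reason discussed above.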

Is data augmentation possible for a single-channel image?

Yes, you can do it on a black-and-white image.

1 Like
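For example, flips and rotations work the same on a single-channel `(H, W)` array as on an RGB one; a minimal numpy sketch (my own helper, not library code):

```python
import numpy as np

def augment_grayscale(img, rng):
    """Random flip / 90-degree rotation for a single-channel (H, W)
    image; the same transforms work for any number of channels."""
    if rng.random() < 0.5:
        img = np.fliplr(img)                   # horizontal flip
    img = np.rot90(img, k=rng.integers(0, 4))  # random multiple-of-90 rotation
    return img

rng = np.random.default_rng(42)
img = rng.random((28, 28))       # e.g. an MNIST-sized grayscale image
aug = augment_grayscale(img, rng)
```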