Lesson 2 In-Class Discussion

In NLP there’s a paper where they replace named entities with other named entities to augment the data.

1 Like

@jeremy does the code in ConvLearner include those conv layers (for which we need to code the filters), max-pooling, normalization, and so on?

Do you have a link? / title of the paper

1 Like

@jeremy (or anyone who knows) On the topic of learning rates… In this example we’re training for just a few epochs, but I can imagine we’d sometimes train over hundreds or even thousands of epochs. In that situation, how often does this method work in your experience? Do we need to run it at several points over time in order to compute the different learning rates needed across the epochs?

What is the name of the reference paper?

It does. You can refer to this post: precompute=True to see what is deleted and what is added by ConvLearner.

1 Like

I’d love to know how people do data augmentation for time series. Is it just adding noise, or something else?

8 Likes
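
For what it’s worth, common time-series augmentations go beyond plain noise: jittering, magnitude scaling, and window slicing all show up in practice. A minimal NumPy sketch (the function names are my own, not from any library):

```python
import numpy as np

def jitter(x, sigma=0.03):
    """Add small Gaussian noise to each timestep."""
    return x + np.random.normal(0.0, sigma, size=x.shape)

def scale(x, sigma=0.1):
    """Multiply the whole series by a random factor around 1."""
    return x * np.random.normal(1.0, sigma)

def window_slice(x, ratio=0.9):
    """Crop a random contiguous window covering `ratio` of the series."""
    n = int(len(x) * ratio)
    start = np.random.randint(0, len(x) - n + 1)
    return x[start:start + n]

series = np.sin(np.linspace(0, 6.28, 100))
augmented = jitter(scale(series))   # same length, perturbed values
cropped = window_slice(series)      # shorter view of the same series
```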

The cyclic learning rate paper discussed setting a range (min and max) and oscillating between those values. Is there a reason why in this case we only pick 1 value?
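
For context, the triangular policy from Leslie Smith’s cyclical learning rate paper oscillates linearly between a lower and an upper bound; a minimal sketch of that schedule (parameter names are mine):

```python
import math

def triangular_lr(iteration, step_size, base_lr, max_lr):
    """Learning rate at `iteration` under the triangular CLR policy:
    rises linearly from base_lr to max_lr over step_size iterations,
    then falls back to base_lr, and repeats."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)

# At the start of a cycle the rate is base_lr; halfway up it is max_lr.
lrs = [triangular_lr(i, step_size=100, base_lr=1e-3, max_lr=1e-2)
       for i in range(400)]
```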

Wouldn’t adding noise defeat the purpose, though, since we’re trying to increase accuracy?

Jeremy will explain this part later.

ok, thanks.

I think it’s: Adversarial Examples for Evaluating Reading Comprehension Systems
I can’t remember if this is exactly the paper.

But the topic was finding adversarial examples in NLP. Hope it helps!

2 Likes

@anandsaha alright. Thanks!

How do you do data augmentation with NLP?

4 Likes
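
Echoing the entity-replacement idea mentioned at the top of the thread, one simple scheme is to swap each named entity for another entity of the same type. A toy sketch (the tagging and entity pools here are made up; a real pipeline would use an NER model such as spaCy’s):

```python
import random

# Toy "annotations": (token, entity_type or None); a real system would
# get these from an NER tagger rather than hand-labelling.
sentence = [("Paris", "LOC"), ("is", None), ("lovely", None),
            ("in", None), ("May", None)]

# Hypothetical replacement pools per entity type.
entities = {"LOC": ["London", "Tokyo", "Cairo"], "PER": ["Alice", "Bob"]}

def swap_entities(tagged, pools, rng=random):
    """Replace each tagged entity with a random same-type entity."""
    out = []
    for token, tag in tagged:
        out.append(rng.choice(pools[tag]) if tag in pools else token)
    return " ".join(out)

augmented = swap_entities(sentence, entities)   # e.g. "Tokyo is lovely in May"
```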

It’s because you want the learning rate with the fastest loss improvement. So if you were to take the derivative of that plot, the peak might be a good place, and it would be roughly in the middle. However, you also want as high a learning rate as possible, because that means you “move faster” toward the minimum. So it’s a trade-off between a learning rate where the loss is still changing fast and taking bigger steps. I think that’s why Jeremy chooses a point on the curve where the gradient is still good but the learning rate is still fairly high.

4 Likes
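
That trade-off can be made concrete: an LR-range sweep gives you (learning rate, loss) pairs, and you can look at the slope of the loss with respect to log(lr) and pick a rate where the descent is steepest, rather than the rate at the loss minimum. A rough sketch (the sweep data here is synthetic, just to illustrate the shape):

```python
import numpy as np

# Synthetic LR-finder sweep: loss falls, flattens, then blows up
# (a stand-in for the (lr, loss) pairs a real finder records).
lrs = np.logspace(-5, 0, 60)
log_lrs = np.log10(lrs)
losses = 1.2 - 0.8 / (1.0 + np.exp(-5 * (log_lrs + 3)))      # smooth descent
losses = losses + np.where(lrs > 0.1, (lrs - 0.1) * 10, 0.0)  # divergence

# Slope of the loss w.r.t. log(lr): most negative = fastest improvement.
slopes = np.gradient(losses, log_lrs)
steepest = lrs[np.argmin(slopes)]
```

Picking `steepest` (or a value a bit below the loss minimum where the slope is still clearly negative) is the heuristic described above.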

Is data augmentation possible for a single-channel image?

Yes, you can do it on a black-and-white image.

1 Like

So precompute=True means it just loads the weights for the previous layers (everything except the top) from the stored ResNet model, and we are not fine-tuning them?

1 Like

I tried it in Keras, but it failed because of the single channel! Do you have a working sample for that?

1 Like
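
One common workaround (not specific to this thread’s code) is to repeat the single channel three times, so a network pretrained on RGB accepts the image; a NumPy sketch:

```python
import numpy as np

gray = np.random.rand(224, 224)                     # single-channel (H, W)
rgb = np.repeat(gray[..., np.newaxis], 3, axis=-1)  # stacked to (H, W, 3)

# Every channel is identical, so the pretrained RGB filters
# see a valid three-channel input.
```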

Regarding precompute=True: Is my understanding correct that when this is done,

  • the library will take the non-augmented dataset
  • pass it through the neural network with pretrained weights already loaded
  • record the activation values of each neuron
  • and save it to disk?
4 Likes
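
That understanding matches how such caching is usually implemented; a minimal sketch of the idea (the frozen `body` here is a toy stand-in for the pretrained convolutional layers, not fastai’s actual code):

```python
import os
import tempfile
import numpy as np

def body(x):
    """Stand-in for the frozen pretrained conv layers: maps one image
    to a fixed-length activation vector (a toy reduction here)."""
    return np.array([x.mean(), x.std(), x.max(), x.min()])

def precompute_activations(images, path):
    """Run each (non-augmented) image through the frozen body once,
    stack the resulting activations, and save them to disk for reuse."""
    acts = np.stack([body(img) for img in images])
    np.save(path, acts)
    return acts

images = [np.random.rand(8, 8) for _ in range(5)]
path = os.path.join(tempfile.mkdtemp(), "activations.npy")
acts = precompute_activations(images, path)   # shape (5, 4), also on disk
```

After this, training the top layers can read the cached activations instead of re-running the convolutional body every epoch.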