Lesson 2 In-Class Discussion

In NLP there’s a paper where they replace named entities with other named entities to augment the data.

1 Like

@jeremy does the code in ConvLearner include those conv layers (for which we need to code the filters), max-pooling, normalization, and so on?

Do you have a link? / title of the paper

1 Like

@jeremy (or anyone who knows) On the topic of learning rates… In this example we’re training for just a few epochs, but I can imagine we’d sometimes train over hundreds or even thousands of epochs. In that situation, how often does this method work in your experience? Do we need to run it at several points over time in order to compute the different learning rates needed across the epochs?

What is the name of the reference paper?

It does. You can refer to this post: precompute=True to see what is deleted and what is added by ConvLearner.

1 Like

I’d love to know how people do data augmentation for time series. Is it just adding noise, or something else?

8 Likes
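
For what it’s worth, common time-series augmentations go beyond plain noise: jittering, magnitude scaling, and window slicing all show up in practice. A minimal NumPy sketch (the function names are my own, not from any library):

```python
import numpy as np

def jitter(x, sigma=0.03):
    """Add small Gaussian noise to each timestep."""
    return x + np.random.normal(0.0, sigma, size=x.shape)

def scale(x, sigma=0.1):
    """Multiply the whole series by a random factor around 1."""
    return x * np.random.normal(1.0, sigma)

def window_slice(x, ratio=0.9):
    """Crop a random contiguous window covering `ratio` of the series."""
    n = int(len(x) * ratio)
    start = np.random.randint(0, len(x) - n + 1)
    return x[start:start + n]

series = np.sin(np.linspace(0, 6.28, 100))
augmented = jitter(scale(series))   # same length, perturbed values
cropped = window_slice(series)      # shorter view of the same series
```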

The cyclic learning rate paper discussed setting a range (min and max) and oscillating between those values. Is there a reason why in this case we only pick 1 value?
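
For context, the triangular policy from Leslie Smith’s cyclical learning rate paper oscillates linearly between a lower and an upper bound; a minimal sketch of that schedule (parameter names are mine):

```python
import math

def triangular_lr(iteration, step_size, base_lr, max_lr):
    """Learning rate at `iteration` under the triangular CLR policy:
    rises linearly from base_lr to max_lr over step_size iterations,
    then falls back to base_lr, and repeats."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)

# At the start of a cycle the rate is base_lr; halfway up it is max_lr.
lrs = [triangular_lr(i, step_size=100, base_lr=1e-3, max_lr=1e-2)
       for i in range(400)]
```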

Wouldn’t adding noise defeat the purpose, though, since we’re trying to increase accuracy?

Jeremy will explain this part later.

ok, thanks.

I think it’s: Adversarial Examples for Evaluating Reading Comprehension Systems
I can’t remember if this is exactly the paper.

But the topic was finding adversarial examples in NLP. Hope it helps!

2 Likes

@anandsaha alright. Thanks!

How do you do data augmentation with NLP?

4 Likes
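
Echoing the entity-replacement idea mentioned at the top of the thread, one simple scheme is to swap each named entity for another entity of the same type. A toy sketch (the tagging and entity pools here are made up; a real pipeline would use an NER model such as spaCy’s):

```python
import random

# Toy "annotations": (token, entity_type or None); a real system would
# get these from an NER tagger rather than hand-labelling.
sentence = [("Paris", "LOC"), ("is", None), ("lovely", None),
            ("in", None), ("May", None)]

# Hypothetical replacement pools per entity type.
entities = {"LOC": ["London", "Tokyo", "Cairo"], "PER": ["Alice", "Bob"]}

def swap_entities(tagged, pools, rng=random):
    """Replace each tagged entity with a random same-type entity."""
    out = []
    for token, tag in tagged:
        out.append(rng.choice(pools[tag]) if tag in pools else token)
    return " ".join(out)

augmented = swap_entities(sentence, entities)   # e.g. "Tokyo is lovely in May"
```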

It’s because you want the learning rate with the fastest loss improvement. So if you were to take the derivative of that plot, the peak might be a good place, and it would be roughly in the middle. However, you also want as high a learning rate as possible, because that means you “move faster” toward the minimum. So it’s a trade-off between a learning rate where the loss is still changing fast and taking bigger steps. I think that’s why Jeremy chooses a point on the curve where the gradient is still good but the learning rate is still fairly high.

4 Likes
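
That trade-off can be made concrete: an LR-range sweep gives you (learning rate, loss) pairs, and you can look at the slope of the loss with respect to log(lr) and pick a rate where the descent is steepest, rather than the rate at the loss minimum. A rough sketch (the sweep data here is synthetic, just to illustrate the shape):

```python
import numpy as np

# Synthetic LR-finder sweep: loss falls, flattens, then blows up
# (a stand-in for the (lr, loss) pairs a real finder records).
lrs = np.logspace(-5, 0, 60)
log_lrs = np.log10(lrs)
losses = 1.2 - 0.8 / (1.0 + np.exp(-5 * (log_lrs + 3)))      # smooth descent
losses = losses + np.where(lrs > 0.1, (lrs - 0.1) * 10, 0.0)  # divergence

# Slope of the loss w.r.t. log(lr): most negative = fastest improvement.
slopes = np.gradient(losses, log_lrs)
steepest = lrs[np.argmin(slopes)]
```

Picking `steepest` (or a value a bit below the loss minimum where the slope is still clearly negative) is the heuristic described above.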

Is data augmentation possible for a single-channel image?

Yes, you can do it on a black-and-white image.

1 Like

So precompute=True means it just loads the weights for the previous layers (everything except the top) from the stored ResNet model, and we are not fine-tuning them?

1 Like

I tried it in Keras, but it failed because of the single channel! Do you have a working sample for that?

1 Like
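
One common workaround (not specific to this thread’s code) is to repeat the single channel three times, so a network pretrained on RGB accepts the image; a NumPy sketch:

```python
import numpy as np

gray = np.random.rand(224, 224)                     # single-channel (H, W)
rgb = np.repeat(gray[..., np.newaxis], 3, axis=-1)  # stacked to (H, W, 3)

# Every channel is identical, so the pretrained RGB filters
# see a valid three-channel input.
```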

Regarding precompute=True: Is my understanding correct that when this is done,

  • the library will take the non-augmented dataset
  • pass it through the neural network with pretrained weights already loaded
  • record the activation values of each neuron
  • and save it to disk?
4 Likes
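
That understanding matches how such caching is usually implemented; a minimal sketch of the idea (the frozen `body` here is a toy stand-in for the pretrained convolutional layers, not fastai’s actual code):

```python
import os
import tempfile
import numpy as np

def body(x):
    """Stand-in for the frozen pretrained conv layers: maps one image
    to a fixed-length activation vector (a toy reduction here)."""
    return np.array([x.mean(), x.std(), x.max(), x.min()])

def precompute_activations(images, path):
    """Run each (non-augmented) image through the frozen body once,
    stack the resulting activations, and save them to disk for reuse."""
    acts = np.stack([body(img) for img in images])
    np.save(path, acts)
    return acts

images = [np.random.rand(8, 8) for _ in range(5)]
path = os.path.join(tempfile.mkdtemp(), "activations.npy")
acts = precompute_activations(images, path)   # shape (5, 4), also on disk
```

After this, training the top layers can read the cached activations instead of re-running the convolutional body every epoch.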