Lesson 11 Discussion

thunderingtyphoons · March 24, 2017, 3:11am

I am reading the paper discussed in class
Training Deep Neural Networks on Noisy Labels with Bootstrapping
https://arxiv.org/abs/1412.6596

I am having trouble understanding equation 6. What is L_soft? Is it the loss? The paper says regression targets. If it is a regression target, then shouldn’t it change for each class (one hot encoded)?

Can someone explain how it is computed?

jeremy · March 24, 2017, 2:32pm

Sure. Glad you’re checking out that paper! The notation is introduced all over the place in it, so to find out what things are, you’ll need to search around a bit!

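So here’s the equation (equation 6 in the paper), written out in plain text:

L_soft(q, t) = sum_k [beta * t[k] + (1 - beta) * q[k]] * log(q[k])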

q are the predicted probabilities. t are the actual (possibly noisy) labels. And inside the sum() these are being indexed as t[k] and q[k]. If you replace the bit I highlighted (the target term that multiplies log(q[k])) with just t[k], then you have the standard cross-entropy loss, which we’ve used for nearly all of our classification models (and we have an Excel spreadsheet showing it).

So we’re simply creating a new function which replaces the label, t[k], with a mix of a bit of the true label t[k] and a bit of the prediction q[k] (using a parameter beta, which weights the true label, and which the paper says they set to 0.8).

So basically this is a lot like pseudo-labeling, except that it’s happening for the labeled data, rather than unlabeled.
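
To make the comparison with cross-entropy concrete, here is a minimal NumPy sketch of the soft bootstrapping loss described above. It is an illustration, not the paper's code: the function names, the `eps` clipping, the leading minus sign (so that lower is better) and the toy arrays are all assumptions made for this example.

```python
import numpy as np

def cross_entropy(q, t, eps=1e-7):
    """Standard cross-entropy, -sum_k t[k] * log(q[k]), averaged over the batch."""
    q = np.clip(q, eps, 1.0)  # avoid log(0); eps is an assumption of this sketch
    return float(np.mean(-np.sum(t * np.log(q), axis=-1)))

def soft_bootstrap_loss(q, t, beta=0.8, eps=1e-7):
    """Cross-entropy with the target t[k] replaced by beta*t[k] + (1-beta)*q[k]."""
    q = np.clip(q, eps, 1.0)
    target = beta * t + (1.0 - beta) * q  # mix of the given label and the prediction
    return float(np.mean(-np.sum(target * np.log(q), axis=-1)))

# Toy batch: two examples, three classes (made-up numbers).
t = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0]])   # one-hot labels
q = np.array([[0.2, 0.7, 0.1],
              [0.3, 0.3, 0.4]])   # predicted probabilities

print("cross-entropy:      ", cross_entropy(q, t))
print("soft bootstrap loss:", soft_bootstrap_loss(q, t, beta=0.8))
```

With `beta=1.0` the mixed target is just the label, so the function reduces to ordinary cross-entropy; with `beta=0.0` the labels are ignored entirely and only the prediction's own entropy remains.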

aifish · March 25, 2017, 12:11am

A question for the babi-memnn notebook:

In the paper “End-To-End Memory Networks”, the query embedding is added in before both Softmax layers:

The same relationship is shown in the diagram:

But in the notebook, the query embedding emb_q is merged only with emb_story, not with emb_c:

Does anybody know why we are skipping the query embedding before the 2nd Softmax here?

BTW, another difference I noticed is that, in the paper “+” (sum) is used before the 2nd Softmax, while in our notebook ‘dot’ is used. It seems that different architectures lead to similar results.
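
For readers without the paper's diagram to hand, the single-hop computation being discussed can be sketched in NumPy as below. This is an illustrative sketch with random weights and hypothetical sizes, not the notebook's Keras code; `m`, `c` and `u` stand in for the roles that emb_story, emb_c and emb_q play in the question, and `W` is the final projection before the 2nd Softmax.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

rng = np.random.default_rng(0)
n_sentences, d, vocab = 5, 8, 20        # hypothetical sizes for illustration

m = rng.normal(size=(n_sentences, d))   # input memory embeddings (role of emb_story)
c = rng.normal(size=(n_sentences, d))   # output memory embeddings (role of emb_c)
u = rng.normal(size=d)                  # query embedding (role of emb_q)
W = rng.normal(size=(vocab, d))         # projection to answer vocabulary

p = softmax(m @ u)                      # 1st Softmax: attention weight per sentence
o = p @ c                               # attention-weighted sum of output embeddings

a_with_query = softmax(W @ (o + u))     # paper's form: query added before 2nd Softmax
a_without_query = softmax(W @ o)        # variant that skips the query at this step

print(a_with_query.round(3))
print(a_without_query.round(3))
```

Both variants produce a valid distribution over the answer vocabulary; they differ only in whether the query contributes directly to the final prediction or only through the attention weights `p`.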

Thanks!

jeremy · March 25, 2017, 1:06am

Hmmm. I wonder if I made a mistake… Did you try changing this? Did it make it better or worse?

I think that might be doing the same thing, since it’s just a vector. I haven’t checked carefully though… Are you sure it’s different here?

thunderingtyphoons · March 25, 2017, 3:10am

So, if I have to implement bootstrapping in Keras, do I have to explicitly relabel examples in each minibatch? Or can I implement a custom loss function to handle it?
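
One way to see how the two options relate: the mixed target depends only on the label and the prediction, which are exactly the two arguments a Keras custom loss receives, `(y_true, y_pred)`. The NumPy sketch below (not Keras code; the helper names, `BETA`, `EPS` and the random minibatch are assumptions for illustration) computes the loss both ways on one minibatch.

```python
import numpy as np

BETA = 0.8   # mixing weight on the given label (value quoted in the thread)
EPS = 1e-7   # clipping constant, an assumption of this sketch

def categorical_crossentropy(y_true, y_pred):
    """Per-example cross-entropy, -sum_k y_true[k] * log(y_pred[k])."""
    y_pred = np.clip(y_pred, EPS, 1.0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

def relabel_minibatch(y_true, y_pred, beta=BETA):
    """Option 1: explicitly build new soft targets for the minibatch."""
    return beta * y_true + (1.0 - beta) * y_pred

def soft_bootstrap_loss(y_true, y_pred, beta=BETA):
    """Option 2: a loss with the (y_true, y_pred) signature that mixes internally."""
    y_pred = np.clip(y_pred, EPS, 1.0)
    mixed = beta * y_true + (1.0 - beta) * y_pred
    return -np.sum(mixed * np.log(y_pred), axis=-1)

# Made-up minibatch: 4 examples, 3 classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3))
y_pred = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
y_true = np.eye(3)[[0, 2, 1, 1]]

loss_relabel = categorical_crossentropy(relabel_minibatch(y_true, y_pred), y_pred)
loss_custom = soft_bootstrap_loss(y_true, y_pred)
print(np.allclose(loss_relabel, loss_custom))
```

The two loss values agree, which suggests a custom loss alone is enough. One difference this sketch cannot show: relabeled targets fed in as data are constants, whereas a custom loss in a real framework may also backpropagate through the prediction term inside the target unless that term is explicitly treated as a constant.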

aifish · March 25, 2017, 4:36am

I added the emb_q before Softmax like this:

The result from the original model:

The result from the above modified model:

It looks to me that it does not make a big difference. I trained them a few times, sometimes one is a bit better than another, but overall the result is quite similar.

I will try it on two hops later to see whether it makes a difference.

aifish · March 25, 2017, 4:56am

The result for 2 hops:

Original model:

Modified model:

Quite comparable to me as well.

GauravAg · March 26, 2017, 3:34pm

Classes not present in ImageNet: any insights on how we are able to find images for w2v classes that are not defined in ImageNet, e.g. net and rod (Lesson 11 video, 7:45)?

jeremy · March 26, 2017, 11:30pm

OK, so now try changing to the ‘two supporting facts’ dataset, and use multiple hops (you can just uncomment the relevant line at the top of the notebook). That would be interesting, since I had a lot of trouble getting that to fit.

aifish · March 27, 2017, 5:18am

Well, in the two-facts multi-hop case the modified model performed a lot worse. The original model is actually not too bad:

Here is the modified result:

topbots · April 12, 2017, 6:53am

I migrated my old homework code off the AWS instance to my own deep learning server, which I built this week (yay!), but I’m running into new issues getting the code to run that didn’t happen before. Would appreciate some ideas / help debugging.

With DCGAN.ipynb, getting this error:

With wgan-pytorch.ipynb getting this error:

Ideas on how to fix?

jeremy · April 12, 2017, 4:26pm

Your python is too old - needs 3.6
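
A quick way to confirm which interpreter a notebook kernel is actually running is a check like the one below. This is a generic sketch: the error messages are not reproduced in this thread, and `python_at_least` is a hypothetical helper name introduced only for this example.

```python
import sys

def python_at_least(major, minor):
    """Return True if the running interpreter is at least version major.minor."""
    return sys.version_info >= (major, minor)

# Print the version the kernel is using and whether it meets the 3.6 requirement.
print("Running Python {}.{}".format(sys.version_info[0], sys.version_info[1]))
if not python_at_least(3, 6):
    print("This interpreter is older than 3.6; the notebooks need 3.6 or newer.")
```

Running this inside the notebook (rather than in a terminal) matters, because the kernel may be bound to a different Python installation than the one on the shell's path.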

rteja1113 · August 17, 2017, 4:41pm

Hi @thunderingtyphoons,
Could a soft_loss look something like this in Keras?

EricPB · August 17, 2017, 5:42pm

Note: the complete collection of Part 2 video timelines is available in a single thread for keyword search.
Part 2: complete collection of video timelines

Lesson 11 video timeline:

00:00:30 Tips on using notebooks and reading research papers

https://youtu.be/bZmJvmxfH6I?t=30s

00:03:15 Follow-up on lesson 10 and more word-to-image searches

https://youtu.be/bZmJvmxfH6I?t=3m15s

00:07:30 Linear algebra cheat sheet for deep learning (student’s post on Medium)
& Zero-Shot Learning by Convex Combination of Semantic Embeddings (arXiv)

https://youtu.be/bZmJvmxfH6I?t=7m30s

00:10:00 Systematic evaluation of CNN advances on ImageNet (arXiv)
ELU better than RELU, learning rate annealing, different color transformations,
Max pooling vs Average pooling, learning rate & batch size, design patterns.

https://youtu.be/bZmJvmxfH6I?t=10m

00:27:15 Data Science Bowl 2017 (Cancer Diagnosis) on Kaggle

https://youtu.be/bZmJvmxfH6I?t=27m15s

00:36:30 DSB 2017: full preprocessing tutorial, + others.

https://youtu.be/bZmJvmxfH6I?t=36m30s

00:48:30 A non-deep-learning approach to find lung nodules (research)

https://youtu.be/bZmJvmxfH6I?t=48m30s

00:53:00 Clustering (and why Jeremy wasn’t a fan before)

https://youtu.be/bZmJvmxfH6I?t=53m

01:08:00 Using PyTorch with GPU for ‘meanshift’ (clustering cont.)

https://youtu.be/bZmJvmxfH6I?t=1h8m

01:22:15 Candidate Generation and LUNA 16 (Kaggle)

https://youtu.be/bZmJvmxfH6I?t=1h22m15s

01:26:30 Accelerating K-Means on GPU via CUDA (research)

https://youtu.be/bZmJvmxfH6I?t=1h26m30s

01:27:15 ChatBots ! (long section)
Starting with “memory networks” at Facebook (research)

https://youtu.be/bZmJvmxfH6I?t=1h27m15s

01:57:30 Recurrent Entity Networks: an exciting area of research in Memory Networks

https://youtu.be/bZmJvmxfH6I?t=1h57m30s

01:58:45 Concept of “Attention” and “Attentional Models”

https://youtu.be/bZmJvmxfH6I?t=1h58m45s

VLavorini · August 19, 2017, 8:19pm

Hello!
I played around with the mean shift algorithm to improve its execution speed. Here is the link:

cheers!

jasonpmorrison · October 1, 2017, 2:18am

For the paper “Training Deep Neural Networks on Noisy Labels with Bootstrapping” there appears to be an implementation in the tensorflow/models repo that was contributed along with the “object_detection” model in tensorflow/models#1561.

Here’s a link to the loss implementation class BootstrappedSigmoidClassificationLoss, including citation!

rasen58 · April 3, 2018, 7:03pm

Thanks for linking that code! Do you understand why they perform a sigmoid on the prediction_tensor?

tschoy · June 29, 2018, 11:13am

Is there any keras or pytorch implementation of this paper available to the public?