Research Paper Recommendations

Regularized Evolution for Image Classifier Architecture Search:

My initial contribution to part 2, ahem …
I am currently hooked on reinforcement learning, sorry for that. This is the classic AlphaGo paper. It trains a convolutional neural network on known Go games, then uses that knowledge as a prior to train two different networks, a policy network and a value network. The job of these two networks is then to play against themselves, improve, and ultimately beat human players.
David Silver, the primary author of this paper and the guy who does a lot of important things at DeepMind, gave a nice tutorial on this paper here.
The next iteration of their work introduced AlphaGo Zero, which didn’t require human knowledge as a prior for training. It will be another classic paper if you are into reinforcement learning.

Going through some random Reddit thread, I’ve stumbled upon the marvellous paper titled
Stopping GAN Violence: Generative Unadversarial Networks. When the first author’s institution is called “Institute of Deep Statistical Harmony”, you know it’ll be a good read :slight_smile:


@mmr have you seen this article? It might help you break a bit with RL; besides, it is a good read.

Well, I am fond of the research solutions that DeepMind and OpenAI often come up with. As for the efficiency of RL algorithms in solving problems, I know it’s hard; that’s why I am interested in it. There is still room for improvement in RL. Also, personally, it’s my early days with RL, so let’s see.

Thanks for the great thread!
Here is one particularly good and new article:

Even though it has already been discussed in detail in Implementing Mask R-CNN, I think it’s worth mentioning.

This has to be the funniest thing I have ever read. Thanks!

Adversarial Patch
Adversarial Logit Pairing

Optimizing Neural Networks with Kronecker-factored Approximate Curvature
K-FAC: Kronecker-Factored Approximate Curvature


I gave a talk on the AlphaGo series of engines the other day; you might find it interesting:

Starts ~20 minutes in, the audio is rough for the first 5-10 minutes, sorry.



An illustrative, fully coded NLP attention example from the Harvard NLP group.



By the way, if you post the arXiv abstract link, e.g.
instead of the paper PDF, e.g.

then that makes it easier for people who use Mendeley to store their bibliographic references. (Yes, you can get from one to the other by replacing pdf with abs, but you have to remember that’s how you do the transform.)
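For anyone who would rather not do that by hand, here is a minimal sketch of the pdf-to-abs swap. The function name and the paper ID in the usage note are made up for illustration, and it assumes the usual `arxiv.org/pdf/<id>` link shape, with or without a trailing `.pdf`.

```python
import re


def arxiv_pdf_to_abs(url: str) -> str:
    """Turn an arXiv PDF link into the matching abstract link.

    Assumes the link contains a single /pdf/ path segment; links that
    already point at /abs/ are returned unchanged.
    """
    # Swap the first /pdf/ path segment for /abs/.
    url = re.sub(r"/pdf/", "/abs/", url, count=1)
    # Drop a trailing ".pdf" extension if the link has one.
    return re.sub(r"\.pdf$", "", url)
```

So `arxiv_pdf_to_abs("https://arxiv.org/pdf/1234.56789.pdf")` (a placeholder ID, not a real paper) gives back the same link with `abs` in the path and no `.pdf` suffix.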



Read the YOLOv3 paper today. It was awesome, and I definitely wish more papers were like that :slight_smile:

Also just read the paper on FitLaM (by @jeremy and @sebastianruder). It was an awesome read too, and I could barely contain myself as I read it. I hope we’ll see a lot more of that, especially with tasks such as machine translation.

I did want to mention, though, that I could not find the results of the ablation studies in the paper. They are briefly alluded to in Lesson 10 (at about 1:57:03). I’ve also included a screenshot.

I was particularly interested to see that the validation error was 5.63 (without pretraining) vs 5.00 (with pretraining) for the IMDb task. Does that mean that we only gained a 0.63% reduction in the error by pretraining the model on Wikipedia?

I’m probably just missing something. At any rate, great job guys! Also @sebastianruder, if you need someone to help with your ablation studies, feel free to message me directly. I’ve got a 2 x GTX 1080Ti machine pretty much gathering dust right now. Furthermore, I’d be very interested in doing some more experimental research on applying FitLaM (or ULMFiT, whatever you’re calling it now :slight_smile: ) to more kinds of NLP tasks. Let me know … email is

Screenshot of the ablation studies I’m referring to:


Ah, remember to be very careful about how you interpret errors. You want to look at relative improvement. It’s about a 12% relative improvement. Or alternatively, think about the % change in the number of incorrect classifications (which is the same number, just another way of saying the same thing).
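To make the arithmetic concrete, here is a small sketch using the two IMDb error rates quoted earlier in the thread (5.63 without pretraining, 5.00 with). The function name is just for illustration. Measured against the 5.63 baseline the reduction comes out a little over 11%, and measured against the 5.00 result it is a bit under 13%, so "about 12%" sits in that range either way.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Fraction of the baseline error that the new model eliminates."""
    return (baseline_error - new_error) / baseline_error


# Error rates (in percent) quoted in the thread for the IMDb task.
without_pretraining = 5.63
with_pretraining = 5.00

absolute_drop = without_pretraining - with_pretraining  # 0.63 percentage points
reduction = relative_error_reduction(without_pretraining, with_pretraining)

print(f"absolute drop: {absolute_drop:.2f} points")
print(f"relative reduction: {reduction:.1%}")
```

The absolute drop looks tiny because the baseline error is already small; the relative figure says roughly one in nine of the remaining mistakes went away.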

The benefit is greater for smaller datasets - as you can see from the TREC-6 result and from the top left chart in your image.

The paper with the ablation studies in isn’t out yet. Hopefully in 3 weeks or so we’ll be able to make it available. But I tried to make all the key bits of info available in the video.

Colorless green recurrent networks dream hierarchically

Deep painterly harmonization


Data2Vis: Automatic Generation of Data Visualizations Using Sequence-to-Sequence Recurrent Neural Networks
Blog post:

Basically, they look at creating good data visualizations as a seq2seq problem: a sequence of data comes in, a sequence of code comes out as Vega-Lite declarative visualization code (which then gets turned into a nice graph). Quite nice.
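As a rough picture of what "sequence in, sequence out" means here, this sketch turns one data record and one hand-written Vega-Lite spec into token sequences. Everything in it is an assumption for illustration: the record, the spec, and the choice of character-level tokens are mine and are not taken from the paper. There is no model here, only the shape of a training pair.

```python
import json
from typing import List

# Hypothetical input record and a hand-written Vega-Lite spec for it.
record = {"country": "A", "sales": 10}
spec = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "country", "type": "nominal"},
        "y": {"field": "sales", "type": "quantitative"},
    },
}


def to_sequence(obj) -> List[str]:
    """Serialize a JSON-like object and split it into character tokens."""
    return list(json.dumps(obj, sort_keys=True))


# One (source, target) pair of the kind a seq2seq model would be trained on.
source_tokens = to_sequence(record)
target_tokens = to_sequence(spec)

# A generated target sequence is only useful if it decodes back to valid JSON.
decoded = json.loads("".join(target_tokens))
```

A trained model would be asked to produce something like `target_tokens` given `source_tokens`, and the decoded JSON would then be handed to a Vega-Lite renderer.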


This paper, Universal Sentence Encoder, looks very applicable to Lesson 11:

Yes it is. I know @narvind2003 has talked about benchmarking the TF Hub implementation against our language models. I’d be interested in the results of that.

I’m still flabbergasted.