Good readings 2019

Hi, I believe that some deep learning papers are worth reading regardless of their application domain. But because of the overwhelming number of deep learning papers published every day, my hope here is to have your help in creating a curated list of cool deep learning papers: the kind of list we might consider must-reads in 2019. However, saying whether or not a paper is good is not an easy task. Luckily, a few customary rules of thumb can be useful.

A paper should be linked below if it satisfies at least one of the following desiderata:

As a final note, PLEASE add a few lines introducing the linked paper, and DO NOT COMMENT HERE unless strictly necessary; just use "like this post" to guide other people's reading. If you want to discuss an idea, just create a new topic :wink:

Suggested reading lists:

Natural Language: #NLP

Vision: #CV

| Category | Title / Link | Summary |
|---|---|---|
| General | Bag of Tricks for Image Classification with Convolutional Neural Networks | Best practices to follow for image classification with CNNs |
| TBD | Group Normalization | - |
| TBD | Exploring Neural Networks with Activation Atlases | - |
| TBD | Adversarial Examples: Attacks and Defenses for Deep Learning | Deep neural networks (DNNs) have recently been found vulnerable to well-designed input samples called adversarial examples. The authors review recent findings on adversarial examples for DNNs, summarize methods for generating them, and propose a taxonomy of these methods. |

Training and Advanced Topics: #ADV

Ethics of AI: #Ethics (by Nalini)

| Category | Title / Link | Summary |
|---|---|---|
| General | In Favor of Developing Ethical Best Practices in AI Research | Best practices to make ethics a part of your AI/ML work. |
| General | Ethics of algorithms | Mapping the debate around the ethics of algorithms. |
| General | Mechanism Design for AI for Social Good | Describes the Mechanism Design for Social Good (MD4SG) research agenda, which uses insights from algorithms, optimization, and mechanism design to improve access to opportunity. |
| Bias | A Framework for Understanding Unintended Consequences of Machine Learning | Provides a simple framework for understanding the various kinds of bias that may occur in machine learning, going beyond the simplistic notion of dataset bias. |
| Bias | Fairness in representation: quantifying stereotyping as a representational harm | Formalizes two notions of representational harm caused by "stereotyping" in machine learning and suggests ways to mitigate them. |
| Bias | Man is to Computer Programmer as Woman is to Homemaker? | Paper on debiasing word embeddings. |
| Accountability | Algorithmic Impact Assessments | AI Now paper defining processes for auditing algorithms. |
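The debiasing-word-embeddings paper listed above works by projecting out a learned bias direction from each word vector (the "neutralize" step of hard debiasing). A minimal sketch of that projection, with made-up toy vectors standing in for real embeddings:

```python
import numpy as np

def debias(vec, bias_dir):
    """Project out the component of `vec` along a bias direction
    (the 'neutralize' step of hard debiasing)."""
    g = bias_dir / np.linalg.norm(bias_dir)
    return vec - np.dot(vec, g) * g

# Toy example: a hypothetical gender direction from 'he' - 'she' vectors.
# These 3-d vectors are illustrative only; real embeddings are 100s of dims.
he, she = np.array([0.8, 0.1, 0.3]), np.array([0.2, 0.1, 0.7])
gender_dir = he - she
programmer = np.array([0.5, 0.9, 0.4])
neutral = debias(programmer, gender_dir)
```

After debiasing, `neutral` has zero component along the bias direction, so it is equidistant (in that direction) from "he" and "she".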

From the abstract: Noah A. Smith presented ideas developed by many researchers over many decades. After reading this document, you should have a general understanding of word vectors (also known as word embeddings): why they exist, what problems they solve, where they come from, how they have changed over time, and what some of the open questions about them are.


I enjoyed this lighthearted paper this week… :slight_smile:

a novel algorithm for generating portmanteaus which utilize word embeddings to identify semantically related words for use in the portmanteau construction.


Bag of Tricks for Image Classification with Convolutional Neural Networks.pdf (538.6 KB)
In this paper, the authors study a series of image classification training refinements and empirically evaluate their impact on final model accuracy through ablation studies.
Very practical work! Most of these methods have been implemented in fastai!


This goes in the same direction but for object detection:

From this thread: Mixup data augmentation
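For anyone new to the technique in that thread: mixup trains on convex combinations of pairs of examples and their labels, with the blend weight drawn from a Beta distribution. A minimal sketch (not the fastai implementation, just the core idea):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.4):
    """Blend two examples and their one-hot labels with a Beta-sampled weight."""
    lam = np.random.beta(alpha, alpha)
    x = lam * x1 + (1 - lam) * x2
    y = lam * y1 + (1 - lam) * y2
    return x, y

# Toy usage: two 4x4 "images" and one-hot labels for a 3-class problem.
img_a, img_b = np.ones((4, 4)), np.zeros((4, 4))
lab_a, lab_b = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
x, y = mixup(img_a, lab_a, img_b, lab_b)
```

With small `alpha` the Beta distribution concentrates near 0 and 1, so most mixed examples stay close to one of the originals.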


Faster RCNN paper. I find this topic really awesome.

Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. “Faster R-CNN: Towards real-time object detection with region proposal networks.” In Advances in neural information processing systems (NIPS), pp. 91-99. 2015.


Agreed, this is a great paper. One concept mentioned in this paper that I find very interesting, and which surprisingly still doesn't seem to get much attention, is Knowledge Distillation. There is a great paper on this concept specifically: Distilling the Knowledge in a Neural Network
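As a rough sketch of the idea in that paper: the student is trained against the teacher's temperature-softened output distribution in addition to the hard labels. The weights and temperature below are illustrative, not from the paper:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T gives a softer distribution."""
    z = z / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, hard_label, T=4.0, alpha=0.9):
    """Blend a soft cross-entropy against the softened teacher with the
    usual hard-label cross-entropy; T**2 rescales the soft-term gradients."""
    p_teacher = softmax(teacher_logits, T)
    p_student_soft = softmax(student_logits, T)
    soft_ce = -np.sum(p_teacher * np.log(p_student_soft))     # match teacher's "dark knowledge"
    hard_ce = -np.log(softmax(student_logits)[hard_label])    # usual supervised term
    return alpha * (T ** 2) * soft_ce + (1 - alpha) * hard_ce

# Toy logits for a 3-class problem.
loss = distillation_loss(np.array([2.0, 1.0, 0.1]),
                         np.array([3.0, 1.0, 0.0]), hard_label=0)
```

The soft term carries information about which wrong classes the teacher considers plausible, which is exactly what hard labels throw away.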

Interestingly, Knowledge Distillation also seems to build on the idea of Label Smoothing, a concept which I believe was first introduced in the Inception v3 paper (again, this technique seems to have gone largely overlooked by lots of people despite its effectiveness) and which is yet another of the tricks from the Bag of Tricks paper :slight_smile:
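Label smoothing itself is a one-liner: replace the hard 0/1 targets with a slightly softened distribution so the model is never pushed toward infinitely confident logits. A minimal sketch:

```python
import numpy as np

def smooth_labels(one_hot, eps=0.1):
    """Give the true class weight (1 - eps) and spread eps uniformly
    over all K classes, so targets are never exactly 0 or 1."""
    k = one_hot.shape[-1]
    return one_hot * (1 - eps) + eps / k

y = np.array([0.0, 0.0, 1.0])
smoothed = smooth_labels(y)  # roughly [0.033, 0.033, 0.933]
```

The smoothed targets still sum to 1, so they plug straight into a standard cross-entropy loss.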


Damn, I knew there should be some tricks :slight_smile:

A must-read for language models.


I enjoyed the attention paper a lot. These two articles were helpful in wrapping my head around it.


This one was kind of surprising to me and goes against my intuition:

Showing that training from random initialization can be just as good as transfer learning for computer vision applications: "These observations challenge the conventional wisdom of ImageNet pre-training for dependent tasks and we expect these discoveries will encourage people to rethink the current de facto paradigm of 'pre-training and fine-tuning' in computer vision." (coming from the authors of ResNet and Mask R-CNN)


Note that they’re using a dataset with a lot of labels - for the kind of things many folks are doing here on the forums, often with 100 labels or even less, you won’t make any progress without fine tuning!

If you have over 100,000 labels (as this paper does even in their “limited labels” scenarios) then pre-training may be less important (especially for object detection, where every object has 5 pieces of information attached - 4 coordinates and a classification).


There was also a paper published later as a counter-argument.

Using Pre-Training Can Improve Model Robustness and Uncertainty


No, yeah, I fully agree. I read it, so I know their experiments weren't entirely "usual" compared to what we usually train on. There's no reason not to use transfer learning (even if just for the shorter training time). It still feels like an interesting finding and is worth reading.


This paper was rejected from ICLR but seems useful as it dramatically advances the baseline for the state of the art of a plain language model. The reviewers rejected the work because the authors didn’t demonstrate any downstream task that used the improved language model.


I remember Jeremy mentioning in one of his Twitter messages that label smoothing has been added to fastai and will be part of Part 2.


Came across this paper, which I am considering implementing on the current ongoing Santander challenge if time permits. Looks really cool.

I think this paper is very important because it shows how language models are capable of capturing syntactic properties of sentences and solving various tasks. It evaluates CoVe, ELMo, BERT, and GPT on tasks that require the model to answer whether some part of a sentence is a noun phrase, has one of the POS or dependency tags, is coreferent with another word, etc. It shows that language models have the potential to improve results on these tasks in realistic settings.

It would also be cool to see how AWD-LSTM fits into this family of language models, because in my experiments on similar tasks it shows some nice results, e.g. here:

Also note that @sgugger has recently added this to fastai. Use the master version if you try it, since it’s being changed regularly.


Thank you for starting this forum library! So useful! May I suggest creating a system to classify papers around broad topics? Say something like: vision, NLP, GANs, ethics/FAT, etc. Perhaps just defining the categories in the first post and listing the hashtag to use for each would suffice?

I have been bookmarking a bunch of AI ethics papers (yet to read :frowning_face:) and would be happy to share a summary if others here are interested in the topic.