Good readings 2019

Hi, I believe that some deep learning papers are worth reading regardless of their application domain. But because of the overwhelming number of deep learning papers published every day, my hope here is to have your help in creating a curated list of cool deep learning papers: the kind of list we might consider must-reads in 2019. However, saying whether or not a paper is good is not an easy task. Luckily, a few customary rules of thumb can be useful.

A paper should be linked below if it satisfies at least one of the following desiderata:

As a final note, PLEASE add a few lines introducing the linked paper, and DO NOT COMMENT HERE unless strictly necessary; just use "like this post" to guide other people's reading. If you want to discuss an idea, just create a new topic :wink:

Suggested reading lists:

Natural Language: #NLP

Vision: #CV

| Category | Title / Link | Summary |
|---|---|---|
| General | Bag of Tricks for Image Classification with Convolutional Neural Networks | Best practices to follow for image classification with CNNs |
| TBD | Group Normalization | - |
| TBD | Exploring Neural Networks with Activation Atlases | - |
| TBD | Adversarial Examples: Attacks and Defenses for Deep Learning | Deep neural networks (DNNs) have recently been found vulnerable to well-designed input samples called adversarial examples. The authors review recent findings on adversarial examples for DNNs, summarize methods for generating them, and propose a taxonomy of these methods. |

Training and Advanced Topics: #ADV

Ethics of AI: #Ethics (by Nalini)

| Category | Title / Link | Summary |
|---|---|---|
| General | In Favor of Developing Ethical Best Practices in AI Research | Best practices to make ethics a part of your AI/ML work. |
| General | Ethics of algorithms | Mapping the debate around the ethics of algorithms. |
| General | Mechanism Design for AI for Social Good | Describes the Mechanism Design for Social Good (MD4SG) research agenda, which uses insights from algorithms, optimization, and mechanism design to improve access to opportunity. |
| Bias | A Framework for Understanding Unintended Consequences of Machine Learning | Provides a simple framework for understanding the various kinds of bias that may occur in machine learning, going beyond the simplistic notion of dataset bias. |
| Bias | Fairness in representation: quantifying stereotyping as a representational harm | Formalizes two notions of representational harm caused by "stereotyping" in machine learning and suggests ways to mitigate them. |
| Bias | Man is to Computer Programmer as Woman is to Homemaker? | Paper on debiasing word embeddings. |
| Accountability | Algorithmic Impact Assessments | AI Now paper defining processes for auditing algorithms. |
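The debiasing-word-embeddings paper listed above works by projecting out a learned bias direction from each word vector (the "neutralize" step of hard debiasing). A minimal sketch of that projection, with made-up toy vectors standing in for real embeddings:

```python
import numpy as np

def debias(vec, bias_dir):
    """Project out the component of `vec` along a bias direction
    (the 'neutralize' step of hard debiasing)."""
    g = bias_dir / np.linalg.norm(bias_dir)
    return vec - np.dot(vec, g) * g

# Toy example: a hypothetical gender direction from 'he' - 'she' vectors.
# These 3-d vectors are illustrative only; real embeddings are 100s of dims.
he, she = np.array([0.8, 0.1, 0.3]), np.array([0.2, 0.1, 0.7])
gender_dir = he - she
programmer = np.array([0.5, 0.9, 0.4])
neutral = debias(programmer, gender_dir)
```

After debiasing, `neutral` has zero component along the bias direction, so it is equidistant (in that direction) from "he" and "she".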

From the abstract: Noah A. Smith presented ideas developed by many researchers over many decades. After reading this document, you should have a general understanding of word vectors (also known as word embeddings): why they exist, what problems they solve, where they come from, how they have changed over time, and what some of the open questions about them are.


I enjoyed this lighthearted paper this week… :slight_smile:

a novel algorithm for generating portmanteaus which utilize word embeddings to identify semantically related words for use in the portmanteau construction.


Bag of Tricks for Image Classification with Convolutional Neural Networks.pdf (538.6 KB)
In this paper, the authors study a series of image classification training refinements and empirically evaluate their impact on final model accuracy through ablation studies.
Very practical work! Most of these methods have been implemented in fastai!


This goes in the same direction but for object detection:

From this thread: Mixup data augmentation
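For anyone new to the technique in that thread: mixup trains on convex combinations of pairs of examples and their labels, with the blend weight drawn from a Beta distribution. A minimal sketch (not the fastai implementation, just the core idea):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.4):
    """Blend two examples and their one-hot labels with a Beta-sampled weight."""
    lam = np.random.beta(alpha, alpha)
    x = lam * x1 + (1 - lam) * x2
    y = lam * y1 + (1 - lam) * y2
    return x, y

# Toy usage: two 4x4 "images" and one-hot labels for a 3-class problem.
img_a, img_b = np.ones((4, 4)), np.zeros((4, 4))
lab_a, lab_b = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
x, y = mixup(img_a, lab_a, img_b, lab_b)
```

With small `alpha` the Beta distribution concentrates near 0 and 1, so most mixed examples stay close to one of the originals.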


Faster RCNN paper. I find this topic really awesome.

Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. “Faster R-CNN: Towards real-time object detection with region proposal networks.” In Advances in neural information processing systems (NIPS), pp. 91-99. 2015.


Agreed, this is a great paper. One concept mentioned in this paper that I find very interesting, and which surprisingly still doesn't seem to get much attention, is Knowledge Distillation. There is a great paper on this concept specifically: Distilling the Knowledge in a Neural Network
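As a rough sketch of the idea in that paper: the student is trained against the teacher's temperature-softened output distribution in addition to the hard labels. The weights and temperature below are illustrative, not from the paper:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T gives a softer distribution."""
    z = z / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, hard_label, T=4.0, alpha=0.9):
    """Blend a soft cross-entropy against the softened teacher with the
    usual hard-label cross-entropy; T**2 rescales the soft-term gradients."""
    p_teacher = softmax(teacher_logits, T)
    p_student_soft = softmax(student_logits, T)
    soft_ce = -np.sum(p_teacher * np.log(p_student_soft))     # match teacher's "dark knowledge"
    hard_ce = -np.log(softmax(student_logits)[hard_label])    # usual supervised term
    return alpha * (T ** 2) * soft_ce + (1 - alpha) * hard_ce

# Toy logits for a 3-class problem.
loss = distillation_loss(np.array([2.0, 1.0, 0.1]),
                         np.array([3.0, 1.0, 0.0]), hard_label=0)
```

The soft term carries information about which wrong classes the teacher considers plausible, which is exactly what hard labels throw away.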

Interestingly, Knowledge Distillation also seems to build on the idea of Label Smoothing, a concept which I believe was first introduced in the Inception v3 paper (again, this technique seems to have gone largely overlooked by lots of people despite its effectiveness) and which is yet another of the tricks from the Bag of Tricks paper :slight_smile:
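Label smoothing itself is a one-liner: replace the hard 0/1 targets with a slightly softened distribution so the model is never pushed toward infinitely confident logits. A minimal sketch:

```python
import numpy as np

def smooth_labels(one_hot, eps=0.1):
    """Give the true class weight (1 - eps) and spread eps uniformly
    over all K classes, so targets are never exactly 0 or 1."""
    k = one_hot.shape[-1]
    return one_hot * (1 - eps) + eps / k

y = np.array([0.0, 0.0, 1.0])
smoothed = smooth_labels(y)  # roughly [0.033, 0.033, 0.933]
```

The smoothed targets still sum to 1, so they plug straight into a standard cross-entropy loss.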


Damn, I knew there should be some tricks :slight_smile:

A must-read for language models.


I enjoyed the attention paper a lot. These two articles were helpful in wrapping my head around it.


This one was kind of surprising to me and goes against my intuition:

Showing that training from random initialization can be just as good as transfer learning for computer vision applications: "These observations challenge the conventional wisdom of ImageNet pre-training for dependent tasks and we expect these discoveries will encourage people to rethink the current de facto paradigm of 'pre-training and fine-tuning' in computer vision." (coming from the authors of ResNet and Mask R-CNN)


Note that they’re using a dataset with a lot of labels - for the kind of things many folks are doing here on the forums, often with 100 labels or even less, you won’t make any progress without fine tuning!

If you have over 100,000 labels (as this paper does even in their “limited labels” scenarios) then pre-training may be less important (especially for object detection, where every object has 5 pieces of information attached - 4 coordinates and a classification).


There was also a paper published later as a counter-argument.

Using Pre-Training Can Improve Model Robustness and Uncertainty


No, yeah, I fully agree. I read it, so I know their experiments weren't entirely "usual" compared to what we usually train on. There's no reason not to use transfer learning (even if just for the shorter training time). It still feels like an interesting finding and is worth reading.


This paper was rejected from ICLR but seems useful as it dramatically advances the baseline for the state of the art of a plain language model. The reviewers rejected the work because the authors didn’t demonstrate any downstream task that used the improved language model.


I remember Jeremy mentioning in one of his Twitter messages that label smoothing has been added to fastai and will be part of Part 2.


Came across this paper, which I am considering implementing on the current ongoing Santander challenge if time permits. Looks really cool.

I think this paper is very important because it shows how language models are capable of capturing syntactic properties of sentences and solving various tasks. It evaluates CoVe, ELMo, BERT, and GPT on tasks that require the model to answer whether some part of a sentence is a noun phrase, has one of the POS or dependency tags, is coreferent with another word, etc. It shows that language models have the potential to improve results on these tasks in realistic settings.

It would also be cool to see how AWD-LSTM fits into this family of language models, because in my experiments on similar tasks it shows some nice results, e.g. here:

Also note that @sgugger has recently added this to fastai. Use the master version if you try it, since it’s being changed regularly.


Thank you for starting this forum library! So useful! May I suggest creating a system to classify papers around broad topics? Say something like: vision, NLP, GANs, ethics/FAT, etc. Perhaps just defining the categories in the first post and listing the hashtag to use for each would suffice?

I have been bookmarking a bunch of AI ethics papers (yet to read :frowning_face:) and would be happy to share a summary if others here are interested in the topic.