Research Paper Recommendations


(sadie) #1

Jeremy recommended we start reading papers. I’d love it if people would post links to interesting papers they have found.

I’ll start:
Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification
by Joy Buolamwini. There is also a great video describing her research. She found systematic accuracy disparities in commercial face classification software based on gender and race.


(Jeremy Howard) #2

That’s a great paper! Thanks for sharing.

I add every paper I’m interested in to my Twitter favorites list so that anyone interested can simply follow that. There are 2700 tweets there now! https://twitter.com/jeremyphoward/likes

Also, if anyone is interested in trying to implement one of these papers in PyTorch (especially with fastai integration), please do post on the forum and at-mention me after you’ve started, since I’d love to see what you’re up to - and help if possible.


(James Requa) #3

This looks like a good thread! Here is my first paper add for now. Would be awesome to implement this! :slight_smile:

Group Normalization
(published March 22, 2018 by FAIR - Facebook AI Research - co-authored by the famous Kaiming He)
https://arxiv.org/abs/1803.08494

The general idea is that BN (batch normalization) performance can suffer with small batch sizes, whereas GN (group normalization) is designed to work well with them (excerpt below):

On ResNet-50 trained in ImageNet, GN has 10.6% lower error than its BN counterpart when using a batch size of 2. GN can outperform or compete with its BN-based counterparts for object detection and segmentation in COCO.
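
To make the idea concrete, here is a minimal plain-Python sketch of the normalization step for a single sample. The function name `group_norm` and the `[channels][positions]` list layout are my own assumptions for illustration, and the learnable per-channel scale and shift are left out. The point it shows is that the mean and variance are computed per sample over a group of channels, so nothing depends on the batch size.

```python
import math


def group_norm(sample, num_groups, eps=1e-5):
    """Normalize one sample, given as [channels][positions], group by group.

    Hypothetical helper for illustration only: no learnable scale/shift.
    """
    num_channels = len(sample)
    if num_channels % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    per_group = num_channels // num_groups

    out = []
    for g in range(num_groups):
        # Channels belonging to this group.
        channels = sample[g * per_group:(g + 1) * per_group]
        # Statistics are pooled over all channels and positions in the group.
        values = [v for ch in channels for v in ch]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        out.extend([[(v - mean) * scale for v in ch] for ch in channels])
    return out


# Four channels with two positions each, split into two groups.
normalized = group_norm(
    [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]], num_groups=2
)
```

With `num_groups=1` this pools over all channels of the sample, and with `num_groups` equal to the number of channels each channel is normalized on its own.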


(James Requa) #4

Here is a recent paper on YOLO v3. It’s written in a jokey style, but it’s quite an entertaining read, and it will resonate even more if you have read a lot of papers before :slight_smile:


(Suvash Thapaliya) #5

It’s a little bigger than last time but more accurate. It’s still fast though, don’t worry.

Gonna make some popcorn first. :joy:


(Pavel Surmenok) #6

I just started reading a paper about RetinaNet because it was better than YOLO2. But YOLO3 looks really interesting.
BTW, is anybody working on building YOLO, RetinaNet or any other modern object detection model with PyTorch?
I want to try. Looks like the most challenging part will be defining the loss function.
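
As a starting point, the classification part of the RetinaNet loss is the focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). Below is a rough plain-Python sketch for a single binary prediction. The function name and the `eps` clamp are my own assumptions, `alpha=0.25` and `gamma=2` are the defaults reported in the paper, and the box regression loss and anchor matching are not covered here.

```python
import math


def focal_loss(p, target, alpha=0.25, gamma=2.0, eps=1e-12):
    """Focal loss for one binary prediction.

    p      -- predicted probability of the positive class, in [0, 1]
    target -- 1 for a positive example, 0 for a negative one
    """
    # p_t is the probability the model assigns to the true class.
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # Clamp to avoid log(0); eps is an illustrative choice.
    p_t = min(max(p_t, eps), 1.0)
    # (1 - p_t)^gamma down-weights examples that are already classified well.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, 1)  # confident and correct
hard = focal_loss(0.1, 1)  # confident and wrong
```

With `gamma=0` this reduces to alpha-weighted cross-entropy. A larger `gamma` shrinks the contribution of easy examples, which matters because most anchors are easy background.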


(Alexandre Cadrin-Chênevert) #7

Thanks James for this link! Kaiming He is one of my heroes in computer vision! R. Girshick and C. Szegedy are the others.

This is a very promising normalization technique for high-res images (e.g. medical imaging).

Last year, the RSNA bone age challenge was won by a team (http://16bit.ai) that used a Tesla M40 24 GB GPU to train on 500x500 images with BN and larger batches than are possible with a more standard 12 GB GPU. Multi-GPU is not a good workaround for this intrinsic BN problem because it is very hard to update BN layers dynamically across different GPUs. I really look forward to trying GN.


(Jeremy Howard) #8

We’ll be doing RetinaNet tonight :slight_smile: (except for the feature pyramid, which we’ll do later)


(James Requa) #9

Looks like there is already an implementation of GN in PyTorch!


(Mike Kunz ) #10

As an exercise, I want to implement this in PyTorch: http://lanl.arxiv.org/abs/1408.5882.

Second step will be to do this one:

Lastly, I want to extend the approach to classification based on multiple text corpora, multiple input features, and e-commerce type applications and features (product descriptions, review texts, prices, review stars, etc).

I am familiar with Jeremy’s paper on this also.

Likely this will mean moving away from a CNN approach to an RNN approach, or an ensemble approach.


(nkiruka chuka-obah) #11

I like “Deep Image Prior” by Ulyanov et al., https://dmitryulyanov.github.io/deep_image_prior . The paper is well written, with Python notebooks that reproduce all their results.

Encoder-decoder structures fascinate me, especially as they are applied to image segmentation. The way they work, and sometimes don’t work, is like magic to me. Right now, I’m interested in how to disambiguate the various factors that go into a particular encoding. Still looking for research papers that focus on this.


(Ananda Seelan) #12

This recent ICLR paper “Learning Longer-Term Dependencies in RNNs with Auxiliary Losses” from Thang Luong’s group at Google Brain got me very interested.

They tackle the problem of learning very long sequences, up to 16,000 tokens, with traditional RNNs including LSTMs. The trick seems to be very simple but super effective. From the abstract:

We present a simple method to improve learning long-term dependencies in recurrent neural networks (RNNs) by introducing unsupervised auxiliary losses. These auxiliary losses force RNNs to either remember distant past or predict future, enabling truncated backpropagation through time (BPTT) to work on very long sequences. We experimented on sequences up to 16 000 tokens long and report faster training, more resource efficiency and better test performance than full BPTT baselines such as Long Short Term Memory (LSTM) or Transformer.

The tweet that had the longer version of the paper - https://twitter.com/lmthang/status/969389594448818177


(Nahid Alam) #13

Very interesting, as I am interested in NLP.


(Nahid Alam) #14

Currently reading - “A Neural Conversational Model”


(Divyansh Jha) #15

I want to implement “One pixel attack for fooling deep neural networks”. Does anybody want to help me out with this?


(Sharwon Pius) #16

Very Interesting!
I think they could have come up with a generalised model for deep neural networks. For example, replacing a single letter or a word in the corpus can lead to drastic changes in the output. Basically, a ‘word’ or a ‘pixel’ is a weight relative to the input-output function.

Am I missing some details? Do you think one can build a DNN model that can find these adversaries?


(Divyansh Jha) #17

Actually, that’s not what adversarial examples mean.
An adversarial example is a very small perturbation of the model’s input such that its actual label doesn’t change (meaning it still looks the same to humans) but the model gets fooled.
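
Here is a toy plain-Python sketch of that definition. Note the assumptions: it uses a hypothetical linear classifier instead of a deep network, and a sign-of-gradient style perturbation instead of the one-pixel attack, just to show a tiny bounded change flipping the prediction. All names and numbers are made up for illustration.

```python
def predict(weights, bias, x):
    """Toy linear classifier: returns 1 if w.x + b > 0, else 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def perturb(weights, x, label, eps):
    """Shift every feature by at most eps, against the true label.

    For a linear model the gradient of the score w.r.t. x is just the
    weights, so stepping along sign(w) changes the score the fastest.
    """
    direction = -1 if label == 1 else 1
    return [xi + direction * eps * sign(w) for w, xi in zip(weights, x)]


# Hypothetical weights and input, chosen only for illustration.
weights = [0.5, -0.5, 0.5, -0.5]
bias = 0.0
x = [0.6, 0.5, 0.6, 0.5]

original = predict(weights, bias, x)
x_adv = perturb(weights, x, label=original, eps=0.1)
fooled = predict(weights, bias, x_adv)
```

No feature moves by more than `eps`, yet `original` and `fooled` differ, which is the sense of “small perturbation, same true label, different prediction”.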

Changing a character in a word may change its actual meaning to humans as well, for example Fall vs. Tall, and it can certainly change the meaning of the sentence.

Check out my blog on this here.


(Sharwon Pius) #18

As a high-level intuition… CNNs try to recognise patterns across ‘space’. RNNs try to recognise patterns across ‘time’. Most of the papers that I have come across try to tackle adversaries in ‘space’. It makes sense to say that there can also be adversaries in ‘time’. That is what I meant by changing a letter. A better example would be changing the sentence structure or arrangement, for example a sarcastic sentence. Would that classify as an adversary?

or try this…

I notice a discrepancy in thoughts, when I come across this topic.


(Nikhil B ) #19

After yesterday’s lesson 9 lecture I came across this one:

Speed/accuracy trade-offs for modern convolutional object detectors

This paper does a controlled experimental analysis of some of the recent object detectors and compares the performance of the Faster R-CNN, R-FCN and SSD architectures.
It might help someone choose an appropriate object detection method depending on speed, accuracy or memory footprint requirements.


(Jeremy Howard) #20

It’s great - although parts are a bit out of date now.