Research Paper Recommendations

As a high-level intuition… CNNs try to recognise patterns across 'space'; RNNs try to recognise patterns across 'time'. Most of the papers I came across try to tackle adversaries in 'space'. It makes sense that there can also be adversaries in 'time'. That is what I meant by suggesting changing a letter. A better intuition would be to change the sentence structure or arrangement, for example a sarcastic sentence. Would that classify as an adversary?

or try this…

I notice a discrepancy in my thoughts when I come across this topic.

After yesterday's lesson 9 lecture I came across this one:

Speed/accuracy trade-offs for modern convolutional object detectors

This paper does a controlled experimental analysis of some recent object detectors and compares the performance of Faster R-CNN, R-FCN, and SSD architectures.
It might help someone choose an appropriate object detection method depending on their speed, accuracy, or memory-footprint requirements.


It’s great - although parts are a bit out of date now.

Oh right, this paper is a year old… guess I forgot we are in a cutting-edge DL class :wink:
Some comparison along these lines with current architectures might be a good idea.

A disciplined approach to neural network hyper-parameters: Part 1 — learning rate, batch size, momentum, and weight decay

A fresh (26 March 2018) paper from Leslie Smith (author of cyclical learning rates).
Even though I've only skimmed it for five minutes, it seems to be a very practical and quite easy-to-read piece.


Fantastic. This attempt to see how the pieces of the training puzzle interact is really insightful, and such analyses are scarce.

His recipe for super-convergence, the "1cycle" learning rate policy, is easy to replicate with fast.ai's cycle_len and cycle_mult (it implies always using only two cycles, though). Quoting:

Here we suggest a slight modification of cyclical learning rate policy for super-convergence; always
use one cycle that is smaller than the total number of iterations/epochs and allow the learning rate to
decrease several orders of magnitude less than the initial learning rate for the remaining iterations.
We named this learning rate policy “1cycle” and in our experiments this policy allows the accuracy
to plateau before the training ends.

Also the regularizing effect of learning rate and batch size, the sample and parameter dependency of regularization…
it is all there!
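To make the quoted 1cycle description concrete, here is a rough Python sketch of what such a schedule could look like, as I read the paragraph. The split point, minimum learning rate, and final divisor are placeholders I picked for illustration; this is not Leslie's reference implementation or fast.ai's code.

```python
def one_cycle_lr(iteration, total_iters, lr_max, lr_min=None,
                 cycle_frac=0.9, final_div=1000):
    """Toy 1cycle-style schedule: ramp up, ramp down, then decay far below lr_min.

    cycle_frac and final_div are illustrative choices, not values from the paper.
    """
    lr_min = lr_min if lr_min is not None else lr_max / 10
    cycle_iters = int(total_iters * cycle_frac)   # one cycle, shorter than the full run
    half = cycle_iters // 2
    if iteration < half:                          # linear warm-up: lr_min -> lr_max
        return lr_min + (lr_max - lr_min) * iteration / half
    if iteration < cycle_iters:                   # linear cool-down: lr_max -> lr_min
        return lr_max - (lr_max - lr_min) * (iteration - half) / (cycle_iters - half)
    # "remaining iterations": decay several orders of magnitude below the initial lr
    remaining = total_iters - cycle_iters
    frac = (iteration - cycle_iters) / max(remaining, 1)
    return lr_min * (1 - frac) + (lr_min / final_div) * frac

# e.g. lrs = [one_cycle_lr(i, 1000, 1e-2) for i in range(1000)]  # plot to see the shape
```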

Even if you implement a Bayesian optimizer, as I have done in the past, you still need to understand these interactions as much as possible to tune that optimizer; imo this is priceless.
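For reference, a bare-bones sketch of the kind of Bayesian hyper-parameter search I mean, using scikit-optimize (my choice of library here, not necessarily what anyone in the thread used). The objective function is a dummy stand-in for an actual train-and-validate run.

```python
from skopt import gp_minimize
from skopt.space import Real

# Search space: log-uniform learning rate and weight decay (illustrative ranges).
space = [Real(1e-5, 1e-1, prior="log-uniform", name="lr"),
         Real(1e-6, 1e-2, prior="log-uniform", name="wd")]

def objective(params):
    lr, wd = params
    # In practice: train for a few epochs with these values and return validation loss.
    # Dummy surrogate so the sketch runs on its own:
    return (lr - 3e-3) ** 2 + (wd - 1e-4) ** 2

result = gp_minimize(objective, space, n_calls=20, random_state=0)
print("best lr, wd:", result.x, "best loss:", result.fun)
```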

Thank you for sharing! :grinning:

One cycle, not two cycles :slight_smile:

Oh, the reason I thought it was two cycles was the mention of choosing "one cycle smaller than the total number of iterations/epochs", and then the reference to the learning rate for the "remaining iterations". If there are "remaining iterations", don't we need another cycle to complete that "total number of epochs"?

So my assumption was cycle1, then cycle2 + cycle_mult, so that cycle2 = the remaining iterations…

But given that the trick is named the "1cycle" rate policy, it would make more sense for it to be one cycle… Though then I don't get how that fits with a cycle length "smaller than the total number of epochs".

I guess I'd better re-read that paragraph a few more times… :thinking:

I found the paragraph confusing too. I asked Leslie to include an image, but he didn't have time. So I'm still not 100% sure what he means. But I think the single-cycle use_clr we have is at least pretty close to his intention, though perhaps not identical.


Regularized Evolution for Image Classifier Architecture Search:

https://arxiv.org/abs/1802.01548

My initial contribution to part 2, ahem…
I am currently hooked on reinforcement learning, sorry about that. This is the classic AlphaGo paper: it uses convolutional neural networks trained on records of known Go games, and then uses that knowledge as a prior to train two different networks, a policy network and a value network. The job of these two networks is then to self-play, improve, and ultimately beat human players.
David Silver, primary author of this paper and the person behind a lot of important things at DeepMind, gave a nice tutorial on this paper here.
The next iteration of their work introduced AlphaGo Zero, which didn't require human knowledge as a prior to train. It is another classic paper if you are into reinforcement learning.
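For anyone who wants a mental model of the policy/value pair described above, here is a heavily simplified PyTorch sketch of a two-headed network. The sizes and layer counts are arbitrary choices of mine and this is nothing like AlphaGo's actual architecture, just the general shape of the idea.

```python
import torch
import torch.nn as nn

class TinyPolicyValueNet(nn.Module):
    """Toy two-headed net: shared conv trunk, a policy head (move logits)
    and a value head (expected outcome). Purely illustrative."""
    def __init__(self, board_size=19, channels=32):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        self.policy_head = nn.Sequential(
            nn.Conv2d(channels, 2, 1), nn.ReLU(), nn.Flatten(),
            nn.Linear(2 * board_size * board_size, board_size * board_size),
        )
        self.value_head = nn.Sequential(
            nn.Conv2d(channels, 1, 1), nn.ReLU(), nn.Flatten(),
            nn.Linear(board_size * board_size, 1), nn.Tanh(),
        )

    def forward(self, x):
        h = self.trunk(x)
        return self.policy_head(h), self.value_head(h)  # move logits, value in [-1, 1]

# boards = torch.zeros(4, 1, 19, 19); logits, value = TinyPolicyValueNet()(boards)
```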


Going through some random Reddit thread, I stumbled upon the marvellous paper titled
Stopping GAN Violence: Generative Unadversarial Networks. When the first author's institution is called the "Institute of Deep Statistical Harmony", you know it'll be a good read :slight_smile:


@mmr have you seen this article: https://www.alexirpan.com/2018/02/14/rl-hard.html? It might temper your enthusiasm for RL a bit, and besides, it is a good read.


Well, I am fond of the research solutions that DeepMind and OpenAI often come up with. As for the efficiency of RL algorithms in solving problems, I know it's hard; that's why I am interested in it. There is still room for improvement in RL. Also, personally, these are my early days with RL, so let's see.

Thanks for a great thread!
Here is one particularly good and new article:


Even though it's already been discussed in detail in the Implementing Mask R-CNN thread, I think it's worth mentioning.


This has to be the funniest thing I have ever read. Thanks!

Adversarial Patch
Adversarial Logit Pairing

Optimizing Neural Networks with Kronecker-factored Approximate Curvature
K-FAC: Kronecker-Factored Approximate Curvature


@mmr:

I did a talk on the AlphaGo series of engines the other day; you might find it interesting:

It starts ~20 minutes in; the audio is rough for the first 5-10 minutes, sorry.

Slides:

http://static.brettkoonce.com/presentations/go.pdf


An illustrative, fully coded NLP attention example from the Harvard NLP group.
http://nlp.seas.harvard.edu/2018/04/03/attention.html
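For anyone who wants the core idea before diving into the post, scaled dot-product attention boils down to a few lines. This is a minimal single-head PyTorch sketch, not the tutorial's exact code (which also covers masking, multiple heads, and the full Transformer).

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (minimal, single-head)."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # (batch, seq_q, seq_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)              # attention weights
    return weights @ v, weights

# q = k = v = torch.randn(2, 5, 64); out, attn = scaled_dot_product_attention(q, k, v)
```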


Thanks.