Research Paper Recommendations

(brett koonce) #42

Colorless green recurrent networks dream hierarchically

(RobG) #43

Deep painterly harmonization

(Kaitlin Duck Sherwood) #44

Data2Vis: Automatic Generation of Data Visualizations Using Sequence-to-Sequence Recurrent Neural Networks
Blog post:

Basically, they look at creating good data visualizations as a seq2seq problem: a sequence of data comes in, a sequence of code comes out as Vega-Lite declarative visualization code (which then gets turned into a nice graph). Quite nice.
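To make the idea concrete, here is a tiny hand-written example of the kind of Vega-Lite spec such a model learns to emit (this spec and the data are illustrative, not actual model output from the paper):

```python
import json

# A toy data sample of the kind the seq2seq model would read as input.
data_sample = [
    {"country": "US", "gdp": 21.4},
    {"country": "CN", "gdp": 14.7},
    {"country": "JP", "gdp": 5.0},
]

# A minimal Vega-Lite declarative spec: the model's output is a token
# sequence that deserializes into JSON like this, which a renderer then
# turns into a bar chart.
vega_lite_spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": data_sample},
    "mark": "bar",
    "encoding": {
        "x": {"field": "country", "type": "nominal"},
        "y": {"field": "gdp", "type": "quantitative"},
    },
}

print(json.dumps(vega_lite_spec, indent=2))
```

The nice part is that the target "code" is purely declarative, so the seq2seq model only has to learn a mapping from field names/types to encoding channels, not a full program.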

(Kaitlin Duck Sherwood) #45

This paper, Universal Sentence Encoder, looks very applicable to Lesson 11:

(Jeremy Howard) #46

Yes it is. I know @narvind2003 has talked about benchmarking the TF Hub implementation against our language models. I’d be interested in the results of that.

(Mike Kunz ) #47

I'm still flabbergasted.

(Arvind Nagaraj) #48


(Igor Kasianenko) #49

I’d like to talk about DeepLab semantic image segmentation:
As far as I can tell, it's the second solution (alongside Mask R-CNN) that does segmentation and beats all the benchmarks, with one core difference: Mask R-CNN does instance segmentation, while DeepLab does semantic segmentation.
If anyone has suggestions about instance image segmentation, please @mention me or write a PM.

There is one more network that does instance image segmentation, called Fully Convolutional Instance-aware Semantic Segmentation (FCIS).

(Kaitlin Duck Sherwood) #50

Last night at our local Data Science Reading Group, we read a paper on topological data analysis which was interesting in a “huh, maybe that would be useful someday” kind of way. This is the paper which we read:
but it is VERY math-y. Here is one which I have only just barely skimmed, just enough to see that it covers the same material and is much more readable:
(It’s also more recent.)

Wikipedia article on topological data analysis might give you all of what you need for seeing the basic concept:

Basically, you use the shape of the data's point cloud to give clues for dimensionality reduction. TDA also gives you some interesting clustering algorithms.
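To give a flavour of the clustering side: the simplest TDA-style construction is the 0-dimensional piece of a Vietoris-Rips filtration, i.e. connect any two points closer than a threshold and take connected components. Here's a minimal sketch (my own toy code, not from either paper):

```python
import numpy as np

def epsilon_clusters(points, eps):
    """Cluster points by connecting any pair closer than eps, then taking
    connected components -- the 0-dimensional slice of a Vietoris-Rips
    filtration, which is the simplest TDA-flavoured clustering."""
    n = len(points)
    # all pairwise Euclidean distances via broadcasting
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    # union-find over edges shorter than eps
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if d[i, j] < eps:
                parent[find(i)] = find(j)

    roots = [find(i) for i in range(n)]
    # relabel components 0..k-1 in order of first appearance
    remap = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [remap[r] for r in roots]

# two well-separated blobs should come out as two components
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(epsilon_clusters(pts, eps=1.0))  # -> [0, 0, 1, 1]
```

Sweeping `eps` and watching components merge is exactly what a persistence barcode tracks; real TDA libraries do this for all scales at once.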

(Jeremy Howard) #51

FYI there’s a company called Ayasdi that is commercializing those algorithms.

(Kaitlin Duck Sherwood) #52

Do you know anything about them? Their name came up in the discussion last night, but nobody had ever heard of them, so we didn’t know how successful they are being.

(RobG) #53

A Visual Debugging Tool for Sequence-to-Sequence Models

(Marco) #54

DeepMind just published their ICLR papers list:

One which looks very interesting is MbPA (Memory-based Parameter Adaptation), which is a generalised version of the simple cache pointers that @sgugger has just tweeted about:

Sounds like a project to me!

(Kaitlin Duck Sherwood) #55

I’m attending the ICLR conference, and between recent Fast.AI material and ICLR, I’m seeing some definite themes:

  1. Start small, get bigger.
  2. Have the model look at a bunch of unlabelled data, then learn from a very small amount of labelled data.

(Kaitlin Duck Sherwood) #56

This poster about adversarial images was interesting. Basically, they proved that there will always be an adversarial image somewhere which will fool the model.

(This doesn’t prove that it is easy to find a successful adversarial image, just that they exist.)

(Kaitlin Duck Sherwood) #57

Oh, and this paper said it had a really cheap/easy way to improve the training of GANs:

So easy that I think it would be a good candidate to incorporate into the FastAI libraries if it works as advertised.

(Kaitlin Duck Sherwood) #58

I was fortunate to get to spend a fair bit of time with Leslie Smith at the ICLR conference, and wish to report that he thinks that one should cyclically vary the dropout rate in step with the learning rate variation.

Apologies, I don’t remember if he said that he had done experiments or if that was just his intuition.

It made sense to me that you’d want to ramp up the dropout rate, but it wasn’t immediately obvious to me why you would want to ramp down. He made the point that when you are doing inference, you don’t have any dropout. You’d like to end with no dropout so that your end model kind of matches the situation you’re in when you do inference… so you want to ramp down your dropout rate.
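As far as I know he hasn't published code for this, but here's a rough sketch of what "cycle the dropout rate in step with the learning rate, ending at zero" might look like, assuming a cosine-shaped 1cycle schedule (the shape and names here are my assumptions, not his):

```python
import math

def one_cycle(step, total_steps, lo, hi):
    """Cosine-annealed 1cycle-style schedule: rise lo->hi over the first
    half of training, fall hi->lo over the second half."""
    half = total_steps / 2
    if step < half:
        t = step / half                   # 0 -> 1 on the way up
    else:
        t = (total_steps - step) / half   # 1 -> 0 on the way down
    return lo + (hi - lo) * (1 - math.cos(math.pi * t)) / 2

def dropout_schedule(step, total_steps, max_p=0.5):
    # Dropout follows the same cycle as the LR, but starts and ends at 0,
    # so the final model matches inference-time conditions (no dropout).
    return one_cycle(step, total_steps, lo=0.0, hi=max_p)

for step in (0, 25, 50, 75, 100):
    print(step, round(dropout_schedule(step, 100), 3))
```

The key property is just the endpoint: whatever the shape in the middle, the schedule lands on p=0 at the last step, matching his "end the way you infer" argument.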

(Jeremy Howard) #59

Yes! I saw a paper that tried this - can’t remember where… Also for DAWNBench I reduced data augmentation at the end. Seems like we should gradually move everything towards inference-time settings - maybe gradually increase batchnorm momentum too…

(Kaitlin Duck Sherwood) #60

I found this paper on “Curriculum dropout”:

Skimming it (SKIMMING!), it looks like they increase dropout but never decrease it.

(Arvind Nagaraj) #61

Has anyone tried gradually reducing BPTT while increasing batch size for LM-backed classifiers? It seemed to work quite well for me, but so far I've only tried it on the Quora dataset.
Initially I dismissed the idea, thinking that a smaller bptt simply means fewer pad tokens for a dataset like Quora, where item length is approx. 20 (mean + std dev).
So I tried training with a fixed, small bptt of around 20 and tried:

  1. Steadily increasing bs
  2. A fixed large bs

Neither worked as well as the case where I gradually reduced bptt and increased bs.
The matrix has to go from short-and-stout to tall-and-lean. The LM was trained at bptt 20, and the classifier started from bptt 50 and came down to 20 towards the end of training.