Research Paper Recommendations

(brett koonce) #42

Colorless green recurrent networks dream hierarchically

(RobG) #43

Deep painterly harmonization

(Kaitlin Duck Sherwood) #44

Data2Vis: Automatic Generation of Data Visualizations Using Sequence-to-Sequence Recurrent Neural Networks
Blog post:

Basically, they look at creating good data visualizations as a seq2seq problem: a sequence of data comes in, a sequence of code comes out as Vega-Lite declarative visualization code (which then gets turned into a nice graph). Quite nice.
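To make the idea concrete, here is a tiny hand-written example of the kind of Vega-Lite spec such a model learns to emit (this spec and the data are illustrative, not actual model output from the paper):

```python
import json

# A toy data sample of the kind the seq2seq model would read as input.
data_sample = [
    {"country": "US", "gdp": 21.4},
    {"country": "CN", "gdp": 14.7},
    {"country": "JP", "gdp": 5.0},
]

# A minimal Vega-Lite declarative spec: the model's output is a token
# sequence that deserializes into JSON like this, which a renderer then
# turns into a bar chart.
vega_lite_spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": data_sample},
    "mark": "bar",
    "encoding": {
        "x": {"field": "country", "type": "nominal"},
        "y": {"field": "gdp", "type": "quantitative"},
    },
}

print(json.dumps(vega_lite_spec, indent=2))
```

The nice part is that the target "code" is purely declarative, so the seq2seq model only has to learn a mapping from field names/types to encoding channels, not a full program.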

(Kaitlin Duck Sherwood) #45

This paper, Universal Sentence Encoder, looks very applicable to Lesson 11:

(Jeremy Howard) #46

Yes it is. I know @narvind2003 has talked about benchmarking the TF Hub implementation against our language models. I’d be interested in the results of that.

(Mike Kunz ) #47

I'm still flabbergasted.

(Arvind Nagaraj) #48


(Igor Kasianenko) #49

I’d like to talk about DeepLab semantic image segmentation:
As far as I can tell, it's the second solution (alongside Mask R-CNN) that does segmentation and beats all the benchmarks, with one core difference: Mask R-CNN does instance segmentation, while DeepLab does semantic segmentation.
If anyone has suggestions about instance image segmentation, please @mention me or write a PM.

There is one more network that does instance image segmentation, called Fully Convolutional Instance-aware Semantic Segmentation (FCIS).

(Kaitlin Duck Sherwood) #50

Last night at our local Data Science Reading Group, we read a paper on topological data analysis which was interesting in a “huh, maybe that would be useful someday” kind of way. This is the paper which we read:
but it is VERY math-y. Here is one which I have only just barely skimmed, just enough to see that it covers the same material and is much more readable:
(It’s also more recent.)

Wikipedia article on topological data analysis might give you all of what you need for seeing the basic concept:

Basically, you use the shape of the data's point cloud to give clues for dimensionality reduction. TDA also gives you some interesting clustering algorithms.
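To give a flavour of the clustering side: the simplest TDA-style construction is the 0-dimensional piece of a Vietoris-Rips filtration, i.e. connect any two points closer than a threshold and take connected components. Here's a minimal sketch (my own toy code, not from either paper):

```python
import numpy as np

def epsilon_clusters(points, eps):
    """Cluster points by connecting any pair closer than eps, then taking
    connected components -- the 0-dimensional slice of a Vietoris-Rips
    filtration, which is the simplest TDA-flavoured clustering."""
    n = len(points)
    # all pairwise Euclidean distances via broadcasting
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    # union-find over edges shorter than eps
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if d[i, j] < eps:
                parent[find(i)] = find(j)

    roots = [find(i) for i in range(n)]
    # relabel components 0..k-1 in order of first appearance
    remap = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [remap[r] for r in roots]

# two well-separated blobs should come out as two components
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(epsilon_clusters(pts, eps=1.0))  # -> [0, 0, 1, 1]
```

Sweeping `eps` and watching components merge is exactly what a persistence barcode tracks; real TDA libraries do this for all scales at once.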

(Jeremy Howard) #51

FYI there’s a company called Ayasdi that is commercializing those algorithms.

(Kaitlin Duck Sherwood) #52

Do you know anything about them? Their name came up in the discussion last night, but nobody had ever heard of them, so we didn’t know how successful they are being.

(RobG) #53

A Visual Debugging Tool for Sequence-to-Sequence Models

(Marco) #54

DeepMind just published their ICLR papers list:

One which looks very interesting is MbPA (Memory-based Parameter Adaptation), which is a generalised version of the simple cache pointers that @sgugger has just tweeted about:

Sounds like a project to me!

(Kaitlin Duck Sherwood) #55

I’m attending the ICLR conference, and between recent Fast.AI material and ICLR, I’m seeing some definite themes:

  1. Start small, get bigger.
  2. Have the model look at a bunch of unlabelled data, then learn from a very small amount of labelled data.

(Kaitlin Duck Sherwood) #56

This poster about adversarial images was interesting. Basically, they proved that there will always be an adversarial image somewhere which will fool the model.

(This doesn’t prove that it is easy to find a successful adversarial image, just that they exist.)

(Kaitlin Duck Sherwood) #57

Oh, and this paper said it had a really cheap/easy way to improve the training of GANs:

So easy that I think it would be a good candidate to incorporate into the FastAI libraries if it works as advertised.

(Kaitlin Duck Sherwood) #58

I was fortunate to get to spend a fair bit of time with Leslie Smith at the ICLR conference, and wish to report that he thinks that one should cyclically vary the dropout rate in step with the learning rate variation.

Apologies, I don’t remember if he said that he had done experiments or if that was just his intuition.

It made sense to me that you’d want to ramp up the dropout rate, but it wasn’t immediately obvious to me why you would want to ramp down. He made the point that when you are doing inference, you don’t have any dropout. You’d like to end with no dropout so that your end model kind of matches the situation you’re in when you do inference… so you want to ramp down your dropout rate.
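As far as I know he hasn't published code for this, but here's a rough sketch of what "cycle the dropout rate in step with the learning rate, ending at zero" might look like, assuming a cosine-shaped 1cycle schedule (the shape and names here are my assumptions, not his):

```python
import math

def one_cycle(step, total_steps, lo, hi):
    """Cosine-annealed 1cycle-style schedule: rise lo->hi over the first
    half of training, fall hi->lo over the second half."""
    half = total_steps / 2
    if step < half:
        t = step / half                   # 0 -> 1 on the way up
    else:
        t = (total_steps - step) / half   # 1 -> 0 on the way down
    return lo + (hi - lo) * (1 - math.cos(math.pi * t)) / 2

def dropout_schedule(step, total_steps, max_p=0.5):
    # Dropout follows the same cycle as the LR, but starts and ends at 0,
    # so the final model matches inference-time conditions (no dropout).
    return one_cycle(step, total_steps, lo=0.0, hi=max_p)

for step in (0, 25, 50, 75, 100):
    print(step, round(dropout_schedule(step, 100), 3))
```

The key property is just the endpoint: whatever the shape in the middle, the schedule lands on p=0 at the last step, matching his "end the way you infer" argument.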

(Jeremy Howard) #59

Yes! I saw a paper that tried this - can’t remember where… Also for DAWNBench I reduced data augmentation at the end. Seems like we should gradually move everything towards inference-time settings - maybe gradually increase batchnorm momentum too…

(Kaitlin Duck Sherwood) #60

I found this paper on “Curriculum dropout”:

Skimming it (SKIMMING!), it looks like they increase dropout but never decrease it.

(Arvind Nagaraj) #61

Has anyone tried gradually reducing BPTT while increasing batch size for LM-backed classifiers? It seemed to work quite well for me, but so far I've only tried it on the Quora dataset.
Initially I dismissed the idea, thinking that a smaller bptt simply means fewer pad tokens for a dataset like Quora, where item length is approx. 20 (mean + std dev).
So I tried training with a fixed, small bptt of around 20 and tried:

  1. Steadily increasing bs
  2. A fixed large bs

Neither worked as well as the case where I gradually reduced bptt and increased bs.
The matrix has to go from short-and-stout to tall-and-lean. The LM was trained at bptt 20, and the classifier started from bptt 50 and came down to 20 towards the end of training.