Part 2 lesson 11 wiki

(Aza Raskin) #114

Could we try a meta-residual style architecture to improve this model?

That is: train a secondary seq2seq model that takes the output of this first seq2seq translator and then use that as a new input to predict the original sentence.

For example: the first seq2seq model goes

  • From: qu’ est -ce qui ne peut pas changer ? _eos_
  • To: what can not change change ? _eos_

And then train the second seq2seq model to go

  • From: what can not change change ? _eos_
  • To: what can not change ? _eos_
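A rough sketch of the training setup being proposed (all names here are hypothetical; translate_fr_en stands in for the already-trained first model): the second model's inputs are the first model's imperfect outputs, and its targets are the ground-truth sentences.

```python
# Build a training set for a second, "corrector" seq2seq model:
# inputs = first model's (imperfect) translations, targets = ground truth.

def build_corrector_dataset(pairs, translate_fr_en):
    # pairs: list of (french_source, english_target) strings
    # translate_fr_en: the trained first-stage seq2seq model (any callable)
    return [(translate_fr_en(src), tgt) for src, tgt in pairs]

# Toy stand-in for the first model that duplicates a word,
# mimicking the "what can not change change ?" failure above.
fake_translate = lambda src: "what can not change change ? _eos_"

pairs = [("qu' est -ce qui ne peut pas changer ? _eos_",
          "what can not change ? _eos_")]
dataset = build_corrector_dataset(pairs, fake_translate)
print(dataset[0])
# ('what can not change change ? _eos_', 'what can not change ? _eos_')
```

The second seq2seq would then be trained on these (noisy output, clean target) pairs, much like automatic post-editing in machine translation.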

(Alex) #115

So the concept of attention has some similarities to the capsule nets concept?

(Pavel Surmenok) #116

The previous one was end-to-end as well.

(Emil) #117

From what I’ve seen, the term “end-to-end” is used when someone discovers an “ordinary” neural-network way to do something that previously required hand-crafted features or a pipeline of separate components.

(Debashish Panigrahi) #118

OK, then it makes sense.

(Ananda Seelan) #119

Yes, because of the nifty automatic differentiation done by PyTorch.

(Sneha Nagpaul) #120

Yeah! Let’s write a compiler.

(Kaitlin Duck Sherwood) #121

I want to automatically generate tests!

(Even Oldridge) #122

Is there anything preventing attention from being used in language models? Could we have added the same weighted combination of the previous hidden outputs?

(Mike Kunz ) #123

Other use cases for sequence-to-sequence models: multi-class categorization.

Has anyone seen papers or approaches for the above in a supervised learning context?

(Kaitlin Duck Sherwood) #124

How long does the DeViSE model take to train?

(Ananda Seelan) #125

You really could do that. There is a family of networks called Transformers, which use nothing but attention (not even RNNs). Language models and seq2seq models trained with this architecture seem to be very competitive with traditional RNN-based language models.

(Ananda Seelan) #126

Could multi-class categorization really be modelled as seq2seq? I reckon seq2seq could be used when the output is a sequence of tokens (one after the other), whereas a multi-class output doesn’t really form a sequence (there is no dependency between classes). Please correct me if I’m wrong.

(Mike Kunz ) #127

Depends. The categories could be highly correlated. For example, the games category on Amazon could be ‘games - board’, ‘games - computer’, ‘games - strategy’, etc., and we might want to distill classes of goods into ‘games-computer-strategy’.

You could also model this just like a softmax classifier but replace the softmax with a sigmoid function.
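A minimal sketch of that last point (the category names are hypothetical): a softmax forces the class probabilities to compete and sum to 1, while a per-class sigmoid gives each label an independent probability, so several correlated labels can fire at once.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical logits for three correlated categories:
# games-board, games-computer, games-strategy
logits = np.array([-1.0, 2.0, 1.5])

print(softmax(logits))  # sums to 1 -> only one clear winner
print(sigmoid(logits))  # independent -> both "computer" and "strategy" > 0.5
```

With the sigmoid version you would train against binary cross-entropy per label, which is the standard multi-label setup.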

(Gerardo Garcia) #128

These two are mutually exclusive (xor):
is_multi=False, is_reg=True


(yinterian) #130

is_reg is probably enough. Look at the code.

(Mike Kunz ) #131

Paper on non-Euclidean (spherical) measures of similarity for word vectors. I’m like, “whoa”.

(Mike Kunz ) #132

Why KNN and not K-Means?

(Pavel Surmenok) #133

K-means is a clustering algorithm; here we just want to look up the nearest existing word vectors to a query, not invent new cluster centers.
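To make the distinction concrete, a tiny NumPy sketch with hypothetical toy vectors (not the actual DeViSE embeddings): KNN ranks the existing labelled vectors by distance to a query and returns the closest ones.

```python
import numpy as np

# Toy "word vectors" (hypothetical, 2-D for readability)
vectors = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]])
labels = ["cat", "kitten", "car", "truck"]

def knn(query, k=2):
    # KNN: rank the *existing* vectors by distance to the query
    dists = np.linalg.norm(vectors - query, axis=1)
    return [labels[i] for i in np.argsort(dists)[:k]]

print(knn(np.array([0.02, 0.0])))  # ['cat', 'kitten']
```

K-means, by contrast, would throw the labels away and compute new centroids, which is not what the image-to-word-vector lookup needs.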

(Gerardo Garcia) #134

The last portion was really good