Part 2 lesson 11 wiki


(Aza Raskin) #114

Could we try a meta-residual-style architecture to improve this model?

That is: train a secondary seq2seq model that takes the output of the first seq2seq translator as its input and learns to predict the original target sentence (sketched in code below).

For example: the first seq2seq model goes

  • From: qu’ est -ce qui ne peut pas changer ? _eos_
  • To: what can not change change ? _eos_

And then train the second seq2seq model to go

  • From: what can not change change ? _eos_
  • To: what can not change ? _eos_
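
A minimal sketch of the two-stage idea, assuming both models expose a simple translate(tokens) -> tokens interface (the interface and names here are hypothetical stand-ins, not the lesson's code):

```python
# Hedged sketch: chain the translator with a second "corrector" seq2seq model.
# Both model objects and their translate() method are hypothetical.

def translate_with_correction(src_tokens, translator, corrector):
    # Stage 1: the ordinary French -> English translator.
    draft = translator.translate(src_tokens)  # "what can not change change ? _eos_"
    # Stage 2: a seq2seq model trained on (first-model output, gold target)
    # pairs learns to undo the first model's systematic mistakes.
    return corrector.translate(draft)         # "what can not change ? _eos_"
```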

(Alex) #115

So the concept of attention has some similarities to the capsule nets concept?


(Pavel Surmenok) #116

The previous one was end-to-end as well.


(Emil) #117

From what I’ve seen, the term “end-to-end” is used when someone discovers an “ordinary” neural network way to do something that previously required hand-crafted features or other non-learned components.


(Debashish Panigrahi) #118

OK, then it makes sense.


(Ananda Seelan) #119

Yes, because of the nifty auto-differentiation done by PyTorch.
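
For reference, a tiny example of the auto-differentiation being referred to (plain PyTorch autograd, not lesson-specific code):

```python
import torch

# PyTorch records operations as they run, then backward() computes gradients.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()  # y = x1^2 + x2^2 + x3^2
y.backward()
print(x.grad)       # tensor([2., 4., 6.]), i.e. dy/dx = 2x
```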


(Sneha Nagpaul) #120

Yeah! Let’s write a compiler.


(Kaitlin Duck Sherwood) #121

I want to automatically generate tests!


(Even Oldridge) #122

Is there anything preventing attention from being used in language models? Could we have added the same weighted combination of previous hidden outputs?


(Mike Kunz ) #123

Other use cases for sequence-to-sequence models: multi-class categorization.

Has anyone seen papers or approaches for the above in a supervised learning context?


(Kaitlin Duck Sherwood) #124

How long does the DeViSE model take to train?


(Ananda Seelan) #125

You really could do that. There is a family of networks called Transformers, in which nothing but attention (not even RNNs) is used. Language models and seq2seq models trained with this architecture seem to be very competitive with traditional RNN-based language models.
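
For the “weighted combination of previous hidden outputs” part of the question, a minimal sketch of dot-product attention over an RNN’s past hidden states (illustrative names and sizes, not the lesson’s code):

```python
import torch
import torch.nn.functional as F

def attend(hiddens, query):
    # hiddens: (seq_len, hidden_dim) past RNN outputs; query: (hidden_dim,)
    scores = hiddens @ query            # (seq_len,) dot-product scores
    weights = F.softmax(scores, dim=0)  # attention weights, sum to 1
    return weights @ hiddens            # weighted sum of past hidden states

# The resulting context vector can be combined with the current hidden
# state before predicting the next token.
hiddens = torch.randn(10, 256)
context = attend(hiddens, hiddens[-1])
```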


(Ananda Seelan) #126

Could multi-class categorization really be modelled as seq2seq? I reckon seq2seq could be used when the output is a sequence of tokens (one after the other), whereas multi-class output doesn’t really form a sequence (as in, there is no dependency between classes). Please correct me if I’m wrong.


(Mike Kunz ) #127

Depends. The classes could be highly correlated product categories. For example, the games category on Amazon could be ‘games - board’, ‘games - computer’, ‘games - strategy’, etc., and we might want to distill classes of goods into ‘games-computer-strategy’.

You could also model this just like a softmax classifier but replace the softmax with a sigmoid function (see the sketch below).
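
A minimal sketch of that last point, with per-class sigmoids instead of a softmax so several categories can be on at once (the sizes here are made up):

```python
import torch
import torch.nn as nn

n_features, n_classes = 512, 20
head = nn.Linear(n_features, n_classes)  # one logit per category
loss_fn = nn.BCEWithLogitsLoss()         # sigmoid + binary cross-entropy

x = torch.randn(8, n_features)
targets = torch.randint(0, 2, (8, n_classes)).float()  # multiple 1s allowed
loss = loss_fn(head(x), targets)

# At inference each class is thresholded independently:
preds = torch.sigmoid(head(x)) > 0.5
```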


(Gerardo Garcia) #128

These are XOR (mutually exclusive):
is_multi=False, is_reg=True

Correct?


(yinterian) #130

is_reg is probably enough. Look at the code.
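
A rough illustration of why one flag can suffice (not fast.ai’s actual code, just the mutual-exclusivity logic being discussed):

```python
import torch.nn.functional as F

def pick_loss(is_reg, is_multi):
    # The three cases are mutually exclusive, so is_reg=True
    # already implies is_multi=False.
    if is_reg:
        return F.mse_loss                          # regression
    if is_multi:
        return F.binary_cross_entropy_with_logits  # multi-label
    return F.cross_entropy                         # single-label
```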


(Mike Kunz ) #131

Paper on non-Euclidean (hyperbolic) measures of similarity for word vectors. I’m like, “whoa”: https://arxiv.org/pdf/1705.08039.pdf
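
For the curious, the distance that paper (Poincaré embeddings) uses between two vectors inside the unit ball, transcribed directly from its formula:

```python
import math

def poincare_distance(u, v):
    # d(u, v) = arcosh(1 + 2*||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    # u and v must lie strictly inside the unit ball (||u|| < 1).
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    denom = (1 - sum(a * a for a in u)) * (1 - sum(b * b for b in v))
    return math.acosh(1 + 2 * sq_dist / denom)
```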


(Mike Kunz ) #132

Why KNN and not K-Means?


(Pavel Surmenok) #133

K-means is a clustering algorithm; here we just want the nearest neighbours of a given vector, which is what KNN gives you.
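
In the DeViSE setting the distinction matters because we just want the few word vectors closest to a predicted embedding, i.e. a nearest-neighbour query. A minimal sketch (made-up names and sizes):

```python
import torch
import torch.nn.functional as F

word_vecs = F.normalize(torch.randn(10000, 300), dim=1)  # vocabulary vectors

def knn(query, k=5):
    # Cosine similarity of the query against every word vector, top k.
    sims = word_vecs @ F.normalize(query, dim=0)  # (10000,)
    return sims.topk(k).indices                   # indices of k nearest words
```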


(Gerardo Garcia) #134

The last portion was really good.