Part 2 lesson 11 wiki

Could we try a meta-residual style architecture to improve this model?

That is: train a secondary seq2seq model that takes the output of this first seq2seq translator and then use that as a new input to predict the original sentence.

For example: the first seq2seq model goes

  • From: qu’ est -ce qui ne peut pas changer ? _eos_
  • To: what can not change change ? _eos_

And then train the second seq2seq model to go

  • From: what can not change change ? _eos_
  • To: what can not change ? _eos_
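The data flow of that two-stage idea can be sketched in a few lines. This is a toy illustration, not the lesson's code: `first_model` is a stub standing in for the trained French→English seq2seq translator, hard-coded to reproduce the duplicated-word error from the example, so the shape of the second model's training pairs is concrete.

```python
def first_model(src_tokens):
    # stand-in for the trained translator; reproduces the error above
    return "what can not change change ? _eos_".split()

def build_correction_pairs(sources, targets):
    # the second seq2seq trains on (noisy first-model output, gold target)
    return [(first_model(s), t) for s, t in zip(sources, targets)]

src = "qu' est -ce qui ne peut pas changer ? _eos_".split()
tgt = "what can not change ? _eos_".split()
pairs = build_correction_pairs([src], [tgt])
# pairs[0] is (noisy output, clean target) for the corrector to learn from
```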

So the concept of attention has some similarities to the capsule nets concept?


The previous one was end-to-end as well.


From what I’ve seen, the term “end-to-end” is used when someone discovers an “ordinary” neural network way to do something that previously required hand-crafted features or something else.


OK, then it makes sense.

Yes, because of the nifty automatic differentiation done by PyTorch.

Yeah! Let’s write a compiler.

I want to automatically generate tests!


Is there anything preventing attention from being used in language models? Could we have added the same weighted combination of previous hidden outputs?
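To make the question concrete, here is a minimal numpy sketch (names are illustrative, not from the lesson's code) of what that would look like: the language model's current hidden state queries all previous hidden states, and the softmax-weighted sum comes back as extra context.

```python
import numpy as np

def attend(query, prev_hiddens):
    scores = prev_hiddens @ query       # one dot-product score per past step
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()            # softmax over the past steps
    return weights @ prev_hiddens       # weighted sum of previous hiddens

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))   # 5 previous hidden states, dimension 8
q = rng.normal(size=8)        # current hidden state acts as the query
context = attend(q, H)        # same shape as one hidden state
```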


Other use cases for sequence-to-sequence models: multi-class categorization.

Has anyone seen papers or approaches for the above in a supervised learning context?

How long does the Devise model take to train?

You really could do that. There is a family of networks called Transformer nets, in which nothing but attention (not even RNNs) is used. Language models and seq2seq models trained with this architecture seem to be very competitive with traditional RNN-based language models.
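The attention-only building block those nets use is scaled dot-product self-attention: every position attends to every other. A numpy sketch under simplifying assumptions (random placeholder weight matrices, single head, no masking):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # all-pairs scaled dot products
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)        # row-wise softmax
    return w @ V                              # new representation per token

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 6))                   # 4 tokens, dimension 6
Wq, Wk, Wv = (rng.normal(size=(6, 6)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)           # one output row per token
```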


Could multi-class categorization really be modelled as seq2seq? I reckon seq2seq could be used when the output is a sequence of tokens (one after the other), whereas multi-class output doesn’t really form a sequence (there is no dependency between the classes). Please correct me if I’m wrong.

Depends. The product categories could be highly correlated. For example, the games category on Amazon could be ‘games - board’, ‘games - computer’, ‘games - strategy’, etc., and we might want to distill classes of goods into ‘games - computer - strategy’.

You could also model this just like a softmax but replace it with a sigmoid function.
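A quick sketch of that swap: softmax makes the class probabilities compete (they sum to 1, so exactly one class wins), while independent sigmoids let any subset of labels be on at once, which is what multi-label categorization needs.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())     # shift by max for numerical stability
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([2.0, 1.5, -3.0])
p_exclusive = softmax(logits)    # sums to 1: pick a single class
p_multilabel = sigmoid(logits)   # each label judged independently
```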

So these two flags are mutually exclusive (xor):
is_multi=False, is_reg=True

Correct?

is_reg is probably enough. Look at the code.


Paper on non-Euclidean (hyperbolic) measures of similarity for word vectors. I’m like, “whoa”: https://arxiv.org/pdf/1705.08039.pdf
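The measure in that paper (Poincaré embeddings) is hyperbolic distance between points inside the unit ball; distances blow up as points approach the boundary, which is what makes room for hierarchies. A direct numpy transcription of the formula:

```python
import numpy as np

def poincare_distance(u, v):
    # d(u, v) = arcosh(1 + 2‖u−v‖² / ((1−‖u‖²)(1−‖v‖²)))
    num = 2.0 * np.sum((u - v) ** 2)
    den = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + num / den)

origin = np.zeros(2)
p = np.array([0.5, 0.0])
d = poincare_distance(origin, p)   # already > Euclidean distance 0.5
```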


Why KNN and not K-Means?

K-means is a clustering algorithm; KNN just finds the nearest neighbours of a given point, which is what we want here.

The last portion was really good
