Listening to Lecture 5, here, a student brings up a good question about whether order matters in our sentiment analysis of the IMDB dataset, right after we’ve built the most basic NN to start off with.

It turns out that even though our input may be sorted by frequency (?), we are actually not using any kind of Bag of Words model or technique.

Jeremy says (and I’m paraphrasing because he’s talking faster than I can type!),

“We are connecting every one of the inputs to the output, but doing it for every one of the incoming factors, creating a big cartesian product of all the weights, which takes into account position.”

This is quite a mouthful. Jeremy, could you please elaborate?

When we looked at the collaborative filtering example which helped to motivate the embedding (including a dot product), there wasn’t any talk of position or cartesian products. Is there more going on, and how do we maintain the order in this way?

If a dense layer has n inputs and m outputs, then it has a weight matrix of size n*m. In other words, every input is connected to every output. Hence the ‘cartesian product’ comment. Does that make sense? It’s got nothing to do with embeddings or sentiment analysis - it’s true of all dense layers.
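To make the n*m point concrete, here is a minimal NumPy sketch (not the lecture's Keras model). The sizes `n = 4` and `m = 3` and the random values are assumptions chosen only for illustration. It shows one weight per (input, output) pair, and that each output is a dot product of the whole input vector with one column of the weight matrix.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
n, m = 4, 3

rng = np.random.default_rng(0)
W = rng.normal(size=(n, m))  # one weight for every (input, output) pair
b = np.zeros(m)              # one bias per output
x = rng.normal(size=n)       # a single input vector with n values

# Dense layer forward pass (no activation): all m outputs at once.
y = x @ W + b

# The same thing written out: output j is the dot product of x with column j of W.
y_manual = np.array(
    [sum(x[i] * W[i, j] for i in range(n)) + b[j] for j in range(m)]
)

print(W.shape)                   # (n, m): n * m weights in total
print(np.allclose(y, y_manual))  # both ways of computing the outputs agree
```

So ‘connected’ here just means that a weight `W[i, j]` exists between input `i` and output `j`. The ‘cartesian product’ is the full set of (input, output) pairs, and each individual output is one dot product over all the inputs.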

Since every input is connected to every output, if order turns out to matter (which presumably it does!) then it has the degrees of freedom to handle this.
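A small sketch of why this gives the layer room to handle order, assuming a toy input of five scalar values rather than the real embedded reviews (all names and sizes here are hypothetical). Because each position has its own weights, reversing the input changes the dense output, whereas a bag-of-words style sum with one shared weight vector gives the same result for either order.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy setup: a sequence of 5 scalar inputs and 2 outputs.
seq_len, m = 5, 2
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x_reversed = x[::-1]  # same values, opposite order

# Dense layer: a separate weight for every (position, output) pair.
W = rng.normal(size=(seq_len, m))
dense_out = x @ W
dense_out_reversed = x_reversed @ W

# Bag-of-words style: positions are summed away and share one weight vector.
w_shared = rng.normal(size=m)
bag_out = x.sum() * w_shared
bag_out_reversed = x_reversed.sum() * w_shared

# False: the dense layer sees the reordering.
print(np.allclose(dense_out, dense_out_reversed))
# True: the summed version cannot see it.
print(np.allclose(bag_out, bag_out_reversed))
```

Whether the trained weights actually end up using position is a separate question. The sketch only shows that the parameters are there to do so.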

What does the word ‘connected’ mean in this context?

Did you just mean to say ‘cartesian product’ to mean, basically, a dot product? I’m thinking that I might be looking too hard into it and falling into a rabbit hole.