Lesson 4: Trying to understand the pre-processing


I am trying to understand the pre-processing that sets up the data for the embeddings in Lesson 4.
We are doing the following:

  • Read ratings
  • Find unique userIds and movieIds
  • Change these ids to be contiguous, with the explanation: “We update the movie and user ids so that they are contiguous integers, which we want when using embeddings.” Why do we want this?
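
For reference, the three steps above can be sketched roughly as follows. This is a minimal illustration, not the lesson's actual code: the `ratings` rows are made-up values, and the names `user2idx` and `movie2idx` are only assumed for the example.

```python
# Hypothetical ratings rows: (userId, movieId, rating). The values are made up.
ratings = [
    (7, 1193, 5.0),
    (7, 661, 3.0),
    (42, 1193, 4.0),
    (1000, 2355, 5.0),
]

# Find the unique raw ids; sorting keeps the mapping deterministic.
user_ids = sorted({u for u, _, _ in ratings})
movie_ids = sorted({m for _, m, _ in ratings})

# Map each raw id to its position in the sorted list: 0, 1, 2, ...
user2idx = {raw: i for i, raw in enumerate(user_ids)}
movie2idx = {raw: i for i, raw in enumerate(movie_ids)}

# Rewrite every rating with the contiguous indices instead of the raw ids.
contiguous = [(user2idx[u], movie2idx[m], r) for u, m, r in ratings]
print(contiguous)
```

After this, user ids run from 0 to 2 and movie ids from 0 to 2, with no gaps, even though the raw ids went up to 1000 and 2355.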

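Eventually those embeddings are stored as a tensor (think of a multi-dimensional matrix). An embedding layer is a weight matrix with one row per id, and the id is used directly as the row index, so the matrix needs a row for every value up to the largest id, whether or not that id ever appears in the data. Large tensors take up a lot of memory on the GPU. You don’t want a large, mostly empty tensor where most rows belong to ids that never occur, as that would be a waste. With contiguous ids, the matrix has exactly one row per user or movie.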
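
To put rough numbers on the memory point, here is a small back-of-the-envelope sketch. The raw ids, the embedding width of 50, and the helper `embedding_table_bytes` are all assumptions made up for illustration; it only counts the bytes of a float32 table with one row per index.

```python
def embedding_table_bytes(n_rows, n_factors, bytes_per_value=4):
    """Size in bytes of an n_rows x n_factors table (float32 by default)."""
    return n_rows * n_factors * bytes_per_value

raw_movie_ids = [1, 50, 1193, 163949]  # hypothetical raw ids with large gaps
n_factors = 50                         # assumed embedding width

# Indexing by raw id needs a row for every value up to the maximum id.
rows_raw = max(raw_movie_ids) + 1
# Indexing by contiguous id needs only one row per distinct movie.
rows_contiguous = len(set(raw_movie_ids))

print(rows_raw, embedding_table_bytes(rows_raw, n_factors))
print(rows_contiguous, embedding_table_bytes(rows_contiguous, n_factors))
```

The exact figures depend on the dataset, but the gap between the two row counts is the memory that contiguous ids save.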

