In addition, it is valuable in its own right that embeddings are continuous, because models are better at understanding continuous variables.

If I’m not missing a key point here, this phrase is kinda misleading: I think they don’t mean that the embeddings themselves are continuous, but that the distance between embeddings is continuous. Am I right?

So the embedding layer is taking the categorical values (0 and 1) and converting each of them into a vector of 10 continuous numbers that can be trained.

So before, the only information the model had as input was a single value that was either a 0 or a 1. By passing that value through an embedding layer, each of those 2 classes gets converted into 10 continuous values that the model can use to represent different information about that class.
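To make the "2 classes become 10 trainable numbers" idea concrete, here is a minimal pure-Python sketch. It is not how any particular library implements it. The names `emb_table`, `embed`, `sgd_step` and the made-up gradient are all assumptions for illustration.

```python
import random

random.seed(0)

NUM_CLASSES = 2   # the categorical input takes the values 0 or 1
EMB_DIM = 10      # each class gets 10 trainable floats

# The embedding "layer" is just a table of weights: one row per class.
emb_table = [[random.gauss(0.0, 1.0) for _ in range(EMB_DIM)]
             for _ in range(NUM_CLASSES)]

def embed(class_id):
    """Forward pass: look up the row of floats for this class."""
    return emb_table[class_id]

def sgd_step(class_id, grad, lr=0.1):
    """Update only the row that was looked up; other rows are untouched."""
    row = emb_table[class_id]
    for j in range(EMB_DIM):
        row[j] -= lr * grad[j]

before_0 = list(embed(0))
before_1 = list(embed(1))

# Hypothetical gradient of some loss w.r.t. the embedding of class 1.
fake_grad = [1.0] * EMB_DIM
sgd_step(1, fake_grad)

print(embed(0) == before_0)  # row for class 0 did not move
print(embed(1) == before_1)  # row for class 1 was nudged by the update
```

The input is still discrete (0 or 1). What is continuous is the row of floats stored for each class, and those floats can be moved by small amounts during training.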

I thought an embedding was just indexing into an array so that we can do the matrix multiplication. I didn’t know it converts discrete values to continuous values.
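Both views can be true at once, as far as I understand it: indexing into the weight array gives the same result as multiplying a one-hot vector by that array, and the row you get back is a vector of continuous values. A small sketch, where the 2x3 `weights` table and the helper names are made up for illustration:

```python
def one_hot(index, size):
    """Vector of zeros with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def vec_mat(vec, mat):
    """Row vector times matrix, written out with plain loops."""
    n_cols = len(mat[0])
    return [sum(vec[i] * mat[i][j] for i in range(len(mat)))
            for j in range(n_cols)]

# Hypothetical embedding table: 2 classes, 3 floats each.
weights = [[0.5, -1.25, 2.0],
           [-0.75, 0.1, 3.5]]

results = []
for idx in range(2):
    by_index = weights[idx]                         # plain array lookup
    by_matmul = vec_mat(one_hot(idx, 2), weights)   # one-hot times matrix
    results.append((by_index, by_matmul))
    print(idx, by_index == by_matmul)
```

So the lookup is a shortcut for the one-hot multiplication, and the "conversion to continuous values" is simply the fact that the rows being looked up hold floats.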