I think the inputs don’t change, but their embeddings do.
As an example, say you have a categorical variable Species: [Dog, Cat, Wolf, Pig], and two continuous variables, weight and height.
You have a model that takes in 3 inputs: Species, weight, and height.
First we randomly initialize their embeddings:
Dog: [0.032, 0.02, -0.053]
Cat: [0.07, 0.02, -0.093]
Wolf: [0.029, 0.02, -0.053]
Pig: [0.01, 0.02, -0.037]
When you have a pig, your inputs to the model will be
[0.01, 0.02, -0.037], 100 kg, 60 cm
When you have a cat, your inputs to the model will be
[0.07, 0.02, -0.093], 3 kg, 10 cm
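This lookup-and-concatenate step can be sketched in a few lines of NumPy. The embedding vectors and the weight/height values are the illustrative numbers from above; a real model would store the table as a trainable parameter matrix rather than a dict.

```python
import numpy as np

# one randomly initialized 3-dim embedding per category
embeddings = {
    "Dog":  np.array([0.032, 0.02, -0.053]),
    "Cat":  np.array([0.07,  0.02, -0.093]),
    "Wolf": np.array([0.029, 0.02, -0.053]),
    "Pig":  np.array([0.01,  0.02, -0.037]),
}

def model_input(species, weight_kg, height_cm):
    # look up the species embedding and append the continuous features
    return np.concatenate([embeddings[species], [weight_kg, height_cm]])

# a pig becomes a 5-number vector: 3 embedding dims + weight + height
print(model_input("Pig", 100, 60))
```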
In the beginning, the embeddings will be bad; with training, they will start to represent certain characteristics of these animals.
After some training, their embeddings might become:
Dog: [0.9, 0.72, 0.83]
Cat: [0.07, 0.85, 0.93]
Wolf: [0.9, -0.5, 0.77]
Pig: [0.1, 0.66, -0.4]
If I were to guess, I think the embeddings represent [genetic proximity, aggression, noise-making]
so that same pig will now be
[0.1, 0.66, -0.4], 100 kg, 60 cm
and the cat will now be
[0.07, 0.85, 0.93], 3 kg, 10 cm
The cat remains a cat, and the pig remains a pig. But their embeddings have changed to better reflect certain characteristics.
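The mechanics of "the inputs stay fixed but the embeddings move" can be sketched with a toy training loop. Here the embedding table is treated as a trainable parameter alongside the classifier weights; the task (predicting whether each animal is a canine) and all the numbers are made up for illustration, not taken from any real model.

```python
import numpy as np

rng = np.random.default_rng(0)
species = ["Dog", "Cat", "Wolf", "Pig"]
emb = rng.normal(scale=0.05, size=(4, 3))   # trainable embedding table, one row per species
w = rng.normal(scale=0.05, size=3)          # trainable classifier weights
y = np.array([1.0, 0.0, 1.0, 0.0])          # toy target: is it a canine?

initial = emb.copy()
lr = 0.5
for _ in range(200):
    p = 1 / (1 + np.exp(-(emb @ w)))        # sigmoid of the logits
    err = p - y                             # dLoss/dlogits for binary cross-entropy
    grad_w = emb.T @ err / 4
    grad_emb = np.outer(err, w) / 4
    w -= lr * grad_w
    emb -= lr * grad_emb                    # the embedding rows get gradient updates too

# the categories (Dog, Cat, ...) never changed, but their vectors did
print(np.allclose(emb, initial))  # False
```

The key point is the last gradient step: because the embedding rows receive gradients just like any other weight, they drift toward whatever representation makes the downstream task easier.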