Understanding word embeddings

In the video, it is mentioned that word embeddings are analogous to the pretrained image weights we reuse instead of training from scratch.
That makes sense for images. But suppose we use text embeddings for a problem where we train the deep network ourselves, i.e., the vector representations did not come out of a pretrained network.
How does it then make sense to treat these embeddings as "weights" (if that is even the right word) in a deep neural network?
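For context on my question: as I understand it, an embedding layer is just a trainable lookup matrix, whether it is initialized randomly or from pretrained vectors. A minimal NumPy sketch of that idea (toy sizes and random values, purely illustrative):

```python
import numpy as np

# Toy vocabulary size and embedding dimension (made-up values).
vocab_size, embed_dim = 5, 3

# An embedding layer is a weight matrix with one row per word.
# Trained from scratch, it starts as random numbers ...
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(vocab_size, embed_dim))

# ... or it could instead be initialized from pretrained vectors
# (e.g. word2vec / GloVe) and then fine-tuned by backprop,
# exactly like any other layer's weights.

# "Looking up" a word's embedding is just indexing a row.
word_ids = np.array([0, 3, 3])
vectors = embeddings[word_ids]
print(vectors.shape)  # (3, 3): one embed_dim vector per word id
```

So my confusion is whether calling these rows "weights" is still meaningful when they are learned from scratch inside the same network rather than copied from a pretrained one.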