Multi-one-hot vs. dense embedding


(Sparkle Russell-Puleri) #1

Hi All,
I’m reproducing some work from an older paper (2016) that was implemented in Theano. The author took the time to create a dense representation of the vocabulary and then, during the padding step, multi-one-hot encoded the sequences, which were then fed to an embedding layer. Is this necessary if the vocabulary was already encoded? Couldn’t we just multiply the encoded sequences by our embedding matrix?

E.g.
Step 1: Encoded_seq = [[1,2,9,3], [3,4,5]]
Step 2: Multi-one-hot (with padding) = [[0,1,1,1,0,0,0,0,0,1], [0,0,0,1,1,1,0,0,0,0]]
Step 3: The embedding layer output is then fed into a GRU layer.
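
To make Steps 1 and 2 concrete, here is a minimal NumPy sketch of how I understand the encoding. This is my own illustration, not the paper's Theano code; the helper name `multi_hot` and the vocabulary size of 10 (indices 0 to 9) are assumptions.

```python
import numpy as np

def multi_hot(seq, vocab_size):
    """Return a 0/1 vector of length vocab_size with ones at the indices in seq."""
    vec = np.zeros(vocab_size, dtype=np.int64)
    vec[seq] = 1  # set every index that occurs in the sequence
    return vec

# Step 1: integer-encoded sequences (variable length)
encoded_seq = [[1, 2, 9, 3], [3, 4, 5]]

# Step 2: one fixed-length multi-hot vector per sequence
vocab_size = 10  # assumed: indices 0..9
multi_hot_seqs = np.stack([multi_hot(s, vocab_size) for s in encoded_seq])
print(multi_hot_seqs)
```

Note that every vector has the same length regardless of how long the original sequence was, so no separate padding is needed, but the order of the codes within a sequence is lost.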

The vectors in the second step are padded with zeros, and their length corresponds to the size of the vocabulary (9 being the largest index here). Is there any reason the sequences in Step 1 could not be used directly as input to the embedding layer?
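
To show what I mean by the two options, here is a small NumPy sketch comparing them. The embedding matrix `W`, its random values, and the sizes are hypothetical and only for illustration; this is not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, emb_dim = 10, 4                  # assumed sizes
W = rng.normal(size=(vocab_size, emb_dim))   # hypothetical embedding matrix

seq = [1, 2, 9, 3]

# Option A: multi-hot vector times the embedding matrix
multi_hot = np.zeros(vocab_size)
multi_hot[seq] = 1.0
via_matmul = multi_hot @ W                   # shape (emb_dim,)

# Option B: use the integer codes directly as row indices
via_lookup = W[seq]                          # shape (len(seq), emb_dim)

# Option A equals the sum of the rows selected in Option B
summed_lookup = via_lookup.sum(axis=0)
print(np.allclose(via_matmul, summed_lookup))
```

As far as I can tell, the multi-hot product yields a single vector per sequence (the sum of the selected embedding rows), while the direct lookup keeps one vector per code, in order. The two match after summing only when no code is repeated within a sequence, since the multi-hot vector records each code at most once.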

Thanks in advance!