Embedding Layer from scratch


I am a newbie to this field, and the fast.ai course helped me get an overall understanding and bootstrap into it faster.

I am in the process of writing a deep learning library from scratch to cover all the basics, mainly to understand the internals and to help my team with practical coding. Please find the GitHub link for the same.

My next exercise is to create an RNN from scratch for a language model. For that I want to create an Embedding layer from scratch. The forward propagation looks straightforward, but the backward propagation is where I got stuck. I was not able to find an approach for the backward propagation anywhere else. Please find my code below (I have copied and commented out the backpropagation from the Dense class I wrote; refer to the GitHub repo).

  1. How do I calculate dw?
  2. Also, do I need to iterate over the input matrix and update the weights by slicing indexes based on the input vector? In that case, wouldn't the weight rows for repeated indices get overwritten?

So what is the right way to do backpropagation for an embedding layer, i.e. to learn the embedding weights?


import numpy as np

from DeepLearnerBase import Layer, SGD, CrossEntropyForSoftMax, Sequential, Activation, softmax, Dense
import copy

class Embedding(Layer):

    def __init__(self, vocabsize, emb_size):
        self.emb_size = emb_size
        self.vocabsize = vocabsize

    def setup(self, optimizer=None, loss=None):
        # self.loss = loss
        self.outputsize = self.inputshape * self.emb_size
        self.w = np.random.uniform(-1, 1, (self.vocabsize, self.emb_size))
        self.w_opt = copy.copy(optimizer)

    def shape(self):
        return (self.inputshape, self.outputshape())

    def outputshape(self):
        return self.outputsize

    def forward(self, X, training=True):
        self.input = X
        # look up one embedding row per index and flatten per example
        self.emb_data = np.asarray([self.w[j].ravel() for j in X])
        return self.emb_data

    def backward(self, grad):
        W = self.w
        # copied from the Dense class and commented out -- this is where I am stuck:
        # self.dw = np.dot(self.input.T, grad)
        # self.db = np.sum(grad, axis=0, keepdims=True)
        # grad = grad.dot(W.T)
        # self.w = self.w_opt.update(self.w, self.dw)
        # if hasattr(self.loss, 'reg'):
        #     self.dw += self.loss.reg * self.w
        return grad
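For what it's worth, here is a minimal standalone sketch of how the embedding backward pass is usually done, assuming `X` is a `(batch, seq_len)` array of integer token indices and the upstream gradient arrives with the same flattened shape the forward pass returned. The key idea: `dw` is zero everywhere except at the rows that were looked up, and `np.add.at` scatter-adds the gradient slices, so repeated indices are accumulated rather than overwritten (which answers question 2). There is no gradient with respect to the input itself, since indices are not differentiable. The variable names here are illustrative, not from the original code.

```python
import numpy as np

vocabsize, emb_size = 10, 4
rng = np.random.default_rng(0)
w = rng.uniform(-1, 1, (vocabsize, emb_size))

X = np.array([[1, 3, 1],
              [2, 1, 0]])              # note index 1 repeats

# forward: look up one row per index, flatten per example
emb = w[X].reshape(X.shape[0], -1)     # shape (2, 3 * 4)

# backward: reshape the upstream gradient back to (batch, seq_len, emb_size)
grad = rng.normal(size=emb.shape)      # pretend upstream gradient
grad3 = grad.reshape(X.shape[0], X.shape[1], emb_size)

# scatter-add: dw[i] receives the sum of grad slices at every position
# where X == i; np.add.at is unbuffered, so duplicates accumulate
dw = np.zeros_like(w)
np.add.at(dw, X, grad3)

w -= 0.1 * dw                          # plain SGD update
```

Rows of `dw` for indices that never appear in `X` stay zero, and the row for a repeated index (1 above) is the sum of all the gradient slices at its positions.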