Hello!
I’m trying to build a translator from Swedish to English that uses AWD-LSTM as both encoder and decoder, but I’m running into some issues. Some concern the fastai implementation of AWD-LSTM; others are more seq2seq-related. Anyway, here’s my current attempt:
p = dict(enc_vocab_len=len(vocab_sv.itos), dec_vocab_len=len(vocab_en.itos),
         emb_sz=300, am_hidn=1152, max_len=33, am_layers=3)  # params

class Seq2Seq_AWD_LSTM_v0(nn.Module):
    def __init__(self, p):
        super().__init__()
        # AWD-LSTM has a built-in embedding, so I train that instead of using a premade one.
        # Unsure if this works, but I think so.
        self.encoder = AWD_LSTM(p["enc_vocab_len"], p["emb_sz"], p["am_hidn"], p["am_layers"])
        self.decoder = AWD_LSTM(p["dec_vocab_len"], p["emb_sz"], p["am_hidn"], p["am_layers"])
        self.out = nn.Linear(p["emb_sz"], p["dec_vocab_len"])
        self.pad_idx = 1
        self.max_len = p["max_len"]

    def forward(self, inp):
        self.encoder.reset()  # reset states
        self.decoder.reset()  # reset states
        # Returns (raw_outputs, outputs): raw_outputs without dropout,
        # outputs with dropout applied (except on the last layer).
        enc_states_nodp, enc_states_dp = self.encoder(inp)
        # last_hidden has shape <list>[am_layers][2]<torch-tensor>[1, 16, emb_sz].
        # I don't know what the 2, 1, and 16 represent; my bs is 33.
        last_hidden = self.encoder.hidden
        dec_inp = something  # [cell_state, hidden_state]? How do I seed the decoder?
        output_sentence = []
        for i in range(self.max_len):
            # How do I feed this into the decoder AWD-LSTM?
            dec_states_nodp, dec_states_dp = self.decoder(dec_inp)
            # Predict a word from the last cell state (?) of the decoder
            pred_word = self.out(dec_states_dp[-1])
            dec_inp = again_something  # [cell_state, hidden_state]?
            # Not sure how to get the softmaxed prediction per batch element here;
            # argmax over the vocab dimension is my guess.
            # Append the predicted word (index) to the translated sentence.
            output_sentence.append(pred_word.argmax(dim=-1))
            # If every sequence in the batch produced padding, stop early.
            if (dec_inp == self.pad_idx).all(): break
        return torch.stack(output_sentence, dim=1)  # return the translated sentence(s)
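To make clearer what I'm aiming for, here's the same encode/greedy-decode pattern sketched with plain nn.Embedding/nn.LSTM, which I do understand — all names and sizes here are toy values I made up for illustration, not part of my real model:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy sizes for illustration only
vocab_sz, emb_sz, hid_sz, bs, src_len, max_len = 10, 8, 16, 4, 5, 6

emb = nn.Embedding(vocab_sz, emb_sz)
encoder = nn.LSTM(emb_sz, hid_sz, batch_first=True)
decoder = nn.LSTM(emb_sz, hid_sz, batch_first=True)
out = nn.Linear(hid_sz, vocab_sz)
bos_idx = 2  # hypothetical beginning-of-sentence token index

src = torch.randint(0, vocab_sz, (bs, src_len))
# Encode: the final (hidden, cell) pair summarises the source sentence
_, (h, c) = encoder(emb(src))

# Greedy decode: feed each prediction back in as the next input token,
# carrying the (h, c) state along between steps
dec_inp = torch.full((bs, 1), bos_idx, dtype=torch.long)
preds = []
for _ in range(max_len):
    dec_out, (h, c) = decoder(emb(dec_inp), (h, c))
    logits = out(dec_out[:, -1])               # project hidden state to vocab
    dec_inp = logits.argmax(dim=-1, keepdim=True)
    preds.append(dec_inp.squeeze(1))
preds = torch.stack(preds, dim=1)              # (bs, max_len) token indices
```

With plain nn.LSTM the state is passed explicitly as the (h, c) tuple, which is exactly the part I can't see how to do with fastai's AWD_LSTM, since it manages its hidden state internally.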
fastai-related questions:
- What do raw_outputs and outputs mean? My current understanding is that raw_outputs contains the cell states for every layer without dropout, and outputs contains them with dropout applied (except on the last layer). Is this correct?
- How do I get the hidden states? Also, can someone explain what encoder.hidden contains?
- How can I feed a cell state and hidden state into the decoder AWD-LSTM? Its default input is word indices, I think, not states.
- Can I use the AWD_LSTM embedding layers instead of an external one?
General seq2seq questions:
- Do I predict a translated word from the LSTM hidden state or from the cell/context state? That is: which tensor do I actually pass into the linear layer to produce a prediction?
Any answer to any of my questions would be much appreciated, thanks!