I’m trying to build a translator from Swedish to English that uses AWD-LSTM as both encoder and decoder, but I’m having some issues. Some are about the fastai implementation of AWD-LSTM, some are more general seq2seq questions. Anyway, here’s my current attempt:
```python
p = dict(enc_vocab_len=len(vocab_sv.itos), dec_vocab_len=len(vocab_en.itos),
         emb_sz=300, am_hidn=1152, max_len=33, am_layers=3)  # params

class Seq2Seq_AWD_LSTM_v0(nn.Module):
    def __init__(self, p):
        super().__init__()
        # awd-lstm has a built-in embedding, so I train that instead of using a premade one.
        # Unsure if that works, I think so though
        self.encoder = AWD_LSTM(p["enc_vocab_len"], p["emb_sz"], p["am_hidn"], p["am_layers"])
        self.decoder = AWD_LSTM(p["dec_vocab_len"], p["emb_sz"], p["am_hidn"], p["am_layers"])
        self.out = nn.Linear(p["emb_sz"], p["dec_vocab_len"])
        self.pad_idx = 1
        self.max_len = p["max_len"]

    def forward(self, inp):
        self.encoder.reset()  # reset states
        self.decoder.reset()  # reset states
        # returns (raw_outputs, outputs). raw_outputs = without dropout,
        # outputs = with dropout (except the last layer)
        enc_states_nodp, enc_states_dp = self.encoder(inp)
        # last_hidden is a list of length am_layers, each entry a tensor of shape [1, 16, emb_sz]
        # what the 2, 1, 16 represent I don't know. The bs = 33
        last_hidden = self.encoder.hidden
        dec_inp = something  # [cell_state, hidden_state] ?
        output_sentence = []
        for i in range(self.max_len):
            # How do I feed it into the decoder awd-lstm?
            dec_states_nodp, dec_states_dp = self.decoder(dec_inp)
            # predict a word from the last cell state (?) of the decoder
            pred_word = self.out(dec_states_dp[-1])
            dec_inp = again_something  # [cell_state, hidden_state] ?
            # not sure how to get the softmaxed prediction for each batch element here; max is a guess
            # append the predicted word (index) to the translated sentence
            output_sentence.append(pred_word.max())
            # if all batch items produce padding, break
            if (dec_inp == self.pad_idx).all():
                break
        return torch.stack(output_sentence, dim=1)  # return sentence(s)
```
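For reference, here is a plain-PyTorch sketch (no fastai) of the greedy decode loop I’m trying to reproduce with AWD_LSTM. The class name, the `<bos>` index and the sizes are made up, but this is the behaviour I’m after:

```python
import torch
import torch.nn as nn

class TinySeq2Seq(nn.Module):
    """Hypothetical reference model, only to illustrate the decode loop."""
    def __init__(self, enc_vocab, dec_vocab, emb_sz=300, n_hid=1152,
                 n_layers=3, pad_idx=1, bos_idx=2, max_len=33):
        super().__init__()
        self.enc_emb = nn.Embedding(enc_vocab, emb_sz, padding_idx=pad_idx)
        self.dec_emb = nn.Embedding(dec_vocab, emb_sz, padding_idx=pad_idx)
        self.encoder = nn.LSTM(emb_sz, n_hid, n_layers, batch_first=True)
        self.decoder = nn.LSTM(emb_sz, n_hid, n_layers, batch_first=True)
        self.out = nn.Linear(n_hid, dec_vocab)
        self.bos_idx, self.max_len = bos_idx, max_len

    def forward(self, inp):                               # inp: (bs, src_len) token ids
        _, (h, c) = self.encoder(self.enc_emb(inp))       # h, c: (n_layers, bs, n_hid)
        dec_inp = inp.new_full((inp.size(0), 1), self.bos_idx)  # start from <bos> ids
        preds = []
        for _ in range(self.max_len):
            # feed token ids in, carry (h, c) over between timesteps
            dec_out, (h, c) = self.decoder(self.dec_emb(dec_inp), (h, c))
            logits = self.out(dec_out[:, -1])             # predict from the hidden/output state
            dec_inp = logits.argmax(dim=-1, keepdim=True) # greedy: feed predicted ids back in
            preds.append(dec_inp)
        return torch.cat(preds, dim=1)                    # (bs, max_len) predicted token ids
```

The part I can’t map onto AWD_LSTM is the explicit (h, c) hand-off between encoder and decoder, which is what my questions below are about.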
fastai-related questions:
- What do raw_outputs and outputs mean? My current understanding is that raw_outputs is the cell states for every layer without dropout, and outputs is the cell states with dropout applied (except the last layer). Is this correct?
- How do I get the hidden states? Also, can someone explain encoder.hidden? (I've put the shape-inspection snippet I've been using after this list.)
- How can I feed cell_state and hid_state into the decoder AWD-LSTM? (The default input is word representations, I think, not cell states.)
- Can I use the awd_lstm embedding layers instead of an external one?
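In case it helps someone spot my misunderstanding, this is how I’ve been poking at the shapes. I’m assuming the encoder takes a (bs, sl) batch of token ids, returns the two lists, and that encoder.hidden is a per-layer list; `model` is an instance of my class above, and the fake batch sizes are arbitrary:

```python
import torch

x = torch.randint(0, p["enc_vocab_len"], (16, 33))   # fake batch: bs=16, seq_len=33
raw_outputs, outputs = model.encoder(x)
for i, (r, o) in enumerate(zip(raw_outputs, outputs)):
    print(f"layer {i}: raw {tuple(r.shape)}  dropped-out {tuple(o.shape)}")
for i, h in enumerate(model.encoder.hidden):
    # each element seems to be a (hidden, cell) pair? printing whatever is there to check
    print(f"hidden[{i}]:", [tuple(t.shape) for t in h])
```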
General seq2seq questions:
- Do I predict a translated word using the LSTM hidden state or the cell/context state? Meaning: which tensor do I actually pass into the linear layer to produce a prediction? (See the small sketch after this list for my current mental model.)
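To make the question concrete, here’s how I picture it with a plain nn.LSTMCell (the sizes and vocab are made up); please correct me if this is wrong or if the AWD_LSTM case differs:

```python
import torch
import torch.nn as nn

# my current mental model: the *hidden* state h is what the linear layer sees;
# the cell state c only flows into the next timestep
cell = nn.LSTMCell(input_size=300, hidden_size=300)
to_vocab = nn.Linear(300, 10000)          # hypothetical target vocab size
x_t = torch.randn(16, 300)                # one timestep of embedded input, bs=16
h, c = torch.zeros(16, 300), torch.zeros(16, 300)
h, c = cell(x_t, (h, c))
logits = to_vocab(h)                      # predict from h (the hidden state), not c
```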
Any answer to any of my questions would be much appreciated, thanks!