I am trying to implement the “Show, Attend and Tell” paper. I am thinking about using the pre-trained AWD_LSTM language model for caption generation.
First, I want to know whether my approach to using the pre-trained model is correct. Here is the code:
```python
from fastai.text import *  # fastai v1
import torch

## language data bunch
data_lm = (TextList.from_df(df=metadata, path='.', cols='labels')
           .split_by_rand_pct(0.1)
           .label_for_lm()
           .databunch(bs=100))

# create learner object
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3, pretrained=True)

# fine-tune model
learn.fit_one_cycle(8, 1e-2, moms=(0.8, 0.7))

# save model parameters
learn.save('fine_tuned_LM')
learn.save_encoder('fine_tuned_LM_Encoder')

## load model parameters
pretrained_lstm = AWD_LSTM(vocab_size, emb_sz=812, n_hid=512, n_layers=1)
wgts = torch.load('models/fine_tuned_LM.pth')
# copy the saved tensors into the new module in place;
# note: plain assignment into state_dict() would not modify the model
with torch.no_grad():
    for (name, src), (_, dst) in zip(wgts.items(), pretrained_lstm.state_dict().items()):
        dst.copy_(src)
```
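While testing the loading loop, I noticed that assigning into the dict returned by `state_dict()` does not seem to change the module itself, which is why I switched to an in-place copy. A minimal repro in plain PyTorch (no fastai needed), with a small `nn.LSTM` standing in for the real model:

```python
import torch
import torch.nn as nn

# Plain-PyTorch repro: assigning into the dict returned by
# state_dict() does not modify the module, while an in-place
# copy_() on the tensors it returns does.
lstm = nn.LSTM(input_size=4, hidden_size=4)
new_w = torch.ones_like(lstm.weight_ih_l0)

# 1) dict assignment: the module weight is left untouched
sd = lstm.state_dict()
sd['weight_ih_l0'] = new_w
changed_by_assignment = torch.equal(lstm.weight_ih_l0, new_w)
print(changed_by_assignment)  # False

# 2) in-place copy through the state_dict tensors: this sticks
with torch.no_grad():
    lstm.state_dict()['weight_ih_l0'].copy_(new_w)
changed_by_copy = torch.equal(lstm.weight_ih_l0, new_w)
print(changed_by_copy)  # True
```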
I have a few questions on this:
I have to pass a modified hidden state (the output of the attention model) into the decoder at each step. How do I do that?
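My current understanding of the paper is that the attention context vector is concatenated with the previous word embedding at each decoding step, rather than overwriting the hidden state. Here is a pure-PyTorch sketch of what I mean (all sizes and names are illustrative, not fastai's API):

```python
import torch
import torch.nn as nn

# Illustrative sizes, not AWD_LSTM's defaults
emb_sz, ctx_sz, hid_sz, vocab = 32, 16, 64, 100

embed = nn.Embedding(vocab, emb_sz)
# one decoder step takes [word embedding ; attention context] as input
cell = nn.LSTMCell(emb_sz + ctx_sz, hid_sz)
out_proj = nn.Linear(hid_sz, vocab)

def decode_step(prev_word, context, h, c):
    """Concatenate the embedding with the attention context,
    advance the LSTM state, and project to vocabulary logits."""
    x = torch.cat([embed(prev_word), context], dim=1)
    h, c = cell(x, (h, c))
    return out_proj(h), h, c

# one step on a batch of 2 captions
h = torch.zeros(2, hid_sz)
c = torch.zeros(2, hid_sz)
prev_word = torch.tensor([1, 2])
context = torch.randn(2, ctx_sz)  # would come from the attention model
logits, h, c = decode_step(prev_word, context, h, c)
print(logits.shape)  # torch.Size([2, 100])
```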
Since I am using a pre-trained language model, it is supposed to store a vocabulary (word-to-index mapping). How is this vocabulary information stored?
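For concreteness, the kind of two-way mapping I have in mind looks like this (plain-Python sketch mimicking the `itos`/`stoi` naming fastai uses, not the actual fastai class):

```python
from collections import defaultdict

# itos: index -> token; stoi: token -> index, with unknown
# tokens falling back to index 0 ('xxunk'), as fastai does
itos = ['xxunk', 'xxpad', 'a', 'cat', 'sat']
stoi = defaultdict(int, {t: i for i, t in enumerate(itos)})

print(stoi['cat'])        # 3
print(stoi['dog'])        # 0  (unknown -> 'xxunk')
print(itos[stoi['sat']])  # 'sat'
```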
Is it enough to load just ‘fine_tuned_LM_Encoder’, or should I load the entire ‘fine_tuned_LM’? In the IMDB sentiment analysis tutorial the encoder was used for classification, but in my case the goal is text generation, so I believe the entire model should be loaded.
Since I used AWD_LSTM loaded with WikiText-103 weights, I assumed I had to keep the same architecture, but when I instantiated it with different hyperparameters (emb_sz=812, n_hid=512) and loaded my fine-tuned weights, it did not throw any error. I am confused about what is going on inside.
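For comparison, PyTorch's own `load_state_dict` does validate shapes and raises on a mismatch, which is part of why the silence of my manual loading loop confuses me. A small demo with stand-in `nn.Linear` layers (sizes are illustrative):

```python
import torch.nn as nn

big = nn.Linear(812, 10)    # stands in for the fine-tuned sizes
small = nn.Linear(512, 10)  # stands in for the mismatched re-init

# load_state_dict checks tensor shapes and raises on mismatch
try:
    small.load_state_dict(big.state_dict())
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True
print(mismatch_raised)  # True
```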