Hi everyone, I’ve been trying to implement ULMFiT on a dataset of my own (in French). I based my work on the great imdb_scripts, which I modified to accommodate my needs. It is working quite well and is my state of the art on this dataset. I managed to understand most of the code, but there are still some questions in my head.
When you call `get_rnn_classifier`, you can choose the structure of the classifier head. The last two layers are the hidden layer and the output layer, and the first is the one that connects to the RNN. The size of this first layer is equal to the embedding size multiplied by 3. I’m not sure I understand this fully, and I found the ULMFiT paper quite confusing on the concat pooling part. At the time of writing, my best understanding is that this size is chosen so that the classifier has access to the last hidden state of the RNN, together with the max pool and mean pool taken over all the hidden states. Is this correct?
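To check my understanding, here is a minimal sketch of what I believe concat pooling does (this is my own toy reimplementation in PyTorch, not the library's actual code; the tensor layout `(seq_len, batch, hidden)` is an assumption):

```python
import torch

def concat_pooling(outputs):
    """Concatenate the last hidden state with max and mean pools over time.

    outputs: hidden states of the top RNN layer, shape (seq_len, batch, hidden).
    Returns a tensor of shape (batch, 3 * hidden).
    """
    last = outputs[-1]                  # hidden state at the final time step
    max_pool = outputs.max(dim=0)[0]    # element-wise max over the sequence
    mean_pool = outputs.mean(dim=0)     # element-wise mean over the sequence
    return torch.cat([last, max_pool, mean_pool], dim=1)

seq_len, bs, nh = 10, 4, 400
out = concat_pooling(torch.randn(seq_len, bs, nh))
print(out.shape)  # torch.Size([4, 1200]) -- i.e. 3 * embedding size
```

This would explain why the first linear layer of the head expects an input of 3 times the embedding size.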
The reason I’m asking is that I noticed some of my errors were coming from long pieces of text where the information useful for classification was at the beginning of the document, so I felt like the model “forgot” the beginning of the text, which concat pooling is supposed to prevent. Do you have similar experiences with long pieces of text? If so, do you have any solutions to optimize performance on long documents, or any advice on how to deal with them?
(I also trained the model backward, which performed better overall because most of the sensitive information is usually at the beginning of the text.)
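By "backward" I mean a language model trained on the tokens in reverse order, so that when the classifier reads the reversed document, the original beginning of the text is what it sees last. A toy sketch of the preprocessing step (my own illustration, not the scripts' actual code):

```python
def reverse_doc(tokens):
    """Reverse a tokenized document so a backward model reads it end-to-start."""
    return tokens[::-1]

doc = ["le", "chat", "est", "noir"]
print(reverse_doc(doc))  # ['noir', 'est', 'chat', 'le']
```

With this ordering, the final hidden state of the RNN corresponds to the start of the original document, which might explain the improvement I saw.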
Thank you to the creator for this great library, and thanks in advance for any insight into my questions.