@aayushy The OpenAI Transformer is an awesome project to work on; count me in if you need a hand. It is second on my list, after I manage to make use of ULMFiT.
The only issue with Transformers is that they train for a month or something like that (I’ve heard that somewhere on Hacker News; I haven’t seen it in the paper).
Good to know that clr_beta worked well, and thank you for sharing the details of what worked. For Polish, the things that mattered most were the SentencePiece vocab size and the number of layers: 4 was better than 3, and 5 was worse.
@MicPie Cool. If you want some direction, let me know how comfortable you are with fastai, ULMFiT, Python, etc., so I can point you to the things you could best help with. Alternatively, pick some experiments yourself and bring back the results and trained models.
@MicPie @aayushy I’ve added you both to the repo. There is not much there yet, as I’m trying to adapt the scripts to use the BTW17 set; I’m fighting with SentencePiece at the moment, as it does not accept BOS/EOS tokens. Once I have a first LM trained, I will publish the changes so that we can start collaborating.
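For what it’s worth, SentencePiece reserves ids for BOS/EOS at training time, but the encoder only emits them if you ask for them explicitly via `--extra_options`. A minimal CLI sketch (the `corpus.txt` / `m` names are placeholders, not files from the repo):

```shell
# Train a model; by default unk_id=0, bos_id=1, eos_id=2 (pad disabled).
spm_train --input=corpus.txt --model_prefix=m --vocab_size=25000 \
          --bos_id=1 --eos_id=2

# At encoding time, explicitly wrap each sentence in <s> ... </s>.
spm_encode --model=m.model --extra_options=bos:eos < corpus.txt
```

If you encode from Python instead, the same `bos:eos` option can be set on the processor before encoding.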
How about we agree on a plan for how to progress? Here is a proposal; feel free to change it:
- a common validation & training set for normal text like Wikipedia
- a common validation & training set for comments, since, as @t-v noticed, the language of tweets/comments differs from Wikipedia
- I’m working on BTW17 - 170 MB of comments from Twitter (should we add sb10k?)
- a script to train a working sentiment-analysis model using SentencePiece on GermEval 2017
The above should give us a baseline; then we can plan a set of experiments to improve it and work on each experiment separately, sharing intermediate results in GitHub issues and the improved scores here.
The perplexity (which I compare to @t-v’s 32) was 38.
@aayushy For the perplexities to be comparable, we need to know the OOV count and the text you were evaluating on. (If you have a lot of unknown tokens, perplexity drops very quickly.)
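To see why OOV handling matters, here is a toy illustration (a unigram model evaluated on its own stream, with a made-up corpus): collapsing rare words into a single `<unk>` token concentrates probability mass on `<unk>`, so the measured perplexity falls even though the model got no better.

```python
import math
from collections import Counter

def unigram_perplexity(tokens):
    # MLE unigram model, evaluated on the same stream (toy illustration).
    counts = Counter(tokens)
    n = len(tokens)
    nll = -sum(math.log(counts[t] / n) for t in tokens) / n
    return math.exp(nll)

corpus = ["the", "cat", "sat", "on", "the", "mat", "a", "dog", "ran"]

# Keep only tokens seen at least twice; everything else becomes <unk>.
vocab = {t for t, c in Counter(corpus).items() if c >= 2}
collapsed = [t if t in vocab else "<unk>" for t in corpus]

ppl_full = unigram_perplexity(corpus)      # ≈ 7.7
ppl_unk = unigram_perplexity(collapsed)    # ≈ 1.7 - "better", but only
print(ppl_full, ppl_unk)                   # because 7 of 9 tokens are <unk>
```

So two perplexity numbers only mean the same thing if the vocabularies and unknown-token rates are comparable.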