Language Model Zoo 🦍


(Vladislav) #310

Hi everyone!
I have started training an AWD-LSTM model using v1 of fastai. While I was completely fascinated by the ease of use (it took, like, 5 lines of code to get started) and the flexibility of the framework, I have been running into technical problems. I mostly use default parameters, only tweaking Adam's betas and the learning rate; my corpus is 110 million tokens, split 90/10 into train/validation. The first epoch mostly goes fine, though GPU memory utilization is around 99% from the start, but when I start another epoch I get a CUDA OOM error. This prevents me from using cyclical learning rates. Sometimes I get the OOM at the end of the first epoch. Cutting down on bptt leads to slower convergence (and probably a worse outcome).
Did anyone have this problem and find a solution? My setup is a deep learning image on GCP with a K80 (12 GB).
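For a rough sense of why a 12 GB card fills up, here is a back-of-envelope sketch of the memory held by one batch of output logits alone (bs=64 and bptt=70 are the usual fastai v1 defaults; the 60k vocabulary size is an illustrative assumption, not a number from the post):

```python
# Back-of-envelope activation memory for the LM output layer.
def logits_bytes(bs, bptt, vocab_size, bytes_per_float=4):
    """Memory held by one batch of fp32 output logits."""
    return bs * bptt * vocab_size * bytes_per_float

full = logits_bytes(bs=64, bptt=70, vocab_size=60_000)
print(f"{full / 2**30:.2f} GiB")  # prints 1.00 GiB

# Halving bptt halves this term, which is one reason cutting bptt
# relieves OOM (at the cost of slower convergence, as noted above).
half = logits_bytes(bs=64, bptt=35, vocab_size=60_000)
assert half * 2 == full
```

And that is just the final logits; gradients, optimizer state, and the LSTM activations for backprop all add on top, which is why utilization sits near 99% from the start.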


(Ertan Dogrultan) #311

I was just looking into doing this for Turkish. Glad to have found this thread.


(Thomas Chambon) #312

You can lower the batch size. Training will take longer, but with the right size you should not have memory errors.


(Piotr Czapla) #313

OOM at the end of the first epoch may be caused by too large a batch size. Validation does not use sampled softmax, so it has larger memory requirements; in the original ULMFiT scripts the batch size was cut five times for validation because of that:

    trn_dl = LanguageModelLoader(trn_lm, bs, bptt, batch_sets=batch_sets)
    val_dl = LanguageModelLoader(val_lm, bs//5 if sampled else bs, bptt, batch_sets=1)

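To put rough numbers on that cut: sampled softmax scores only a subset of the vocabulary per step, while full-softmax validation scores all of it. The vocabulary and sample sizes below are illustrative assumptions, not values from the scripts:

```python
# Rough cost comparison: scores computed per batch under sampled
# softmax (training) vs. full softmax (validation).
vocab_size = 60_000     # assumed vocabulary size
n_sampled = 8_192       # hypothetical negative-sample count
bs, bptt = 64, 70

train_scores = bs * bptt * n_sampled    # sampled softmax
val_scores = bs * bptt * vocab_size     # full softmax
ratio = val_scores / train_scores
print(f"{ratio:.1f}x")  # prints 7.3x
```

With numbers in this ballpark, the full softmax does several times more work (and holds several times more activations) per batch, which is what the `bs//5` in the loader compensates for.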


(Piotr Czapla) #314

@anz9990
Awesome work! Do you know what the SOTA for Japanese IMDB is?

If you are above or near SOTA, can you share your weights and point me to the publication that shows the current SOTA on IMDB?

It would be awesome if you could update the wiki thread above with your findings by starting a new thread like ULMFiT - Japanese and putting your SOTA and your results in a table. Here is an example


(Asir Saeed) #315

Thanks! All I really did was use the lesson notebook and make a few changes for my dataset :wink:
There isn’t a Japanese IMDB dataset; the one I used was from Yahoo Movie Reviews, which someone kindly put on their repository.

This is the latest publication I’ve found on Japanese sentiment classification. I guess since I’m around 90-91% I’m pretty close to SOTA, but it’s really not comparable because they are using a different dataset.

The datasets they use for benchmarks are available for download only after an application review by the research organization that curates them, and they only accept applications from researchers at a university or research institution. Since I’m not one, I can’t get access.

I can share the weights for this model and the ja-wiki language model that I trained for transfer learning if that’s useful.


(Piotr Czapla) #316

@cstorm125 can you create a thread for Thai results and post it in the wiki thread above, similar to the other languages that are either done or work in progress? That way people can quickly check where we are and join in on the languages where work is still ongoing.


(Charin) #317

Done


(Piotr Czapla) #318

I had a look; the request form is quite long! We have the same problem in Polish: there is one org. that has the largest dataset of Polish sentences, but they don’t give access to people because of legal issues. Fortunately, they offered to run our models on their data and maybe publish the weights, which is good enough for me.

Maybe you can try to drop them an email? I would do it for you, but given that the whole website is in Japanese, I’m afraid they won’t appreciate English :).

But either way, create a ULMFiT - Japanese thread and post your work there so we can improve upon it.


(Ari) #319

Maybe try reducing your batch size? I think the default is 64; maybe try 48 or 32.
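The "step down until it fits" advice can be sketched as a simple back-off loop. Everything here is hypothetical scaffolding: `try_one_batch` stands in for running a single training step, and in real PyTorch code you would catch the CUDA out-of-memory error rather than `MemoryError`:

```python
# Hedged sketch of batch-size back-off: try candidate sizes in
# decreasing order until one training step succeeds.
def pick_batch_size(try_one_batch, candidates=(64, 48, 32, 24, 16)):
    for bs in candidates:
        try:
            try_one_batch(bs)  # run one step at this batch size
            return bs
        except MemoryError:    # stand-in for a CUDA OOM error
            continue
    raise RuntimeError("no candidate batch size fits")

# Toy usage: pretend anything above 32 runs out of memory.
def fake_step(bs):
    if bs > 32:
        raise MemoryError

print(pick_batch_size(fake_step))  # prints 32
```

One caveat from earlier in the thread: a size that survives one training step can still OOM at validation (full softmax), so it pays to leave headroom.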


(Hafidz Zulkifli) #320

Had already asked this in the Malay-focused thread, but re-posting it here for a wider audience.

If there is no local research being done on a fixed dataset/corpus (i.e. an IMDB equivalent), how does one actually establish that our results are “state of the art”?


(Piotr Czapla) #321

I have a similar issue with sentiment analysis for Polish, and I’m going to compare the model against itself: without pre-training on Wikipedia and with pre-training, and against the cloud services that are available for Polish. I think this should be good enough. I’m also working on getting a proper sentiment analysis dataset for Polish with the guys who organised PolEval.
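That self-comparison amounts to a small ablation: the same classifier trained from scratch versus fine-tuned from a Wikipedia-pretrained LM, plus any third-party baselines. A minimal sketch of how the results might be tabulated (all accuracy numbers below are placeholders, not real results):

```python
# Placeholder ablation table: same task, same data, different starts.
results = {
    "from scratch (no pretraining)": 0.85,
    "fine-tuned from wiki LM": 0.91,
    "cloud service baseline": 0.88,
}
gain = results["fine-tuned from wiki LM"] - results["from scratch (no pretraining)"]
print(f"pretraining gain: {gain:.1%}")  # prints pretraining gain: 6.0%
```

Even without an established SOTA, a gain of this kind over the no-pretraining and commercial baselines is a defensible claim for the method.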


(Piotr Czapla) #322

@lesscomfortable can you create a Spanish thread and write a summary of your implementation of ULMFiT?


(Piotr Czapla) #323

@sgugger I know you must be super busy, but could you create a French thread and describe what you have achieved there so far? (Btw, make the first message a wiki.)


(Piotr Czapla) #324

@nandobr @monilouise @saulberardo, guys, have you managed to do something with Portuguese? If so, can you create a thread and describe the dataset you were testing your models on and where you are with the tests?


(Piotr Czapla) #325

@shoof, @Moody where are you with Chinese? Would you mind starting a thread so that we can join the development?


(Piotr Czapla) #326

@mollerhoj have you tried running ULMFiT against any classification task?


(Xu Fei) #327

Absolutely! Delighted to have you join the development on Chinese! I’ve read your paper and learned quite a lot of things regarding sentencepiece :slight_smile:


(Fernando Melo) #328

Hi @sgugger, I want to train a fastai v1 LM in Portuguese. Can you please tell me where I can set language = “pt”? Should the spaCy version be 2.0.16 or 2.0.17? I saw you spent around 1 hour/epoch; which VM were you using?


(Piotr Czapla) #329

Cool! Do you want to create a Portuguese thread?