Language Model Zoo 🦍

piotr.czapla · November 4, 2018, 4:40pm

@sgugger I know you must be super busy, but could you create a French thread and describe what you have achieved so far? (BTW, make the first message a wiki.)

piotr.czapla · November 4, 2018, 4:42pm

@nandobr @monilouise @saulberardo, have you managed to do anything with Portuguese? If so, can you create a thread and describe which dataset you were testing your models on and where you are with the tests?

piotr.czapla · November 4, 2018, 4:51pm

@shoof, @Moody, where are you with Chinese? Would you mind starting a thread so that we can join the development?

piotr.czapla · November 4, 2018, 4:56pm

@mollerhoj have you tried running ULMFiT on any classification task?

shoof · November 4, 2018, 7:44pm

Absolutely! Delighted to have you join the development on Chinese! I’ve read your paper and learned quite a lot about sentencepiece.

NandoBr · November 5, 2018, 1:59am

Hi @sgugger, I want to train a fastai v1 LM in Portuguese. Can you please tell me where I can set language = “pt”? Is the spaCy version 2.0.16 or 2.0.17? I saw you spent around 1 hour/epoch. Which VM were you using?

piotr.czapla · November 5, 2018, 8:05am

Cool! Do you want to create a Portuguese thread?

NandoBr · November 5, 2018, 11:21am

Done.

piotr.czapla · November 6, 2018, 9:47am

I’m sure many of you have heard that multilingual BERT is out, which is a competing solution to ULMFiT (https://twitter.com/seb_ruder/status/1059439373396123649). Let’s see how these two compare. I guess BERT, being new and large, will be better. But given how big and slow it is, ULMFiT may still be a better first choice for practical tasks. We need to compare the two to be able to make an informed decision. I think we can do the work in the language-specific threads and report the findings back here for anyone who is interested but can’t help.

tomsthom · November 6, 2018, 10:08am

Yes, great idea. I will try the BERT model in French and compare it with my ULMFiT results, both in performance and in training / inference time.

piotr.czapla · November 6, 2018, 10:20am

Awesome! I’ve added your thread (ULMFiT - French) to the wiki above for anyone that would like to join.
BTW, according to the BERT README, Google Colab has free TPU access, so we can use that to fine-tune the classifier.

piotr.czapla · November 6, 2018, 10:24am

Thank you for adding the thread to the wiki above :). Anyone who wants to join the work on Portuguese and play with BERT as well, feel free to join us here: ULMFit - Portuguese

piotr.czapla · November 6, 2018, 12:38pm

Can you create the language thread then, so that people can join in and participate? Please share what you found regarding the datasets and whether you managed to train the model.

ertan · November 9, 2018, 11:58pm

Here is the thread for Turkish: ULMFiT - Turkish

ertan · November 10, 2018, 12:14am

Has there been any implementation/experimentation with Transformer architectures in fastai?

harikrishnanrajeev · November 17, 2018, 8:00am

Hi @jamsheer, is there a thread for ULMFit - Malayalam?

piotr.czapla · November 19, 2018, 5:10am

I don’t know of any experiments yet, but I know there are a few people interested in giving it a try.

tomashm · November 22, 2018, 10:13am

Has there been any progress on Norwegian?

sarnthil · November 22, 2018, 5:04pm

Hi everyone. I’d like to get ULMFit for Romanian. Anyone else working on it?

Virgil · November 23, 2018, 10:10am

Hi @sarnthil. We’re just starting work on it at the Timisoara study group. I have a lot of GCP credits that will expire soon, so I plan to start training the language model on a Wikipedia dump this weekend.

The plan is to find Romanian classification datasets to fine-tune and apply the ULMFit LM to. Finding good datasets for it might be the hardest task :). Have you found any already?