@lesscomfortable I’m a little confused about how fitting works. Suppose I train the model for 3 epochs first and then for 2 more epochs — is that equal to running 5 epochs in one go?
# Meeting Minutes 04-01-2020 (Thanks @msivanes for the inputs)
## Notes
New Participants
Walkthrough of a notebook on Yelp Reviews by @msivanes, exploring how fine-tuning helps in handling Out of Vocabulary (OOV) words in a language model. Words that do not appear in the wikitext corpus and are very specific to our domain are initialized with random weights and are learned as part of fine-tuning. This was based on learnings from the notebook [1] created by @pabloc. For more discussion see [2].
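As a rough sketch of that fine-tuning step with the fastai v1 text API (the CSV name, column name, and hyperparameters are illustrative assumptions, not taken from the notebook):

```python
from fastai.text import TextLMDataBunch, language_model_learner, AWD_LSTM

# Assumed layout: a CSV of Yelp reviews with a 'text' column.
data_lm = TextLMDataBunch.from_csv('.', 'yelp_reviews.csv', text_cols='text')

# Load the AWD_LSTM pretrained on wikitext; domain-specific tokens that were
# never seen during pretraining get randomly initialized embeddings.
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)

# Fine-tuning updates those random embeddings along with the rest of the model.
learn.fit_one_cycle(1, 1e-2)
```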
## Advice
Use a smaller sample of the dataset before diving into the full dataset. This allows for faster training & quicker iteration (see the sketch below).
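One simple way to do this (a hedged sketch; the file name and sample fraction are assumptions) is to sample the rows before building the DataBunch:

```python
import pandas as pd

df = pd.read_csv('train.csv')                     # assumed: labels/filenames live in a CSV
small_df = df.sample(frac=0.1, random_state=42)   # iterate on ~10% of the data first

# Build the DataBunch / Learner from small_df, get the pipeline working quickly,
# then switch back to the full df for the real training run.
```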
## Questions
How do we override the 60,000 vocab limit while creating the language model?
When we freeze a model for fine-tuning, do the layers become untrainable or the layer-groups?
## Off topic
@gagan is trying to create a language model for the Assamese language (one of the low-resource languages).
We decided to rotate the presentation of the lessons among us. As you know, our meetings are informal, so this is basically like explaining the material to friends and boosting your presentation skills in a supportive environment. For a learner, it is one of the best ways to actively engage with the material and actually learn better by explaining it. So choose the lesson that you want to understand better yourself. The lesson’s recap should be short (~15-20 mins), covering the main concepts in simple language. So grab a chance and please write which lesson you would like to present. Of course, all newcomers are welcome.
I started the lessons a couple of days ago and just came across this thread! Would love to join the next session and learn from everyone. Thank you for this initiative!
Thanks for the feedback. If there’s an automated way to do that, let the group know.
It’s much easier for us as community members to create a personal calendar event with the information shared in the wiki to make that happen. @shahnoza is volunteering the time to host this and is kind enough to provide Zoom for this study group. I try not to ask the host to do more work than needed.
Thanks for the feedback. Currently, @shahnoza does remind the group about the meetups. However, based on the feedback, I have now added an automated reminder to the Slack group. We should be getting a reminder on Fridays. As @msivanes pointed out, it would be easier for members to create a personal calendar event.
Classifier for pen vs pencil, followed by questions. @gagan actually timed it from data collection to inference: the total time taken was 23 minutes, demonstrating that fastai is really FAST AI. (@gagan++) Colab
Conceptual Framework of Supervised Learning (Gradient, Parameters, Loss, Model, Observations, Targets) by @msivanes for lesson 2 (SGD).
Car Classifier along with showing EarlyStopping & SaveBestModels callback during training by @tendo. Colab
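For reference, a minimal sketch of using those tracker callbacks with the fastai v1 API (the architecture, monitored metric, patience, and epoch count are illustrative assumptions; `data` is an existing ImageDataBunch):

```python
from fastai.vision import cnn_learner, models, accuracy
from fastai.callbacks import EarlyStoppingCallback, SaveModelCallback

learn = cnn_learner(data, models.resnet34, metrics=accuracy)

# Stop training when validation loss stops improving, and keep the best checkpoint.
learn.fit_one_cycle(10, callbacks=[
    EarlyStoppingCallback(learn, monitor='valid_loss', patience=3),
    SaveModelCallback(learn, every='improvement', monitor='valid_loss', name='best'),
])
```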
## Advice
Stacked transfer learning: first fine-tune on smaller 224px images, then fine-tune again on the same data at its actual (larger) image size, as sketched below.
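A rough sketch of that progressive-resizing idea with the fastai v1 vision API (the folder path, image sizes, batch sizes, and epoch counts are assumptions for illustration; the path is assumed to contain train/valid subfolders):

```python
from fastai.vision import (ImageDataBunch, cnn_learner, models, accuracy,
                           get_transforms, imagenet_stats)

# Stage 1: fine-tune on small (224px) images.
data_small = ImageDataBunch.from_folder('data/cars', size=224, bs=64,
                                        ds_tfms=get_transforms()).normalize(imagenet_stats)
learn = cnn_learner(data_small, models.resnet34, metrics=accuracy)
learn.fit_one_cycle(4)
learn.unfreeze()
learn.fit_one_cycle(2)

# Stage 2: swap in the same dataset at a larger size and fine-tune again.
data_large = ImageDataBunch.from_folder('data/cars', size=448, bs=32,
                                        ds_tfms=get_transforms()).normalize(imagenet_stats)
learn.data = data_large
learn.freeze()            # retrain the head first at the new resolution
learn.fit_one_cycle(2)
learn.unfreeze()
learn.fit_one_cycle(2)
```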
## Discussion
Class imbalance: is it still a problem when we use transfer learning? It might matter less because of the fine-tuning. The best thing to do is try it out, as Jeremy said.
num_workers: the number of CPU worker processes used to speed up data loading. If you get an out-of-memory error, reduce num_workers to a smaller number or reduce the batch size (bs).
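A small illustration of where those knobs live in the fastai v1 API (the path and the exact values are assumptions):

```python
from fastai.vision import ImageDataBunch

# Fewer loader processes and a smaller batch size ease memory pressure.
data = ImageDataBunch.from_folder('data/pets', size=224, bs=16, num_workers=2)
```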
@AjayStark
The top post (wiki) has all the information that you need to participate in the study group & in the discussions. Let us know if you face any difficulties with anything specific.
Source: Natural Language Processing with PyTorch by Delip Rao et al.
Predictions: y_hat = model(x); here we are using our own model.
Loss function: loss_func(y_hat, y). In addition to that, we add the weight-decay term w2*wd to it.
Gradients: p.sub_(learning_rate * p.grad) performs an in-place subtraction of the product of the learning rate and the gradient from each parameter. Since our model has multiple parameter tensors (weights and biases), we loop through them using PyTorch's model.parameters().
Extras:
Weight Decay:
a) w2: looping over every parameter, we calculate the sum of squared weights, w2: for p in model.parameters(): w2 += (p**2).sum()
b) wd: a constant (1e-5)
We multiply w2 by wd and add the result to the regular loss_func output.
Combined
We calculate the loss for each minibatch by calling update(x, y, lr) on it: losses = [update(x,y,lr) for x,y in data.train_dl]
.item() turns the loss tensor into a plain Python number so we can plot the losses and inspect them visually.
    import torch

    # model, loss_func, and data are defined earlier in the notebook
    def update(x, y, learning_rate):
        wd = 1e-5
        # prediction
        y_hat = model(x)
        # sum of squared weights (for weight decay)
        w2 = 0.
        for p in model.parameters():
            w2 = w2 + (p**2).sum()
        # regular loss plus the weight-decay term
        loss = loss_func(y_hat, y) + w2 * wd
        # compute the gradients of the loss w.r.t. the model parameters
        loss.backward()
        # instruct PyTorch not to record these operations for the next gradient calculation
        with torch.no_grad():
            for p in model.parameters():
                # gradient descent step: p <- p - learning_rate * gradient
                p.sub_(learning_rate * p.grad)
                # reset the gradients for the next minibatch
                p.grad.zero_()
        return loss.item()
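Tying this back to the list comprehension above, a short usage sketch (assuming model, loss_func, lr, and data come from the lesson's notebook):

```python
import matplotlib.pyplot as plt

# One pass over the training DataLoader, collecting the per-minibatch losses.
losses = [update(x, y, lr) for x, y in data.train_dl]

# Because update() returns loss.item() (a plain Python float), we can plot directly.
plt.plot(losses)
plt.xlabel('minibatch')
plt.ylabel('loss')
plt.show()
```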