How vocab words are chosen

#1

Hi ~
I’m trying to build a text classifier based on a language model.
How are the words in the vocab chosen? Are the frequencies of all words computed first, and then the top max_vocab words with frequency >= min_freq kept? Or are words processed in order, with each word added to the vocab if its frequency is >= min_freq until max_vocab is reached, so that words appearing later are left out even if their frequency is >= min_freq?


(julian) #2

Pretty sure that unless you are using SPProcessor, it does this: https://github.com/fastai/fastai/blob/cdcebcdab8520c790fd90afaa97cbf54013c92c0/fastai/text/transform.py#L148

So it should be picking the most frequent ones.
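
To illustrate what "most frequent" means here, this is a rough standalone sketch, not the fastai code itself. It assumes the linked line counts every token in the corpus first and only then cuts down to max_vocab and min_freq. The name build_vocab and the toy data are made up for the example, and special tokens are ignored.

```python
from collections import Counter

def build_vocab(tokenized_texts, max_vocab, min_freq):
    """Hypothetical helper: frequency-based vocab selection."""
    # Count every token across the whole corpus before choosing anything.
    freq = Counter(tok for text in tokenized_texts for tok in text)
    # Keep the max_vocab most frequent tokens, then drop any below min_freq.
    return [tok for tok, count in freq.most_common(max_vocab) if count >= min_freq]

# "late" only shows up in the last text, but it is frequent overall.
texts = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "late", "late", "late"],
]
vocab = build_vocab(texts, max_vocab=3, min_freq=2)
print(vocab)
```

In this sketch, where a word first appears does not matter: "late" makes it into the vocab even though it only occurs at the end, while "cat" and "dog" are dropped because they occur once.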


#3

Got it, thank you!
