Hi,
I wonder how the <unk> tokens are assigned when creating a torchtext field, because I don't actually see a vocabulary size argument in data.Field (I don't think it's created by default).
Thanks
I have figured it out myself: pass either max_size or min_freq when calling LanguageModelData.from_text_files (or LanguageModelData.from_dataframes). These arguments come from the Vocab class in torchtext.
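
To make the effect of the two arguments concrete, here is a minimal pure-Python sketch of how a vocabulary cutoff decides which tokens end up as <unk>. It is not the torchtext or fastai source; build_vocab and numericalize are hypothetical helper names, and the assumption that max_size does not count the <unk> entry itself reflects my understanding of torchtext's Vocab rather than anything stated in this thread.

```python
from collections import Counter

UNK = "<unk>"  # placeholder for out-of-vocabulary tokens


def build_vocab(tokens, max_size=None, min_freq=1):
    """Hypothetical helper: keep only the most frequent tokens.

    max_size caps the number of regular tokens (the <unk> entry is
    assumed not to count); min_freq drops tokens seen fewer times.
    """
    counts = Counter(tokens)
    # Sort by descending frequency, breaking ties alphabetically.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    itos = [UNK]
    for token, freq in ordered:
        too_rare = freq < min_freq
        full = max_size is not None and len(itos) - 1 >= max_size
        if too_rare or full:
            break
        itos.append(token)
    stoi = {tok: i for i, tok in enumerate(itos)}
    return itos, stoi


def numericalize(tokens, stoi):
    # Any token missing from the vocabulary falls back to the <unk> index.
    return [stoi.get(tok, stoi[UNK]) for tok in tokens]


corpus = "the cat sat on the mat the cat ran".split()
itos, stoi = build_vocab(corpus, max_size=2, min_freq=2)
ids = numericalize("the dog sat".split(), stoi)
```

In this sketch only "the" and "cat" survive the cutoff, so "dog" (never seen) and "sat" (seen once, below min_freq) both map to the <unk> index. Without max_size or min_freq every training token is kept, so nothing in the training set would become <unk>.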