Generally, BOS stands for “Beginning Of Sentence” and EOS for “End Of Sentence”.
But here is what the docs say these special tokens mean:
UNK (xxunk) is for an unknown word (one that isn’t present in the current vocabulary)
BOS (xxbos) represents the beginning of a text in your dataset
FLD (xxfld) is used if you set mark_fields=True in your TokenizeProcessor to separate the different fields of texts (if your texts are loaded from several columns in a dataframe)
TK_MAJ (xxmaj) is used to indicate the next word begins with a capital in the original text
TK_UP (xxup) is used to indicate the next word is written in all caps in the original text
TK_REP (xxrep) is used to indicate the next character is repeated n times in the original text (usage xxrep n {char})
TK_WREP (xxwrep) is used to indicate the next word is repeated n times in the original text (usage xxwrep n {word})
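
To make these rules concrete, here is a minimal sketch in plain Python of how xxbos, xxmaj, xxup, xxrep and xxwrep could be produced. It is not the library's actual code: the function names, the whitespace tokenization, and the threshold of 3 repetitions are all my own assumptions, chosen only for illustration.

```python
import re

# Token strings taken from the list above.
BOS, TK_MAJ, TK_UP, TK_REP, TK_WREP = "xxbos", "xxmaj", "xxup", "xxrep", "xxwrep"


def mark_word_reps(text, min_run=3):
    """Replace a word repeated min_run+ times with 'xxwrep n word' (hypothetical helper)."""
    pattern = re.compile(r"\b(\w+)(?:\s+\1\b){%d,}" % (min_run - 1))
    return pattern.sub(
        lambda m: f" {TK_WREP} {len(m.group(0).split())} {m.group(1)} ", text
    )


def mark_char_reps(text, min_run=3):
    """Replace a character repeated min_run+ times with 'xxrep n char' (hypothetical helper)."""
    pattern = re.compile(r"(\S)\1{%d,}" % (min_run - 1))
    return pattern.sub(
        lambda m: f" {TK_REP} {len(m.group(0))} {m.group(1)} ", text
    )


def mark_case(tokens):
    """Lowercase each token, prefixing xxup (all caps) or xxmaj (capitalized)."""
    out = []
    for tok in tokens:
        if tok.isupper() and len(tok) > 1:
            out += [TK_UP, tok.lower()]
        elif tok[:1].isupper():
            out += [TK_MAJ, tok.lower()]
        else:
            out.append(tok)
    return out


def tokenize(text):
    """Apply the repetition rules, split on whitespace, then mark case and prepend xxbos."""
    text = mark_word_reps(text)
    text = mark_char_reps(text)
    return [BOS] + mark_case(text.split())


if __name__ == "__main__":
    print(" ".join(tokenize("This movie is sooo GOOD , really really really good")))
```

In this sketch "This" becomes "xxmaj this", "GOOD" becomes "xxup good", "sooo" becomes "s xxrep 3 o", and the three "really" collapse into "xxwrep 3 really". The real tokenizer may use different thresholds and handle more edge cases.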