Hi, I’m trying to make a language model (so I can transfer the encoder to a classifier later), but when I create the DataLoaders and look at the batches, it looks like it is really filled with xxunks. I don’t have that much data (~1800 samples), so I’m wondering whether this is normal, or if I’m just doing something wrong.
Making the DataLoader
lm_dls = DataBlock(
blocks=TextBlock.from_df(lmdf, is_lm=True),
splitter=RandomSplitter(0.15)
).dataloaders(lmdf)
Data sample