NLP tokenizer returns odd values


I recently finished the first part of the fastai course and wanted to build an NLP model using ULMFiT, as explained in lessons 3 and 4. I have a (pretty large) set of reviews, each with a text and a rating, and I want to create a TextLMDataBunch from the text to train the language model learner on. I run the following code:

 data_lm = (TextLMDataBunch.from_csv(path, 'valid.csv', text_cols='text')

after which I call data_lm.show_batch(). However, this returns many numbers and xxunk tokens instead of the review text I expected. Does anyone know why this is the case?

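Fixed this: TextLMDataBunch.from_csv already creates a databunch, so the data block calls (split_by_rand_pct etc.) do not belong after it. Either replace TextLMDataBunch with TextList, or just create the databunch directly without split_by_rand_pct etc.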
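
For anyone landing here later, the two options can be sketched roughly as follows. This is a minimal sketch assuming fastai v1 (the version the course lessons use). The helper function names, the default arguments, and the valid_pct value of 0.1 are my own placeholders, not something from the original code.

```python
from pathlib import Path


def lm_databunch_from_factory(path, csv_name="valid.csv", text_col="text"):
    """Option 1: the factory method returns a finished databunch on its own."""
    # Assumes fastai v1 is installed; imported here so it is only needed on call.
    from fastai.text import TextLMDataBunch

    # No split_by_rand_pct / label_for_lm / databunch calls after this.
    return TextLMDataBunch.from_csv(Path(path), csv_name, text_cols=text_col)


def lm_databunch_from_datablock(path, csv_name="valid.csv", text_col="text",
                                valid_pct=0.1):
    """Option 2: use the data block API, starting from TextList instead."""
    # Assumes fastai v1 is installed; imported here so it is only needed on call.
    from fastai.text import TextList

    return (TextList.from_csv(Path(path), csv_name, cols=text_col)
            .split_by_rand_pct(valid_pct)  # hold out a validation set (0.1 is a placeholder)
            .label_for_lm()                # language-model labels (next-token targets)
            .databunch())
```

With either helper, data_lm = lm_databunch_from_datablock(path) followed by data_lm.show_batch() should show readable tokenized text rather than raw ids. The point is to pick one of the two routes and not mix them.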