Normally I loaded data using:
dls = TextDataLoaders.from_csv(path=path, csv_fname='training_LM.csv', text_col='Tweet', valid_pct=0.1)
TextDataLoaders can load from a csv, a folder, or a DataFrame.
However, chapter 10 uses a DataBlock instead:
dls_lm = DataBlock(
).dataloaders(path, path=path, bs=128, seq_len=80)
I am not sure exactly how to modify this to load from a csv. The get_items argument looks especially confusing.
Hey @BorutF !
Take a look at TextBlock.from_df (Text data | fastai); it should outline what you're looking for in this example.
In general, when extracting data from a DataFrame, you'll want to use get_x rather than get_items. The distinction can certainly be confusing, but after a few times you'll get the hang of it!
So get_x refers to a column in the DataFrame, I guess, while get_items has to do with the files?
It works anyway. I only had to change the splitter, because I had no “is_valid” column.
It appears to be a lot faster than TextDataLoaders.from_csv, both when loading and later when training the model, though that could be down to the processing capacity available on Google Colab at the time.