Chapter 10: how to modify the DataBlock so that it loads from a .csv file?

BorutF · May 7, 2021, 9:40pm

Normally I loaded data using:

dls = TextDataLoaders.from_csv(path=path, csv_fname='training_LM.csv', text_col='Tweet', valid_pct=0.1)

TextDataLoaders can load from a csv, a folder, or a DataFrame.

However, chapter 10 builds the DataLoaders with a DataBlock instead:

dls_lm = DataBlock(
    blocks=TextBlock.from_folder(path, is_lm=True),
    get_items=get_imdb, splitter=RandomSplitter(0.1)
).dataloaders(path, path=path, bs=128, seq_len=80)

I am not sure exactly how to modify this to load from a csv. The get_items argument looks especially confusing.

ali_baba · May 7, 2021, 11:28pm

Hey @BorutF !
Take a look at TextBlock.from_df (Text data | fastai). It should cover what you're looking for in this example.

In general, when extracting data from a DataFrame, you'll want to use the get_x argument rather than get_items. This distinction can certainly be confusing, but after a few times you'll get the hang of it!

BorutF · May 8, 2021, 5:55am

Thanks.

So I guess get_x refers to a column in the DataFrame, while get_items has to do with the files?

It works anyway. I only had to change the splitter, because my data has no “is_valid” column.

BorutF · May 8, 2021, 6:04am

It appears to be a lot faster than TextDataLoaders.from_csv, and also later when training the model, although that could be down to how much processing capacity Google Colab had available at the time.