Lesson 2: DataBlocks question

saragee3 · September 14, 2020, 8:51pm

These questions are from the “From Data to DataLoaders” section.

bears = DataBlock(
    blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files, 
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=Resize(128))

After instantiating DataBlock and assigning it to bears, we run:

dls = bears.dataloaders(path)
print(os.getcwd())  # /notebooks/course-v4/nbs
print(path)  # .
dls.valid.show_batch(max_n=4, nrows=1)

How does the dataloader know to look in the /bears dir and not /images? Same question for the teddy example:

bears = bears.new(item_tfms=RandomResizedCrop(128, min_scale=0.3))
dls = bears.dataloaders(path)
dls.train.show_batch(max_n=4, nrows=1, unique=True)

Each time I run these lines of code, it always shows a teddy. Why would it do that if path is not absolute (/bears/teddy)?

hansenms · September 14, 2020, 9:56pm

I am not an expert here, but when you configure your DataBlock, you tell it how to get all the items with the get_items= argument. Here you tell it to use get_image_files, a fastai function that takes a path and finds all the image files under it, recursively. If you try it out:

fns = get_image_files(path)
fns

You will see that it is an L object (a fancy list) with a Path entry for each image file under that folder. The DataBlock uses that list to create the DataLoaders when you do:

dls = bears.dataloaders(path)
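
To make the recursive search concrete, here is a rough sketch using only the standard library. It is not fastai's implementation: find_image_files, IMAGE_EXTS, and the folder layout are made up for illustration, and the real function recognizes more extensions.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for fastai's get_image_files, for illustration only.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumed subset of image extensions

def find_image_files(path, recurse=True):
    """Return image files under `path`, descending into subfolders if `recurse`."""
    root = Path(path)
    candidates = root.rglob("*") if recurse else root.glob("*")
    return sorted(p for p in candidates
                  if p.is_file() and p.suffix.lower() in IMAGE_EXTS)

# Build a small throwaway tree that loosely mimics a notebook folder.
tmp = tempfile.TemporaryDirectory()
demo_root = Path(tmp.name)
for rel in ["bears/teddy/a.jpg", "bears/grizzly/b.jpg", "images/c.png", "notes.txt"]:
    f = demo_root / rel
    f.parent.mkdir(parents=True, exist_ok=True)
    f.touch()

# Paths are printed relative to the root so the result is easy to read.
found = [p.relative_to(demo_root).as_posix() for p in find_image_files(demo_root)]
print(found)

# Without recursion, only files directly inside the root are considered.
top_level_only = find_image_files(demo_root, recurse=False)
print(top_level_only)
```

In this sketch the search does not single out bears/: it walks everything under the root it is given, so the file under images/ is picked up as well.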

You also tell it to use:

get_y=parent_label

so it uses the name of the file's parent directory as the label.
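
A minimal sketch of that labelling rule with pathlib. label_from_parent is a hypothetical name and the file paths are invented; as far as I know fastai's parent_label does essentially the same thing, but check the source to be sure.

```python
from pathlib import Path

def label_from_parent(file_path):
    """Use the name of the folder that directly contains the file as its label."""
    return Path(file_path).parent.name

# Invented example paths; only the immediate parent folder matters.
examples = ["bears/teddy/a.jpg", "bears/grizzly/b.jpg", "images/c.png"]
labels = {p: label_from_parent(p) for p in examples}
print(labels)
```

The label depends only on the folder the file sits in, not on the path you passed to dataloaders, which is why a relative path like . still yields labels such as teddy.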

Hope this helps.

saragee3 · September 15, 2020, 1:18am

That makes sense! I see now that get_image_files has recurse=True.