get_items=get_image_files: how does it match the input files to the targets?

Danrohn · January 18, 2022, 5:36am

Hey guys,
I’m trying to follow this guide:

And I wanted to understand this step:

Now let’s generate our dataset!
items = get_image_files(path_hr)
parallel(Crappifier(path_lr, path_hr), items);
Let’s take a look at one of our generated images:
bad_im = get_image_files(path_lr)

So he created those items and bad_im variables, but he didn’t use them in the DataBlock:

dblock = DataBlock(blocks=(ImageBlock, ImageBlock),
                   get_items=get_image_files,
                   get_y = lambda x: path_hr/x.name,
                   splitter=RandomSplitter(),
                   item_tfms=Resize(224),
                   batch_tfms=[*aug_transforms(max_zoom=2.),
                               Normalize.from_stats(*imagenet_stats)])

I expected something like:

dblock = DataBlock(blocks=(ImageBlock, ImageBlock),
                   get_items = items, # HERE
                   get_y = bad_im, # HERE
                   splitter=RandomSplitter(),
                   item_tfms=Resize(224),
                   batch_tfms=[*aug_transforms(max_zoom=2.),
                               Normalize.from_stats(*imagenet_stats)])

How does get_items=get_image_files know to pick only the “input” files and match them to the target files?

Thanks

dhoa · January 18, 2022, 2:58pm

Take a look at the code block just below dblock:

def get_dls(bs:int, size:int):
  "Generates two `GAN` DataLoaders"
  dblock = DataBlock(blocks=(ImageBlock, ImageBlock),
                   get_items=get_image_files,
                   get_y = lambda x: path_hr/x.name,
                   splitter=RandomSplitter(),
                   item_tfms=Resize(size),
                   batch_tfms=[*aug_transforms(max_zoom=2.),
                               Normalize.from_stats(*imagenet_stats)])
  dls = dblock.dataloaders(path_lr, bs=bs, path=path)
  dls.c = 3 # For 3 channel image
  return dls

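He calls dblock.dataloaders with path_lr (the path to the low-resolution images), so get_image_files runs on that folder and your x is the low-resolution images. The y image is then the high-resolution version of the same x image, looked up with get_y = lambda x: path_hr/x.name (path_hr is the path to the high-resolution images). This works because each low-resolution file has the same file name as its high-resolution original.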
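To make the pairing concrete, here is a rough sketch in plain Python, without fastai. The helpers list_image_files and pair_items are made-up stand-ins for get_image_files and for what the DataBlock does internally, and the file names are invented, so treat it as an illustration of the idea rather than the library's actual code:

```python
from pathlib import Path
import tempfile

def list_image_files(folder):
    # Hypothetical stand-in for fastai's get_image_files: collect image paths under a folder.
    return sorted(p for p in Path(folder).rglob("*")
                  if p.suffix.lower() in {".jpg", ".jpeg", ".png"})

def pair_items(source, get_items, get_y):
    # Simplified view of the DataBlock flow: get_items is called on the source,
    # each returned item is used as x, and get_y(x) derives the matching y.
    return [(x, get_y(x)) for x in get_items(source)]

# Build two small folders with identical file names (assumed layout for this sketch).
root = Path(tempfile.mkdtemp())
path_lr, path_hr = root / "low_res", root / "high_res"
for folder in (path_lr, path_hr):
    folder.mkdir()
    for name in ("cat.jpg", "dog.jpg"):
        (folder / name).touch()

# The source is path_lr, so every x is a low-res file; y reuses the file name under path_hr.
pairs = pair_items(path_lr, list_image_files, lambda x: path_hr / x.name)
for x, y in pairs:
    print(x.relative_to(root), "->", y.relative_to(root))
```

So the items and bad_im variables from the guide are only there for generating and inspecting the files. The DataBlock rebuilds the file list itself from whatever source you pass to dataloaders.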

Is it clear now?

Hope that helps

Danrohn · January 18, 2022, 5:40pm

Yes! Thank you.
So what is the difference between source and path in the arguments here?

dls = dblock.dataloaders(path_lr, bs=bs, path=path)

dhoa · January 19, 2022, 11:15pm

If I remember correctly, path_lr (the source argument here) is where your images are located. path is where your model will be saved once training is finished.

I think the documentation should be clearer on this part.

Hope that helps,

Danrohn · January 20, 2022, 12:36am

Where exactly can I find the most complete implementation of a function? What I found on the main site didn’t show all of the functions, arguments, and so on.
For instance, for “show_batch” I can press “View source” and it shows me the function in the package, but it doesn’t show “figsize()”, which I only found somewhere else.

dhoa · January 20, 2022, 3:44pm

Because show_batch uses matplotlib. In the definition of show_batch you can see **kwargs. This means that any keyword arguments other than the parameters defined in the function are collected in kwargs, and can then be passed on to the plotting code.

This pattern is very handy when you have a long list of parameters. However, you lose readability. And unfortunately, matplotlib depends heavily on this syntax, so you cannot easily tell which parameters can be used.
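Here is a small sketch of the pattern in plain Python. The names show_batch_sketch and draw_grid are invented for illustration and are not fastai or matplotlib functions; the point is only that figsize never appears in the outer signature and still reaches the inner function:

```python
def draw_grid(n_items, figsize=(6, 4), title=None):
    # Hypothetical low-level plotting helper: this is where figsize is actually defined.
    return {"n_items": n_items, "figsize": figsize, "title": title}

def show_batch_sketch(items, max_n=9, **kwargs):
    # Only items and max_n are in the signature; any other keyword argument
    # lands in the kwargs dict and is forwarded untouched.
    shown = items[:max_n]
    return draw_grid(len(shown), **kwargs)

# figsize is accepted even though show_batch_sketch never names it.
result = show_batch_sketch(list(range(20)), max_n=4, figsize=(10, 10))
print(result)
```

To find out which extra arguments are accepted, you have to follow kwargs down to the function that finally consumes them, which is why “View source” on the outer function alone doesn’t show them.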

Hope that helps

Danrohn · January 29, 2022, 8:43am

Thank you for your answer!