Obscure behaviour: Learner modifies DataLoaders

I noticed that if we pass a DataLoaders object without proper normalization to a Learner (i.e. if we are using a pretrained resnet, we should normalize with the ImageNet stats), the learner apparently modifies the processing behaviour of the data loader. I say apparently because I didn’t find it in the library code, but I ran an example to check it:

from fastbook import *
from fastai.vision.all import *

path = untar_data(URLs.PETS)

pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(seed=42),
                 # cat breed filenames start with an uppercase letter in this dataset
                 get_y=using_attr(lambda x: 'cat' if x[0].isupper() else 'dog', 'name'),
                 item_tfms=Resize(224),
                 batch_tfms=[])  # note: no Normalize transform here
dls = pets.dataloaders(path/"images")

def gen_sample():
    # run a single image through the test-time pipeline and return the processed tensor
    t = tensor(Image.open('./cat.jpg').resize((224,224)))
    return dls.test_dl([t]).one_batch()[0].squeeze(0).cpu()

l1 = gen_sample()

learn = cnn_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(1)

l2 = gen_sample()

After running the code above,

torch.allclose(l1, l2)

returns False, while

normalize = Normalize.from_stats(*imagenet_stats)
torch.allclose(normalize(l1.cuda()).cpu(), l2)

returns True.
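You can also see the change directly by inspecting the batch-transform pipeline (a quick sketch; the exact repr varies across fastai versions):

dls_check = pets.dataloaders(path/"images")
print(dls_check.after_batch)   # no Normalize in the pipeline yet
cnn_learner(dls_check, resnet18)
print(dls_check.after_batch)   # a Normalize built from imagenet_stats has been appended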

Even though it’s nice that the library prevents us from doing something wrong (such as using a pretrained resnet without ImageNet normalization), I think it would be better to throw an error explaining the problem instead of silently correcting the mistake.
This behaviour is also weird because when I pass a DataLoaders object to a Learner, I understand it is for using it, not for eventually modifying its behaviour.

What do you think?

When you create the cnn_learner you can use

learn = cnn_learner(dls, resnet18, metrics=error_rate, normalize=False)

to disable the automatic normalization.
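You can verify that the flag leaves the dataloaders untouched (a quick sketch):

dls_raw = pets.dataloaders(path/"images")
learn = cnn_learner(dls_raw, resnet18, metrics=error_rate, normalize=False)
print(dls_raw.after_batch)   # still no Normalize transform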

I think it’s fine the way it is, as normalize is available as an argument in the function signature.

Thanks! I thought it was something in Learner, but I just read about it in the cnn_learner docs.

The problem came up when I was migrating the model to Java and wanted to be sure I was replicating the test-time preprocessing. Is there a way to visualize the preprocessing path and see the transformations that are composed (something analogous to printing the model)?

Building a visualization of the preprocessing steps is a good idea, but nothing like that exists AFAIK. I’d check which transforms you’re applying by looking at dls.after_item, dls.before_batch, and dls.after_batch.
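For example (a sketch; the exact transforms printed depend on your DataBlock and fastai version):

print(dls.after_item)     # item-level transforms, e.g. Resize, ToTensor
print(dls.before_batch)   # transforms applied just before collating a batch (often empty)
print(dls.after_batch)    # batch-level transforms, e.g. IntToFloatTensor, Normalize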

See chapter 11 of the book for more information.

Great advice! This chapter is very good for understanding how Transforms work! Thanks!

I think a “full stack” visualization would be great, meaning a visualization of everything that happens when someone calls .predict on a Learner with a sample.

It’s necessary to know everything that is happening, for example:

  • When migrating to another language. One example use case would be migrating to PyTorch Mobile.
  • When comparing with another implementation. For instance, when you see some experiment in a paper and want to replicate it in fastai.

I think it’s preprocessing (item and batch) + PyTorch model + activation (I believe the activation is kept separate from learn.model).
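Something like this sketch of the .predict path, as I understand it (simplified; assumes a classification learner whose loss is CrossEntropyLossFlat, so the activation is a softmax):

dl = learn.dls.test_dl(['./cat.jpg'])   # item + batch preprocessing pipelines
xb = dl.one_batch()[0]                  # normalized input batch
with torch.no_grad():
    logits = learn.model.eval()(xb)     # the plain PyTorch model
probs = torch.softmax(logits, dim=1)    # activation applied via the loss function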

For visualizing models, you can try netron.
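Since netron can open ONNX files, one way to feed it the fastai model is to export the underlying PyTorch module (a sketch; the 224×224 input size and 'model.onnx' filename are just assumptions for this example):

dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(learn.model.cpu().eval(), dummy, 'model.onnx')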