Translating some old fastai1 code to fastai2

Hi friends!

Essentially I have a folder of images and a CSV with columns "fn_col, label_col", and I want to train an image classification model. In the old fastai v1 world I had a simple data setup workflow that went as follows:

src = (ImageList.from_csv(path, csv_name)
    .split_by_rand_pct()
    .label_from_df())
data = (src.transform(get_transforms(), size=SZ)
        .databunch(bs=BS).normalize(imagenet_stats))

I’m trying to replicate that pattern in v2 with the following:

df = pd.read_csv('labels.csv')
data = DataBlock(blocks=(ImageBlock, CategoryBlock), 
                 get_items=get_image_files, 
                 splitter=RandomSplitter(0.1),
                 get_x=lambda x:x[0],
                 get_y=lambda x:x[1],
                 item_tfms=Resize(128),
                 batch_tfms=aug_transforms())
dls = data.dataloaders(df.values)

I get the following error:

TypeError: expected str, bytes or os.PathLike object, not numpy.ndarray

I have a feeling I’m doing multiple things wrong here; pretty sure my labels, resizing, and transforms are all set up incorrectly. Would really appreciate any pointers here!


I think I figured it out! Would really appreciate it if someone can confirm that this is the best way to do this :grinning_face_with_smiling_eyes:

normalize_tfm = Normalize.from_stats(*imagenet_stats)
src = DataBlock(
    blocks=(ImageBlock, CategoryBlock()),
    getters=[ColReader('fn_col', pref=path), ColReader('label_col')],
    splitter=RandomSplitter(valid_pct=0.15, seed=42),
    batch_tfms=normalize_tfm)

dls = src.dataloaders(df, path, batch_size=BS, image_size=SZ)

The only thing you’re missing is aug_transforms. Do this in the batch_tfms of your DataBlock:

batch_tfms = [aug_transforms(), Normalize.from_stats(*imagenet_stats)]


Thanks for taking a look! Hmm, when I add the aug_transforms I get:

Could not do one pass in your dataloader, there is something wrong in it

You probably need to put that Resize back in the item_tfms too (on mobile, missed that the first time).
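(The reason, as I understand it: item_tfms run per image before collation, and a batch can only be stacked if every image has the same size. A quick numpy sketch of the failure mode, no fastai involved:)

```python
import numpy as np

# Images of different shapes can't be collated into one batch tensor:
imgs = [np.zeros((3, 200, 300)), np.zeros((3, 150, 150))]
try:
    np.stack(imgs)
except ValueError as e:
    print(e)  # all input arrays must have the same shape

# An item-level resize makes the shapes match, so stacking works:
resized = [np.zeros((3, 128, 128)) for _ in imgs]
print(np.stack(resized).shape)  # (2, 3, 128, 128)
```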

Also, when you get that, do a dblock.summary(df)

Or:

pip install fastdebug

from fastdebug import *

And then re-run your code. (You don’t need to do summary here.)


Cool, so now I get the following error when running:

dls = src.dataloaders(df, path, batch_size=BS, image_size=SZ)

Error


NameError                                 Traceback (most recent call last)
in ()
----> 1 dls = data.dataloaders(df, path, batch_size=16, image_size=128)

10 frames
/usr/local/lib/python3.7/dist-packages/fastdebug/fastai/datasets.py in (.0)
     59     t = getattr(self, 'types', [])
     60     if t is None or len(t) == 0: raise Exception("The stored dataset contains no items and self.types has not been setup yet")
---> 61     types = L(t if is_listy(t) else [t] for t in self.types).concat().unique()
     62     self.pretty_types = '\n'.join([f'  - {t}' for t in types])

NameError: name 'is_listy' is not defined

Definitely didn’t expect a "not defined" error!

Sorry! That’s an issue on my side, let me fix that real quick :slight_smile:

Go ahead and do pip install git+https://github.com/muellerzr/fastdebug, you should get a more verbose error.

(pip install fastdebug -U might also work, not sure how fast pip propagates on a release :slight_smile: )

Here’s the new trace:

Could not do one pass in your DataLoader, there is something wrong in it. Please see the stack trace below:

TypeError                                 Traceback (most recent call last)
in ()
----> 1 dls = data.dataloaders(df, path, batch_size=16, image_size=128)

15 frames
/usr/local/lib/python3.7/dist-packages/fastcore/dispatch.py in __call__(self, *args, **kwargs)
    116         elif self.inst is not None: f = MethodType(f, self.inst)
    117         elif self.owner is not None: f = MethodType(f, self.owner)
--> 118         return f(*args, **kwargs)
    119
    120     def __get__(self, inst, owner):

TypeError: There was an issue calling the encodes on transform Transform:

'list' object is not callable

:man_facepalming: that makes more sense. We can’t have a list of lists! We need to do:

batch_tfms = [*aug_transforms(), Normalize.from_stats(*imagenet_stats)]
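(For anyone hitting this later: aug_transforms() already returns a list of transforms, so wrapping it in another list nests it, and the batch pipeline then tries to "call" that inner list. The * unpacks it into the outer list. A plain-Python illustration with a stand-in helper:)

```python
# Stand-in for fastai's aug_transforms(), which returns a *list*:
def aug_transforms():
    return ['flip', 'rotate']

nested = [aug_transforms(), 'normalize']  # [['flip', 'rotate'], 'normalize']
flat = [*aug_transforms(), 'normalize']   # ['flip', 'rotate', 'normalize']
print(nested)
print(flat)
```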

That did it! Thank you so much :grinning_face_with_smiling_eyes:

One last question: since I had to put the resize step back into the first DataBlock API call, does that mean every time I want to try a different image size and batch size, I’ll need to instantiate the whole block rather than just the dataloader? It’s not a big deal! Just wondering if there’s an easy way to pull the resizing into the dataloader creation step.

You should be able to override:

dls.train.after_item

and dls.valid.after_item

The transform at index [0] should be Resize, so just do:

dls.train.after_item[0] = Resize(224)

Great! Thanks again!