ImageDataBunch calling int() for None type

Hi, I have some experience with ML and even some DL, but it’s my first time using fast.ai (other than the course and some toy stuff).

I’m trying to create an ImageDataBunch to train my model, but I’m getting an error I cannot decipher.

def get_label(filename):
    id = os.path.basename(filename)[:-4]
    if id in the_big_dic_of_categories:
        return the_big_dic_of_categories[id]
    else:
        print(f"{filename} is not in dic")
        return -1

data = ImageDataBunch.from_name_func("datasets/train", fnames, label_func=get_label, valid_pct=0.3)

fnames contains the WindowsPath of every image; the images live in subfolders of “datasets/train”. All the images exist and open correctly, or at least I have previously modified them with PIL without issue.

The error I’m getting is: TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

At first I suspected some images could be corrupted, but I checked and standardized all of them with PIL and everything looks fine. I also thought this might be related to my labels being quite unbalanced: some of them may end up in the validation set but not in the training set, or something like that, in which case I’m not sure how to solve it with fast.ai.
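One way to check both suspicions before building the ImageDataBunch is to count how many images each label has and to list the files that have no label at all. The following is only a sketch: `summarize_labels` is a hypothetical helper, and the toy lists stand in for the real `fnames` and `the_big_dic_of_categories`.

```python
import os
from collections import Counter

def summarize_labels(fnames, categories):
    """Return (missing_files, label_counts) for a list of image paths.

    `categories` maps a file ID (file name without extension) to its label.
    """
    missing, counts = [], Counter()
    for fname in fnames:
        file_id = os.path.splitext(os.path.basename(fname))[0]
        if file_id in categories:
            counts[categories[file_id]] += 1
        else:
            missing.append(fname)
    return missing, counts

# Toy data standing in for `fnames` and `the_big_dic_of_categories`.
toy_fnames = ["datasets/train/a/001.jpg", "datasets/train/a/002.jpg",
              "datasets/train/b/003.jpg", "datasets/train/b/004.jpg"]
toy_categories = {"001": "cat", "002": "cat", "003": "dog"}

missing, counts = summarize_labels(toy_fnames, toy_categories)
rare = [label for label, n in counts.items() if n < 2]
print("files without a label:", missing)
print("labels with fewer than 2 images:", rare)
```

Labels that show up with only one or two images are the ones most likely to land entirely in the validation part of a random split.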

I’m leaving the full traceback here:
TypeError                                 Traceback (most recent call last)
in
     12 data = pickle.load(f)
     13 else:
---> 14 data = ImageDataBunch.from_name_func("datasets/train", fnames, label_func=get_label, valid_pct=0.3)

~\AppData\Local\Continuum\miniconda3\envs\kaggle\lib\site-packages\fastai\vision\data.py in from_name_func(cls, path, fnames, label_func, valid_pct, **kwargs)
    144         "Create from list of `fnames` in `path` with `label_func`."
    145         src = ImageList(fnames, path=path).split_by_rand_pct(valid_pct)
--> 146         return cls.create_from_ll(src.label_from_func(label_func), **kwargs)
    147 
    148     @classmethod

~\AppData\Local\Continuum\miniconda3\envs\kaggle\lib\site-packages\fastai\data_block.py in _inner(*args, **kwargs)
    466             self.valid = fv(*args, from_item_lists=True, **kwargs)
    467             self.__class__ = LabelLists
--> 468             self.process()
    469             return self
    470         return _inner

~\AppData\Local\Continuum\miniconda3\envs\kaggle\lib\site-packages\fastai\data_block.py in process(self)
    520         "Process the inner datasets."
    521         xp,yp = self.get_processors()
--> 522         for ds,n in zip(self.lists, ['train','valid','test']): ds.process(xp, yp, name=n)
    523         #progress_bar clear the outputs so in some case warnings issued during processing disappear.
    524         for ds in self.lists:

~\AppData\Local\Continuum\miniconda3\envs\kaggle\lib\site-packages\fastai\data_block.py in process(self, xp, yp, name)
    683     def process(self, xp:PreProcessor=None, yp:PreProcessor=None, name:str=None):
    684         "Launch the processing on `self.x` and `self.y` with `xp` and `yp`."
--> 685         self.y.process(yp)
    686         if getattr(self.y, 'filter_missing_y', False):
    687             filt = array([o is None for o in self.y.items])

~\AppData\Local\Continuum\miniconda3\envs\kaggle\lib\site-packages\fastai\data_block.py in process(self, processor)
     73         if processor is not None: self.processor = processor
     74         self.processor = listify(self.processor)
---> 75         for p in self.processor: p.process(self)
     76         return self
     77 

~\AppData\Local\Continuum\miniconda3\envs\kaggle\lib\site-packages\fastai\data_block.py in process(self, ds)
    337         ds.classes = self.classes
    338         ds.c2i = self.c2i
--> 339         super().process(ds)
    340 
    341     def __getstate__(self): return {n:getattr(self,n) for n in self.state_attrs}

~\AppData\Local\Continuum\miniconda3\envs\kaggle\lib\site-packages\fastai\data_block.py in process(self, ds)
     40     def __init__(self, ds:Collection=None):  self.ref_ds = ds
     41     def process_one(self, item:Any):         return item
---> 42     def process(self, ds:Collection):        ds.items = array([self.process_one(item) for item in ds.items])
     43 
     44 PreProcessors = Union[PreProcessor, Collection[PreProcessor]]

~\AppData\Local\Continuum\miniconda3\envs\kaggle\lib\site-packages\fastai\core.py in array(a, dtype, **kwargs)
    271     if np.int_==np.int32 and dtype is None and is_listy(a) and len(a) and isinstance(a[0],int):
    272         dtype=np.int64
--> 273     return np.array(a, dtype=dtype, **kwargs)
    274 
    275 class EmptyLabel(ItemBase):

TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

From what I can see, this is due to some labels being None, probably, as you mentioned, because they are only present in the validation set. This is weird because processing usually completes with just a warning (discarding the elements with unknown labels). I’ll see if I can reproduce it.
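The mechanism described here can be illustrated without fastai. The sketch below is a simplified model, not the library's actual code: it assumes the class-to-index mapping is built from the training labels only and that each label is then looked up in it, which is what the `c2i` lines in the traceback suggest. The helper names are made up for the illustration.

```python
def build_class_index(train_labels):
    "Map each distinct training label to an integer index."
    classes = sorted(set(train_labels))
    return {label: i for i, label in enumerate(classes)}

def encode(labels, class_index):
    "Encode labels as indices; labels never seen in training become None."
    return [class_index.get(label) for label in labels]

train_labels = ["cat", "dog", "cat", "dog"]
valid_labels = ["cat", "zebra"]   # "zebra" appears only in validation

c2i = build_class_index(train_labels)
encoded_valid = encode(valid_labels, c2i)
print(encoded_valid)  # [0, None]

# Forcing the encoded labels to integers fails on the None entry,
# which is the same kind of TypeError as in the traceback.
try:
    [int(i) for i in encoded_valid]
    conversion_failed = False
except TypeError as err:
    conversion_failed = True
    print("TypeError:", err)
```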

A short-term fix would be to create an “unknown” class that contains all your classes with very few representatives.
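A minimal sketch of that workaround, assuming a label function like `get_label` above. `make_grouped_label_func`, `min_count` and the `"unknown"` name are hypothetical choices for the illustration; the wrapper counts the labels once and maps every label with too few images to a single catch-all class.

```python
from collections import Counter

def make_grouped_label_func(label_func, fnames, min_count=5, unknown="unknown"):
    "Wrap `label_func` so that labels with fewer than `min_count` files become `unknown`."
    counts = Counter(label_func(f) for f in fnames)
    frequent = {label for label, n in counts.items() if n >= min_count}

    def grouped(fname):
        label = label_func(fname)
        return label if label in frequent else unknown

    return grouped

# Toy example: the label is the part of the file name before the underscore.
toy_label = lambda fname: fname.split("_")[0]
toy_fnames = ["cat_1.jpg", "cat_2.jpg", "cat_3.jpg",
              "dog_1.jpg", "dog_2.jpg", "dog_3.jpg", "emu_1.jpg"]

grouped_label = make_grouped_label_func(toy_label, toy_fnames, min_count=2)
print([grouped_label(f) for f in toy_fnames])
```

The wrapped function could then be passed as `label_func` in the `from_name_func` call from the first post. The catch-all class itself needs enough images to appear in the training set, and its name should have the same type as the other labels (for example an integer if the labels are integers).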

Thanks. I will try that in the meantime; it will take some time anyway, since I’m now busy with something else :)

Hey, any updates on this? I am also encountering a similar error while trying to create an ImageDataBunch.

I’m having the same issue - I’ve tried to figure out if it is a specific couple of files, because it works sometimes but not others, depending on what files I’m pointing it at.

As OP noted, PIL and Windows seem fine with all the pictures. I deleted a few dodgy ones with no extension or no file size. I noticed that if I set the batch size too large, I often get an error stating that I have 10 files in my folder, when Explorer counts more. But when I get down to 2-3 images in a folder, Explorer and the function start counting the same number of files… Is that even related? It’s driving me nuts. Deleting the model folder from the path also often gets rid of the error.

So I’ve spent some more time digging into my error, and wrote a fresh version of the script, following the fastai docs instead of the example lesson 1 / lesson 2 notebooks. It had the same TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType' issue.

The reason for the apparent randomness in when the error arises is tied to np.random used in splitting the data into train and valid. I can use np.random.seed() to set a split that doesn’t cause an error, but this only works for small data sets where the seemingly dodgy files have a low chance of being included wherever it causes an error.
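The dependence on the seed can be reproduced with a toy split. This is a sketch only: it uses the standard library's `random` module as a stand-in for `np.random`, and `random_split` is a hypothetical helper rather than fastai's own splitting code. With one "rare" image among twenty, some seeds keep it in the training part and others move it to validation, where its label would be unknown.

```python
import random

def random_split(n_items, valid_pct, seed):
    "Return (train_idx, valid_idx) from a seeded shuffle of range(n_items)."
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    cut = int(valid_pct * n_items)
    return idx[cut:], idx[:cut]

labels = ["common"] * 19 + ["rare"]   # a single "rare" image at index 19

def rare_label_in_training(seed, valid_pct=0.3):
    "True if the only 'rare' image ends up in the training part for this seed."
    train_idx, _ = random_split(len(labels), valid_pct, seed)
    return 19 in train_idx

outcomes = [rare_label_in_training(seed) for seed in range(200)]
print("seeds that keep the rare label in training:", sum(outcomes), "of", len(outcomes))
```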

I tried verify_images(), but it didn’t seem to do anything. I’ll have to investigate the use of this a bit more.

Edit: Perhaps it could be from scraping images that were not all JPEGs? My scraping script took all images from a website and saved them as .jpg, but if there were a few PNGs, GIFs, etc. in there, they might be causing issues.

Edit 2: It’s disgusting, but you can brute force a random seed search to split it in a way that doesn’t give an error, although I suspect it’s not going to work further down the track…
e.g.

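> import numpy as np
>
> i = 0
> while True:
>     try:
>         np.random.seed(i)  # fix the random train/valid split for this attempt
>         ImageDataBunch.from_name_re(path, fnames, pat=pat, size=224, bs=8)
>     except TypeError:  # this seed hit the int()/NoneType error; try the next one
>         i += 1
>     else:
>         print(f'Seed {i} works')
>         break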
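An alternative to searching for a lucky seed would be to build the split so that every label keeps at least one image in the training set. The sketch below assumes the cause is the one discussed above (labels that appear only in validation). `split_keeping_classes_in_train` is a hypothetical helper and does not use fastai at all.

```python
import random
from collections import defaultdict

def split_keeping_classes_in_train(labels, valid_pct=0.3, seed=0):
    """Return sorted validation indices such that every label keeps
    at least one item in the training set."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)

    valid_idx = []
    for idxs in by_label.values():
        rng.shuffle(idxs)
        # Take about valid_pct of each label, but never all of its items.
        n_valid = min(int(round(valid_pct * len(idxs))), len(idxs) - 1)
        valid_idx.extend(idxs[:n_valid])
    return sorted(valid_idx)

toy_labels = ["a"] * 10 + ["b"] * 2 + ["c"]
valid_idx = split_keeping_classes_in_train(toy_labels)
train_labels = {l for i, l in enumerate(toy_labels) if i not in set(valid_idx)}
print("validation indices:", valid_idx)
print("labels present in training:", sorted(train_labels))
```

In fastai v1, the data block API should accept such a list through `split_by_idx` in place of the `split_by_rand_pct` call visible in the traceback, although that combination has not been tested here.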