DataBlock from df: TypeError: object of type 'numpy.int64' has no len()

Hi,

For lesson 1 practice, I am trying to translate a call to the higher-level API into a DataBlock.

The higher-level API call looks like this:
dls = ImageDataLoaders.from_df(df, path=image_directory, valid_pct=0.2, seed=None, label_col='senior', folder=None, suff='.jpg',bs=64)

The DataBlock currently looks like this:

block = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                splitter=RandomSubsetSplitter(train_sz=0.2,valid_sz=0.05),
                get_x=ColReader('img_name', pref=str(image_directory)+ os.path.sep, suff='.jpg'),
                get_y=ColReader('senior', label_delim=' '),
                batch_tfms=aug_transforms(size=224))

dls = block.dataloaders(df)

Both functions make use of a Pandas dataframe. The target column (‘senior’) is a Series of -1 and 1, depending on whether the image contains a senior.

I get the following error:


TypeError                                 Traceback (most recent call last)
<ipython-input-20-f79bfb75016c> in <module>
     11                 batch_tfms=aug_transforms(size=224))
     12 
---> 13 dls = block.dataloaders(df)

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai2/data/block.py in dataloaders(self, source, path, verbose, **kwargs)
     96 
     97     def dataloaders(self, source, path='.', verbose=False, **kwargs):
---> 98         dsets = self.datasets(source)
     99         kwargs = {**self.dls_kwargs, **kwargs, 'verbose': verbose}
    100         return dsets.dataloaders(path=path, after_item=self.item_tfms, after_batch=self.batch_tfms, **kwargs)

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai2/data/block.py in datasets(self, source, verbose)
     93         splits = (self.splitter or RandomSplitter())(items)
     94         pv(f"{len(splits)} datasets of sizes {','.join([str(len(s)) for s in splits])}", verbose)
---> 95         return Datasets(items, tfms=self._combine_type_tfms(), splits=splits, dl_type=self.dl_type, n_inp=self.n_inp, verbose=verbose)
     96 
     97     def dataloaders(self, source, path='.', verbose=False, **kwargs):

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai2/data/core.py in __init__(self, items, tfms, tls, n_inp, dl_type, **kwargs)
    272     def __init__(self, items=None, tfms=None, tls=None, n_inp=None, dl_type=None, **kwargs):
    273         super().__init__(dl_type=dl_type)
--> 274         self.tls = L(tls if tls else [TfmdLists(items, t, **kwargs) for t in L(ifnone(tfms,[None]))])
    275         self.n_inp = (1 if len(self.tls)==1 else len(self.tls)-1) if n_inp is None else n_inp
    276 

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai2/data/core.py in <listcomp>(.0)
    272     def __init__(self, items=None, tfms=None, tls=None, n_inp=None, dl_type=None, **kwargs):
    273         super().__init__(dl_type=dl_type)
--> 274         self.tls = L(tls if tls else [TfmdLists(items, t, **kwargs) for t in L(ifnone(tfms,[None]))])
    275         self.n_inp = (1 if len(self.tls)==1 else len(self.tls)-1) if n_inp is None else n_inp
    276 

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastcore/foundation.py in __call__(cls, x, *args, **kwargs)
     39             return x
     40 
---> 41         res = super().__call__(*((x,) + args), **kwargs)
     42         res._newchk = 0
     43         return res

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai2/data/core.py in __init__(self, items, tfms, use_list, do_setup, split_idx, train_setup, splits, types, verbose)
    212         if do_setup:
    213             pv(f"Setting up {self.tfms}", verbose)
--> 214             self.setup(train_setup=train_setup)
    215 
    216     def _new(self, items, **kwargs): return super()._new(items, tfms=self.tfms, do_setup=False, types=self.types, **kwargs)

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai2/data/core.py in setup(self, train_setup)
    226 
    227     def setup(self, train_setup=True):
--> 228         self.tfms.setup(self, train_setup)
    229         if len(self) != 0:
    230             x = super().__getitem__(0) if self.splits is None else super().__getitem__(self.splits[0])[0]

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastcore/transform.py in setup(self, items, train_setup)
    177         tfms = self.fs[:]
    178         self.fs.clear()
--> 179         for t in tfms: self.add(t,items, train_setup)
    180 
    181     def add(self,t, items=None, train_setup=False):

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastcore/transform.py in add(self, t, items, train_setup)
    180 
    181     def add(self,t, items=None, train_setup=False):
--> 182         t.setup(items, train_setup)
    183         self.fs.append(t)
    184 

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastcore/transform.py in setup(self, items, train_setup)
     76     def setup(self, items=None, train_setup=False):
     77         train_setup = train_setup if self.train_setup is None else self.train_setup
---> 78         return self.setups(getattr(items, 'train', items) if train_setup else items)
     79 
     80     def _call(self, fn, x, split_idx=None, **kwargs):

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastcore/dispatch.py in __call__(self, *args, **kwargs)
     96         if not f: return args[0]
     97         if self.inst is not None: f = MethodType(f, self.inst)
---> 98         return f(*args, **kwargs)
     99 
    100     def __get__(self, inst, owner):

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai2/data/transforms.py in setups(self, dsets)
    235         if self.vocab is None:
    236             vals = set()
--> 237             for b in dsets: vals = vals.union(set(b))
    238             self.vocab = CategoryMap(list(vals), add_na=self.add_na)
    239 

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai2/data/core.py in <genexpr>(.0)
    218     def _after_item(self, o): return self.tfms(o)
    219     def __repr__(self): return f"{self.__class__.__name__}: {self.items}\ntfms - {self.tfms.fs}"
--> 220     def __iter__(self): return (self[i] for i in range(len(self)))
    221     def show(self, o, **kwargs): return self.tfms.show(o, **kwargs)
    222     def decode(self, o, **kwargs): return self.tfms.decode(o, **kwargs)

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai2/data/core.py in __getitem__(self, idx)
    253         res = super().__getitem__(idx)
    254         if self._after_item is None: return res
--> 255         return self._after_item(res) if is_indexer(idx) else res.map(self._after_item)
    256 
    257 # Cell

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai2/data/core.py in _after_item(self, o)
    216     def _new(self, items, **kwargs): return super()._new(items, tfms=self.tfms, do_setup=False, types=self.types, **kwargs)
    217     def subset(self, i): return self._new(self._get(self.splits[i]), split_idx=i)
--> 218     def _after_item(self, o): return self.tfms(o)
    219     def __repr__(self): return f"{self.__class__.__name__}: {self.items}\ntfms - {self.tfms.fs}"
    220     def __iter__(self): return (self[i] for i in range(len(self)))

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastcore/transform.py in __call__(self, o)
    183         self.fs.append(t)
    184 
--> 185     def __call__(self, o): return compose_tfms(o, tfms=self.fs, split_idx=self.split_idx)
    186     def __repr__(self): return f"Pipeline: {' -> '.join([f.name for f in self.fs if f.name != 'noop'])}"
    187     def __getitem__(self,i): return self.fs[i]

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastcore/transform.py in compose_tfms(x, tfms, is_enc, reverse, **kwargs)
    136     for f in tfms:
    137         if not is_enc: f = f.decode
--> 138         x = f(x, **kwargs)
    139     return x
    140 

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastcore/transform.py in __call__(self, x, **kwargs)
     70     @property
     71     def name(self): return getattr(self, '_name', _get_name(self))
---> 72     def __call__(self, x, **kwargs): return self._call('encodes', x, **kwargs)
     73     def decode  (self, x, **kwargs): return self._call('decodes', x, **kwargs)
     74     def __repr__(self): return f'{self.name}: {self.encodes} {self.decodes}'

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastcore/transform.py in _call(self, fn, x, split_idx, **kwargs)
     80     def _call(self, fn, x, split_idx=None, **kwargs):
     81         if split_idx!=self.split_idx and self.split_idx is not None: return x
---> 82         return self._do_call(getattr(self, fn), x, **kwargs)
     83 
     84     def _do_call(self, f, x, **kwargs):

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastcore/transform.py in _do_call(self, f, x, **kwargs)
     84     def _do_call(self, f, x, **kwargs):
     85         if not _is_tuple(x):
---> 86             return x if f is None else retain_type(f(x, **kwargs), x, f.returns_none(x))
     87         res = tuple(self._do_call(f, x_, **kwargs) for x_ in x)
     88         return retain_type(res, x)

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastcore/dispatch.py in __call__(self, *args, **kwargs)
     96         if not f: return args[0]
     97         if self.inst is not None: f = MethodType(f, self.inst)
---> 98         return f(*args, **kwargs)
     99 
    100     def __get__(self, inst, owner):

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai2/data/transforms.py in __call__(self, o, **kwargs)
    188 
    189     def __call__(self, o, **kwargs):
--> 190         if len(self.cols) == 1: return self._do_one(o, self.cols[0])
    191         return L(self._do_one(o, c) for c in self.cols)
    192 

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai2/data/transforms.py in _do_one(self, r, c)
    185         if len(self.pref)==0 and len(self.suff)==0 and self.label_delim is None: return o
    186         if self.label_delim is None: return f'{self.pref}{o}{self.suff}'
--> 187         else: return o.split(self.label_delim) if len(o)>0 else []
    188 
    189     def __call__(self, o, **kwargs):

TypeError: object of type 'numpy.int64' has no len()

I think it has to do with what is passed to RandomSubsetSplitter, although as far as I can tell its arguments are correct.

Any hints?

Below is an image of the DataFrame:

First, just an FYI: you want to use RandomSplitter, not RandomSubsetSplitter. The latter takes a random subset of your data and then splits that subset, which isn’t the translation you’re looking for. As for your issue, first try the proper splitter, then tell us what happens :slight_smile: Also, you don’t specify or show what the ‘senior’ column looks like in the DataFrame. Can you please provide this?

Finally, if you are attempting a 1:1 translation, you should be using a CategoryBlock here rather than a MultiCategoryBlock (unless this is a multi-category problem, but we don’t know your y’s, so it’s hard to tell from this angle).

Hi,

The goal of the DataBlock is to let me train on only part of the data, as was possible in the previous version of fastai. I thought RandomSubsetSplitter was the way to go?

The senior column looks exactly like the other examples in the image: a Series with either -1, or 1 depending on the contents of the image.

I will use the CategoryBlock and try again!

Thanks.

In that case the CategoryBlock should be the way to go :slight_smile: Just know that it’s not a 1:1 translation as you’ll be doing different splits (because of the subset) :slight_smile:
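
To make the difference between the two splitters concrete, here is a minimal pure-Python sketch. The functions random_split and random_subset_split are hypothetical stand-ins written for illustration, not the fastai implementations; they only assume that RandomSplitter divides all items between train and valid, while RandomSubsetSplitter keeps just the requested fractions and leaves the rest unused.

```python
import random

def random_split(n_items, valid_pct=0.2, seed=None):
    """Shuffle every index; valid_pct goes to validation, the rest to training."""
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)
    cut = int(valid_pct * n_items)
    return idxs[cut:], idxs[:cut]

def random_subset_split(n_items, train_sz, valid_sz, seed=None):
    """Shuffle every index, then keep only train_sz and valid_sz fractions."""
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)
    train_len, valid_len = int(n_items * train_sz), int(n_items * valid_sz)
    # Indices beyond train_len + valid_len are dropped entirely.
    return idxs[:train_len], idxs[train_len:train_len + valid_len]

train, valid = random_split(1000, valid_pct=0.2, seed=42)
print(len(train), len(valid))          # every item is used

sub_train, sub_valid = random_subset_split(1000, train_sz=0.2, valid_sz=0.05, seed=42)
print(len(sub_train), len(sub_valid))  # only a quarter of the items are used
```

With 1000 items, the first call uses all of them (800 train, 200 valid), while the second uses only 250 (200 train, 50 valid). That is why the subset version suits training on partial data but is not a 1:1 translation of valid_pct=0.2.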

I still receive the same error.

The DataBlock has been simplified to at least make it run:

block = DataBlock(blocks=(ImageBlock, CategoryBlock),
                splitter=RandomSubsetSplitter(train_sz=0.2,valid_sz=0.05),
                get_x=ColReader('img_name', pref=str(selfies_image_directory)+os.path.sep,suff='.jpg'),
                get_y=ColReader('senior', label_delim=' '),
                batch_tfms=aug_transforms(size=224))

The error seems to start with dataloaders and then there are many functions that have to do with transforms. The outcome is the same: TypeError: object of type 'numpy.int64' has no len()

The dataset is a folder full of selfies, with a separate text file that I read into a Pandas DataFrame.

Can you provide the output of:

block.summary(df)

It should tell us exactly where the issue is happening

Here you go :slight_smile:

The data is from the CRCV | Center for Research in Computer Vision: ‘selfies’

I think I found the problem: I had to remove label_delim=' ' from the get_y ColReader.
Now calling dls = block.dataloaders(df) doesn’t return an error :slight_smile:
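
For anyone hitting the same error, here is a small sketch of why label_delim causes it. The function read_label is a hypothetical, simplified stand-in for the last step shown in the traceback (ColReader._do_one), and a plain Python int stands in for numpy.int64. When a delimiter is set, the label is treated as a delimited string, so len() and split() are called on it, which fails for an integer.

```python
def read_label(value, label_delim=None):
    """Simplified stand-in for the label-reading step in the traceback."""
    if label_delim is None:
        # Single-label case: the value is passed through unchanged.
        return value
    # Multi-label case: the value is expected to be a delimited string.
    return value.split(label_delim) if len(value) > 0 else []

print(read_label(1))                            # integer label, no delimiter
print(read_label("cat dog", label_delim=" "))   # string label split into a list

try:
    read_label(1, label_delim=" ")              # integer label with a delimiter
except TypeError as err:
    print(f"TypeError: {err}")
```

With the integer labels in this thread, get_y=ColReader('senior') without label_delim keeps each label as a single value, which is what CategoryBlock expects. label_delim is meant for string columns that hold several labels per row, as used with MultiCategoryBlock.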