ImageDataBunch.from_df issue in Kaggle

tstclair2009 · May 17, 2020, 7:23pm

I'm currently going through the fastai course and thought the flower disease classifier on Kaggle would be a good challenge.

I'm stuck. I can import the CSV into a dataframe and make the needed edits to the labels, but when I try to create the ImageDataBunch, the following error is thrown. Any help would be greatly appreciated!

data = ImageDataBunch.from_df(path, df, ds_tfms=tfms, size=24)
data.classes

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-36-68bb9afd66c9> in <module>
----> 1 data = ImageDataBunch.from_df(path, df, ds_tfms=get_transforms(), size=224, num_workers=4)
      2 data.classes

/opt/conda/lib/python3.7/site-packages/fastai/vision/data.py in from_df(cls, path, df, folder, label_delim, valid_pct, seed, fn_col, label_col, suffix, **kwargs)
    117         src = (ImageList.from_df(df, path=path, folder=folder, suffix=suffix, cols=fn_col)
    118                 .split_by_rand_pct(valid_pct, seed)
--> 119                 .label_from_df(label_delim=label_delim, cols=label_col))
    120         return cls.create_from_ll(src, **kwargs)
    121 

/opt/conda/lib/python3.7/site-packages/fastai/data_block.py in _inner(*args, **kwargs)
    477         assert isinstance(fv, Callable)
    478         def _inner(*args, **kwargs):
--> 479             self.train = ft(*args, from_item_lists=True, **kwargs)
    480             assert isinstance(self.train, LabelList)
    481             kwargs['label_cls'] = self.train.y.__class__

/opt/conda/lib/python3.7/site-packages/fastai/data_block.py in label_from_df(self, cols, label_cls, **kwargs)
    283     def label_from_df(self, cols:IntsOrStrs=1, label_cls:Callable=None, **kwargs):
    284         "Label `self.items` from the values in `cols` in `self.inner_df`."
--> 285         labels = self.inner_df.iloc[:,df_names_to_idx(cols, self.inner_df)]
    286         assert labels.isna().sum().sum() == 0, f"You have NaN values in column(s) {cols} of your dataframe, please fix it."
    287         if is_listy(cols) and len(cols) > 1 and (label_cls is None or label_cls == MultiCategoryList):

/opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1760                 except (KeyError, IndexError, AttributeError):
   1761                     pass
-> 1762             return self._getitem_tuple(key)
   1763         else:
   1764             # we by definition only have the 0th axis

/opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
   2065     def _getitem_tuple(self, tup: Tuple):
   2066 
-> 2067         self._has_valid_tuple(tup)
   2068         try:
   2069             return self._getitem_lowerdim(tup)

/opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py in _has_valid_tuple(self, key)
    701                 raise IndexingError("Too many indexers")
    702             try:
--> 703                 self._validate_key(k, i)
    704             except ValueError:
    705                 raise ValueError(

/opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py in _validate_key(self, key, axis)
   2007             # check that the key does not exceed the maximum size of the index
   2008             if len(arr) and (arr.max() >= len_axis or arr.min() < -len_axis):
-> 2009                 raise IndexError("positional indexers are out-of-bounds")
   2010         else:
   2011             raise ValueError(f"Can only index by location with a [{self._valid_types}]")

IndexError: positional indexers are out-of-bounds
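
For anyone reading the traceback: the failing line is `self.inner_df.iloc[:, df_names_to_idx(cols, self.inner_df)]`, and pandas raises this IndexError when a positional column index in a list is not smaller than the number of columns. The sketch below is only an illustration in plain pandas, not fastai's code. The dataframes and the helper `has_column_position` are hypothetical, and it assumes the label column position is 1, as the `cols:IntsOrStrs=1` default shown in the traceback suggests. It may not be the cause in this particular thread.

```python
import pandas as pd


def has_column_position(df, pos):
    """Return True if positional column index `pos` is valid for `df`."""
    n_cols = df.shape[1]
    return -n_cols <= pos < n_cols


# Hypothetical dataframes: one with only a filename column,
# one with a filename column followed by a label column.
df_one = pd.DataFrame({"image_id": ["Train_0.jpg", "Train_1.jpg"]})
df_two = pd.DataFrame({"image_id": ["Train_0.jpg", "Train_1.jpg"],
                       "label": ["healthy", "rust"]})

# Selecting column position 1 from a one-column dataframe fails
# in the same way as the iloc call in the traceback.
raised = False
try:
    df_one.iloc[:, [1]]
except IndexError as err:
    raised = True
    print("IndexError:", err)

# Checking the shape first shows whether position 1 exists at all.
print(df_one.shape, has_column_position(df_one, 1))
print(df_two.shape, has_column_position(df_two, 1))
```

Printing `df.shape` and `df.columns` right before the `ImageDataBunch.from_df` call is a quick way to confirm that the dataframe still has a column at the expected label position after editing the labels.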

tstclair2009 · May 18, 2020, 2:44am

So the solution, in case anyone sees this later, is to update all the packages in Kaggle.

I used this command:

pip list --outdated --format=freeze | grep -v '^\-e' | cut -d = -f 1  | xargs -n1 pip install -U
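
Since the fix here came from upgrading packages, it can help to record which versions are actually installed before and after running the command, so the two environments can be compared. The snippet below is a minimal sketch for that. The helper `installed_version` is hypothetical, the package list (fastai, pandas, torch) is an assumption based on the paths in the traceback, and it relies on each package exposing a `__version__` attribute.

```python
import importlib


def installed_version(name):
    """Return a package's version string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Not every module defines __version__, so fall back to a placeholder.
    return str(getattr(module, "__version__", "unknown"))


# Assumed set of packages relevant to the traceback above.
for pkg in ("fastai", "pandas", "torch"):
    print(pkg, installed_version(pkg))
```

Posting this output alongside the error makes it easier for others to tell whether a version mismatch is involved.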

SisengCo · April 8, 2021, 5:54am

Aw, thanks @tstclair2009. However, I have the same issue, but I'm running on my own GPU and I just installed the dependencies, so none are outdated… still scouring for a solution.