ImageDataBunch Error

Hi, I am trying to do this Kaggle competition.

As Jeremy showed in the first video, I am trying to create the ImageDataBunch object from the training data. The problem is that 2073 whale categories have only one training image, 1285 have 2 images, 568 have 3, 278 have 4, and so on; i.e. the number of categories decreases as the number of training samples per category increases. When I called ImageDataBunch.from_csv() with valid_pct set to 0.1, I received the error "Your validation data contains a label that isn't present in the training set, please fix your data." So I flipped the images of the whale categories that have only one training sample, added them to the original data, updated the CSV, and stored the results in the working directory under '…/working/train' and '…/working/train.csv'. Even now, when I create the ImageDataBunch again, I get the same error.

As mentioned in this post


I can't leave out the categories with few samples because there are so many of them. Can you suggest a solution for this?
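One way to avoid dropping the rare categories is to build the validation set by hand so that every category always keeps at least one image in training. The following is a minimal sketch in plain Python, not fastai's own splitting logic; the function name `split_keeping_rare_classes` and the toy labels are made up for illustration. It assumes you can pass the resulting indices to the data block API (I believe `split_by_idx(valid_idx)` can stand in for `random_split_by_pct`, but treat that as an assumption to check against your fastai version).

```python
import random
from collections import defaultdict

def split_keeping_rare_classes(labels, valid_pct=0.1, seed=42):
    """Return (train_idx, valid_idx) so that every validation label also occurs in training.

    `labels` holds one class label per image, in the same order as the CSV rows.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # One image per class is always held back for training, so only the
    # remaining images of classes with two or more samples can go to validation.
    candidates = []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        candidates.extend(idxs[1:])  # idxs[0] stays in training

    rng.shuffle(candidates)
    n_valid = min(len(candidates), int(round(valid_pct * len(labels))))
    valid_idx = sorted(candidates[:n_valid])
    valid_set = set(valid_idx)
    train_idx = [i for i in range(len(labels)) if i not in valid_set]
    return train_idx, valid_idx

# Toy labels (hypothetical): one class with 1 image, one with 2, one with 7.
toy_labels = ["w_a"] + ["w_b"] * 2 + ["w_c"] * 7
train_idx, valid_idx = split_keeping_rare_classes(toy_labels, valid_pct=0.2)
print(len(train_idx), len(valid_idx))
```

The trade-off is that the validation score then only covers categories with at least two images, so it says nothing about how well the model does on the single-image whales.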

Also, can you please tell me how to get the top 5 predictions as output, i.e. the top 5 whale category prediction probabilities for a whale image? That is what the competition asks for as output, i.e. MAP@5 with 5 categories.
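For the top 5 question, the idea can be sketched without any library: given one probability per category for an image, keep the five largest and map them back to category names. The helper names `top_k_labels` and `average_precision_at_k` and the example values below are hypothetical. The scoring function assumes each image has exactly one correct whale ID, in which case the per-image precision is 1/rank of the correct label within the top 5, and MAP@5 is the mean of that over all images.

```python
import heapq

def top_k_labels(probs, classes, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    best = heapq.nlargest(k, range(len(probs)), key=lambda i: probs[i])
    return [(classes[i], probs[i]) for i in best]

def average_precision_at_k(true_label, predicted_labels, k=5):
    """Score one image: 1/rank of the correct label if it is in the top k, else 0."""
    for rank, label in enumerate(predicted_labels[:k], start=1):
        if label == true_label:
            return 1.0 / rank
    return 0.0

# Hypothetical class names and probabilities for a single image.
classes = ["w_a", "w_b", "w_c", "w_d", "w_e", "w_f"]
probs = [0.05, 0.40, 0.10, 0.25, 0.15, 0.05]

top5 = top_k_labels(probs, classes)
print(top5)
print(average_precision_at_k("w_d", [label for label, _ in top5]))
```

With real model output the same selection can be done on a tensor: `torch.topk(probs, 5)` returns the five largest values and their indices. In fastai I would expect the probabilities to come from `learn.get_preds()` and the names from `data.classes`, but check both against your installed version.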

Here is the full error report:

KeyError Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/fastai/data_block.py in process_one(self, item)
281 def process_one(self,item):
--> 282 try: return self.c2i[item] if item is not None else None
283 except:

KeyError: '23ec5a1a0.jpg'

During handling of the above exception, another exception occurred:

Exception Traceback (most recent call last)
in ()
2 examine(csvPath = csvPath, trainPath = trainPath);
3 help(ImageDataBunch.from_csv);
----> 4 Data = ImageDataBunch.from_csv(path = '…/input/', folder = trainPath, csv_labels = csvPath, valid_pct = 0.1);

/opt/conda/lib/python3.6/site-packages/fastai/vision/data.py in from_csv(cls, path, folder, sep, csv_labels, valid_pct, fn_col, label_col, suffix, header, **kwargs)
134 df = pd.read_csv(path/csv_labels, header=header)
135 return cls.from_df(path, df, folder=folder, sep=sep, valid_pct=valid_pct,
--> 136 fn_col=fn_col, label_col=label_col, suffix=suffix, **kwargs)
137
138 @classmethod

/opt/conda/lib/python3.6/site-packages/fastai/vision/data.py in from_df(cls, path, df, folder, sep, valid_pct, fn_col, label_col, suffix, **kwargs)
123 src = (ImageItemList.from_df(df, path=path, folder=folder, suffix=suffix, cols=fn_col)
124 .random_split_by_pct(valid_pct)
--> 125 .label_from_df(sep=sep, cols=label_col))
126 return cls.create_from_ll(src, **kwargs)
127

/opt/conda/lib/python3.6/site-packages/fastai/data_block.py in _inner(*args, **kwargs)
412 self.valid = fv(*args, **kwargs)
413 self.__class__ = LabelLists
--> 414 self.process()
415 return self
416 return _inner

/opt/conda/lib/python3.6/site-packages/fastai/data_block.py in process(self)
459 "Process the inner datasets."
460 xp,yp = self.get_processors()
--> 461 for i,ds in enumerate(self.lists): ds.process(xp, yp, filter_missing_y=i==0)
462 return self
463

/opt/conda/lib/python3.6/site-packages/fastai/data_block.py in process(self, xp, yp, filter_missing_y)
584 def process(self, xp:PreProcessor=None, yp:PreProcessor=None, filter_missing_y:bool=False):
585 "Launch the processing on self.x and self.y with xp and yp."
--> 586 self.y.process(yp)
587 if filter_missing_y and (getattr(self.x, 'filter_missing_y', None)):
588 filt = array([o is None for o in self.y])

/opt/conda/lib/python3.6/site-packages/fastai/data_block.py in process(self, processor)
66 if processor is not None: self.processor = processor
67 self.processor = listify(self.processor)
---> 68 for p in self.processor: p.process(self)
69 return self
70

/opt/conda/lib/python3.6/site-packages/fastai/data_block.py in process(self, ds)
288 ds.classes = self.classes
289 ds.c2i = self.c2i
--> 290 super().process(ds)
291
292 def __getstate__(self): return {'classes':self.classes}

/opt/conda/lib/python3.6/site-packages/fastai/data_block.py in process(self, ds)
36 def __init__(self, ds:Collection=None): self.ref_ds = ds
37 def process_one(self, item:Any): return item
---> 38 def process(self, ds:Collection): ds.items = array([self.process_one(item) for item in ds.items])
39
40 class ItemList():

/opt/conda/lib/python3.6/site-packages/fastai/data_block.py in <listcomp>(.0)
36 def __init__(self, ds:Collection=None): self.ref_ds = ds
37 def process_one(self, item:Any): return item
---> 38 def process(self, ds:Collection): ds.items = array([self.process_one(item) for item in ds.items])
39
40 class ItemList():

/opt/conda/lib/python3.6/site-packages/fastai/data_block.py in process_one(self, item)
282 try: return self.c2i[item] if item is not None else None
283 except:
--> 284 raise Exception("Your validation data contains a label that isn't present in the training set, please fix your data.")
285
286 def process(self, ds):

Exception: Your validation data contains a label that isn’t present in the training set, please fix your data.
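One detail in this report may be worth checking: the KeyError is raised for '23ec5a1a0.jpg', which looks like an image filename rather than a whale ID. A possible explanation (an assumption, not something confirmed in this thread) is that the rewritten CSV gained an extra leading column, for example the pandas index that `DataFrame.to_csv` writes unless `index=False` is passed. If from_csv takes filenames from the first column and labels from the second by default, as the `fn_col` and `label_col` arguments in the traceback suggest, the filenames would then be read as labels. Below is a small standard-library sketch for checking which column holds the filenames; the helper `filename_columns`, the whale IDs and the second filename are made up for illustration.

```python
import csv
import io

def filename_columns(csv_text, suffix=".jpg", sample_rows=5):
    """Return the indices of columns whose sampled values all end with `suffix`."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    body = rows[1:1 + sample_rows]  # skip the header row
    n_cols = len(rows[0])
    return [c for c in range(n_cols)
            if body and all(r[c].endswith(suffix) for r in body)]

# Hypothetical file contents; only the filename 23ec5a1a0.jpg comes from the traceback.
original = "Image,Id\n23ec5a1a0.jpg,w_aaa\naaaaaaaaa.jpg,w_bbb\n"
with_index = ",Image,Id\n0,23ec5a1a0.jpg,w_aaa\n1,aaaaaaaaa.jpg,w_bbb\n"

print(filename_columns(original))    # filenames sit in the first column
print(filename_columns(with_index))  # filenames have shifted one column to the right
```

If the filenames turn out to sit in the second column of your '…/working/train.csv', rewriting the file without the index column or passing explicit `fn_col` and `label_col` values would be the things to try.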

Not sure if it will help you, but here is a link to a repo that tackles the same Kaggle competition.

It looks very nice and will probably give you some pointers as to where to go next.

Thanks, I'll look into it. But I think that if I can learn how to make this error disappear, I can apply that in every competition.