I’m working with an academic dataset that has negatively labeled observations in the training, validation and test sets. A negative label means the absence of any class, i.e. a label of [0,0,0] for a problem with three classes. I’m not sure how to add these observations to the training and validation sets using the data_block API.
I’ve tried setting the labels of the negative observations to an empty string, but that adds the empty string '' to the classes. That’s not what I’m looking for, because I don’t want to predict a separate class for the absence of any class.
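As far as I can tell, the extra '' class comes from how a space-delimited tag string is split: in Python, `''.split(' ')` returns `['']`, not `[]`, so an empty tag string still produces one (empty) token, which then gets collected as a class. A minimal sketch of the difference, assuming labels are space-delimited strings as in the planet CSV; `parse_tags` and `multi_hot` are hypothetical helpers, not fastai functions:

```python
def parse_tags(tag_string, delim=' '):
    """Split a delimited tag string, dropping empty tokens.

    A negative observation ('' or only spaces) becomes an empty list
    instead of [''], so it never introduces an '' class.
    """
    return [t for t in tag_string.split(delim) if t]


def multi_hot(tags, classes):
    """Encode a list of tags as a 0/1 vector over `classes`."""
    return [1 if c in tags else 0 for c in classes]


raw = ['cat dog', 'bird', '']              # '' marks a negative observation
print([s.split(' ') for s in raw])         # [['cat', 'dog'], ['bird'], ['']]
parsed = [parse_tags(s) for s in raw]
classes = sorted({t for tags in parsed for t in tags})
print(classes)                             # ['bird', 'cat', 'dog']
print([multi_hot(tags, classes) for tags in parsed])
# [[0, 1, 1], [1, 0, 0], [0, 0, 0]]  <- the negative is all zeros
```

This only covers the parsing step; `label_from_df(label_delim=' ')` does its own splitting, so the cleaned labels would still need to reach fastai in a form it doesn't re-split.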
For example, in lesson3-planets.ipynb, if I set the tags of the cloudy images to an empty string:
from fastai.vision import *   # fastai v1; df and path are loaded earlier in the notebook
# Turn the cloudy-only images into negatives by blanking their tags
df.loc[df.tags == 'cloudy', 'tags'] = ''
tfms = get_transforms(flip_vert=True, max_lighting=0.1, max_zoom=1.05, max_warp=0.)
np.random.seed(42)
src = (ImageList.from_df(df, path, folder='train-jpg', suffix='.jpg')
       .split_by_rand_pct(0.2)
       .label_from_df(label_delim=' '))
data = (src.transform(tfms, size=128)
        .databunch().normalize(imagenet_stats))
arch = models.resnet18
acc_02 = partial(accuracy_thresh, thresh=0.2)
f_score = partial(fbeta, thresh=0.2)
learn = cnn_learner(data, arch, metrics=[acc_02, f_score])
print(learn.data.classes)
the classes now include '' as the first entry:
['',
'agriculture',
'artisinal_mine',
'bare_ground',
'blooming',
'blow_down',
'clear',
'conventional_mine',
'cultivation',
'habitation',
'haze',
'partly_cloudy',
'primary',
'road',
'selective_logging',
'slash_burn',
'water']
I’ve also tried setting the label to None, which results in this error:
~/code/plaquebox-classifier/venv/lib/python3.6/site-packages/fastai/data_block.py in label_from_df(self, cols, label_cls, **kwargs)
281 labels = self.inner_df.iloc[:,df_names_to_idx(cols, self.inner_df)]
282 # import pdb; pdb.set_trace();
--> 283 assert labels.isna().sum().sum() == 0, f"You have NaN values in column(s) {cols} of your dataframe, please fix it."
284 if is_listy(cols) and len(cols) > 1 and (label_cls is None or label_cls == MultiCategoryList):
285 new_kwargs,label_cls = dict(one_hot=True, classes= cols),MultiCategoryList
AssertionError: You have NaN values in column(s) 1 of your dataframe, please fix it.
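The traceback itself hints at another route: lines 284–285 show that when `cols` names more than one column, `label_from_df` builds a `MultiCategoryList` with `one_hot=True` and `classes=cols`. That suggests storing the labels as one 0/1 column per class, so a negative observation is just a row of zeros and contains no NaN. A minimal sketch of building those columns from the tag strings, in plain Python to keep it self-contained; `one_hot_columns` is a hypothetical helper, and I haven’t confirmed this end to end with fastai:

```python
def one_hot_columns(tag_strings, delim=' '):
    """Turn delimited tag strings into one 0/1 column per class.

    Empty tokens are ignored, so an empty string yields a row of zeros
    rather than an extra '' class.
    """
    rows = [[t for t in s.split(delim) if t] for s in tag_strings]
    classes = sorted({t for tags in rows for t in tags})
    # Column-oriented: one list of 0/1 values per class, one entry per row
    columns = {c: [int(c in tags) for tags in rows] for c in classes}
    return classes, columns


tags = ['haze primary', 'clear primary water', '']   # last row is a negative
classes, columns = one_hot_columns(tags)
print(classes)   # ['clear', 'haze', 'primary', 'water']
print(columns)   # {'clear': [0, 1, 0], 'haze': [1, 0, 0], 'primary': [1, 1, 0], 'water': [0, 1, 0]}
```

With pandas, each list could be assigned as a column (`df[c] = columns[c]` for each class) and the labels then read with `label_from_df(cols=classes)` instead of `label_delim=' '`. Going by the traceback, that would take the one-hot branch, so neither the '' class nor the NaN assertion should come up, but I’d treat it as something to try rather than a confirmed fix.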