I’ve seen various blog posts and a few posts on this forum about this topic but none have answered my question. I am doing multilabel classification on tabular data.
Here is what I have. train is the training data (800 columns) and train_targets are the labels (206 columns, all values are either 0 or 1):
cat_names = ['cat1', 'cat2', 'cat3']
cont_names = [x for x in train.columns if x not in cat_names]
train_label_col = []
for i, row in enumerate(train_labels.itertuples()):
    vals = [','.join(str(ele).split()) for ele in row[1:]]
    train_label_col.append(' '.join(vals))
train['label'] = train_label_col
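To make the label format concrete, here is a minimal sketch of what the loop above produces. It uses plain tuples as a stand-in for train_labels.itertuples() (the rows and the helper name build_label_col are made up for illustration), so it runs without pandas:

```python
# Hypothetical stand-in for train_labels.itertuples(): each tuple is
# (index, target_1, target_2, target_3); the real data has 206 targets.
rows = [
    (0, 0, 1, 0),
    (1, 1, 1, 0),
    (2, 0, 0, 0),
]

def build_label_col(rows):
    """Mirror the loop from the post: one space-separated string per row."""
    label_col = []
    for row in rows:
        # row[0] is the index, so only row[1:] holds the 0/1 targets
        vals = [','.join(str(ele).split()) for ele in row[1:]]
        label_col.append(' '.join(vals))
    return label_col

label_col = build_label_col(rows)
print(label_col)  # ['0 1 0', '1 1 0', '0 0 0']
```

So each entry in train['label'] is a single string of 0s and 1s separated by spaces, which means the column holds strings rather than numbers.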
procs = [Categorify, FillMissing, Normalize]
splits = RandomSplitter()(range_of(train))
to = TabularPandas(train, procs, cat_names, cont_names, y_names="label", y_block=MultiCategoryBlock(), splits=splits)
All of the above works fine, but when I run
dls = to.dataloaders(bs=1024)
I get the “Could not do one pass in your dataloader, there is something wrong in it” warning, and when I run dls.show_batch(3) it throws:
TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
When I run
learn = tabular_learner(dls, y_range=(0,1), layers=[500, 250], n_out=1, loss_func=F.binary_cross_entropy)
it works, but
learn.fit_one_cycle(5, 1e-2)
throws the same error as above.
Any help is greatly appreciated.