Ok, I thought I had solved it, but I hadn’t. Let me show exactly what my data looks like and what I want to do.
Each sample of my data can get a label “1” or “0” in one or more of 92 classes. Basically it is a dataframe with text in one column and 92 label columns with values “1” and “0”. However, not all samples have a label in all classes. In fact, most of my data consists of samples where only 1 or 2 labels were assigned. For the missing labels, I simply assigned a value of -1. The targets of two samples look like this:
array([[ 0., 0., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
-1.],
[ 0., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
-1.]])
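To make the layout concrete, here is a minimal sketch of how such a dataframe could be built, with 4 classes instead of 92. The column names (`text`, `class_0` … `class_3`) and the toy values are made up for illustration; only the convention of filling missing labels with -1 comes from my actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 3 samples, 4 classes instead of 92.
label_cols = [f"class_{i}" for i in range(4)]
df = pd.DataFrame({
    "text": ["first sample", "second sample", "third sample"],
    "class_0": [0.0, 0.0, np.nan],
    "class_1": [0.0, np.nan, 1.0],
    "class_2": [np.nan, np.nan, np.nan],
    "class_3": [np.nan, np.nan, 0.0],
})

# Missing labels become -1 so every row has a fixed-length target vector.
df[label_cols] = df[label_cols].fillna(-1.0)

# One row per sample, one column per class.
targets = df[label_cols].to_numpy(dtype=np.float32)

# Boolean mask of the entries that carry a real label (0 or 1).
mask = targets != -1
```

The mask is only there to show which entries are real labels; how the -1 values should be treated during training is a separate question.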
But now my targets are in a list of size 92, which obviously isn’t the right format:
x,y = first(dls.train)
print(y)
(#92) [tensor([1., 1., 1., 0., 1., 0., 0., 0., 1., 1., 1., 0., 0., 0., 0., 1., 1., 1.,
0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 1., 1., 0.],
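For comparison, this is roughly the shape I would expect instead. It is only a sketch with NumPy arrays standing in for the tensors, and the batch size of 32 is an assumption read off the printed output above: the list holds one array of length 32 per class, whereas I would expect a single array with one row per sample and 92 columns.

```python
import numpy as np

n_classes, batch_size = 92, 32  # batch size assumed from the printed tensor

# Stand-in for what I get: a list with one array of shape (batch_size,) per class.
y_list = [np.zeros(batch_size, dtype=np.float32) for _ in range(n_classes)]
y_list[0][:] = 1.0  # mark the first class so the axes are easy to tell apart

# What I would expect: a single array of shape (batch_size, n_classes).
y = np.stack(y_list, axis=1)
print(y.shape)
```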
I have a working prototype in fastai v1, where it was quite easy to implement. As I said, I just had to pass a list of columns (or classes) to label_from_df. But I’m struggling to do the same in v2. Is there a transform that does this? Something like a ColReaderS that accepts multiple columns. Maybe there is an easier way to do it, but I can’t see it.
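To illustrate the behaviour I am after, here is a plain pandas/NumPy sketch. `multi_col_reader` is a hypothetical helper I made up for this post, not a fastai function, and the column names are again placeholders. It only shows the idea of reading several label columns from one row into a single target vector.

```python
import numpy as np
import pandas as pd

def multi_col_reader(cols):
    """Hypothetical helper: build a getter that reads several columns of a row at once."""
    def _get(row):
        # Collect the values of all label columns into one float vector.
        return np.asarray([row[c] for c in cols], dtype=np.float32)
    return _get

# Placeholder column names and values, 3 classes instead of 92.
label_cols = ["class_0", "class_1", "class_2"]
df = pd.DataFrame({
    "text": ["some text"],
    "class_0": [1.0],
    "class_1": [-1.0],
    "class_2": [0.0],
})

get_y = multi_col_reader(label_cols)
y = get_y(df.iloc[0])  # one target vector with one entry per class
```

With something like this as the target getter, each sample would yield one vector of length 92 rather than 92 separate targets.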