Train_dl gets target as float instead of long

I’m trying to do multi-label image classification from scratch with SE-ResNeXt.

With this code,

data = image_data_from_csv(PATH, TRAINCUT, csv_labels=LABELS50CUT, valid_pct=0.1, sep=' ', size=248, suffix='.jpg', bs=16, ds_tfms=([rand_crop()], [crop_pad()]))
def senext_101(notpre): return se_resnext101_32x4d(pretrained=None if notpre else 'imagenet')
learn = ConvLearner(data, senext_101, cut=5, ps=0, metrics=accuracy)

I get a working learner… well, almost working. The forward pass runs fine, but when the loss is calculated it turns out that the train_dl labels are floats, while PyTorch expects them to be longs.
image_data_from_csv uses ImageMultiDataset.from_folder, which explicitly creates the target self.y with dtype=np.int64.
DataBunch.create() seems to only take the datasets and pass them to DataLoader.
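A quick way to confirm what actually comes out of the dataloader is to grab one batch and check the target dtype (a minimal sketch, using the data object created above):

# Pull one batch from the training dataloader and inspect the targets.
x, y = next(iter(data.train_dl))
print(y.dtype)  # torch.float32 here, even though self.y was created as np.int64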
I tried the same approach for multi-class image classification and it worked fine.
So any ideas where to look?


Nope, PyTorch expects the target to be a float… as long as you use the right loss :wink:.
Since you have a multi-label classification problem, you shouldn’t use F.cross_entropy as the loss function (which relies on a softmax) but F.binary_cross_entropy_with_logits (which relies on a sigmoid). The latter expects float targets, which is why we set up the dataloader to send you the target as floats.
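To make the dtype expectations concrete, here is a minimal sketch (shapes chosen arbitrarily) of the two losses side by side:

import torch
import torch.nn.functional as F

logits = torch.randn(16, 50)  # batch of 16, 50 classes/labels

# Single-label case: F.cross_entropy wants class indices as longs.
hard_targets = torch.randint(0, 50, (16,))  # dtype torch.int64
loss_ce = F.cross_entropy(logits, hard_targets)

# Multi-label case: F.binary_cross_entropy_with_logits wants a float 0./1. per label.
multi_targets = torch.randint(0, 2, (16, 50)).float()
loss_bce = F.binary_cross_entropy_with_logits(logits, multi_targets)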


Thanks!

@sgugger BTW why does running from fastai.vision import * put a ~500MB object on the GPU?

We’re not sure - if anyone would like to help us figure that out, it would be great! A first step would be to import each item in fastai/vision/__init__.py separately to see which module is responsible. Then run the code in that module a bit at a time to see which function is causing it.
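A rough sketch of that bisection (fastai.vision.data here is just one example of the modules imported in that __init__; note that torch.cuda.memory_allocated only counts tensors PyTorch itself allocated, so nvidia-smi may be more telling for the CUDA context itself):

import torch

def gpu_mb():
    # MB of GPU memory currently held by PyTorch tensors on the default device.
    return torch.cuda.memory_allocated() / 1024 ** 2

print(gpu_mb())  # baseline before any fastai import
import fastai.vision.data  # repeat for each module listed in fastai/vision/__init__.py
print(gpu_mb())  # did this particular import put anything on the GPU?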

All the modules in vision trigger this, and when I moved all the .py files up a folder and adjusted the relative paths, they all imported with no problems - a maddening bug!