Trying to test curriculum learning, getting a size mismatch error

I’m playing around with the PlantClf2016 dataset. My model has trouble differentiating between some of the classes, so I’ve decided to give curriculum learning a go.
First I train the model on all classes except the ones that are hard to tell apart, then I save the model.
Then I’d like to train on the whole dataset, but I’m getting a tensor size mismatch (duh, because the second dataset has more classes).

RuntimeError: Error(s) in loading state_dict for Sequential:
size mismatch for 1.8.weight: copying a param with shape torch.Size([934, 512]) from checkpoint, the shape in current model is torch.Size([1000, 512]).
size mismatch for 1.8.bias: copying a param with shape torch.Size([934]) from checkpoint, the shape in current model is torch.Size([1000]).

Code (excluded_classes is a list of class names; I didn’t include it because it’s visually big):

size = x - (x % 7) # round x down to a multiple of 7

np.random.seed(42)
seed = 42
pct = 0.05
bs = 64
classes_excluding = list(set(data.classes) - set(excluded_classes)) # all classes minus the hard ones
src = (ImageList.from_folder(path_img)
        #.use_partial_data(pct, seed) #comment out for learning
        .split_by_rand_pct()
        .label_from_func(get_label_from_xml, classes=classes_excluding)
        .add_test_folder())
data = (src
        .transform(tfms, size=size)
        .databunch(bs=bs*2)
        .normalize(imagenet_stats))

learn = to_fp16(cnn_learner(data, arch, metrics=error_rate, ps=0.1))
learn.fit_one_cycle(2)
learn.save('cl-stage-1')

np.random.seed(42)
bs = 64
src = (ImageList.from_folder(path_img)
        #.use_partial_data(pct, seed) #comment out for learning
        .split_by_rand_pct()
        .label_from_func(get_label_from_xml)
        .add_test_folder())
data = (src
        .transform(tfms, size=size)
        .databunch(bs=bs*2)
        .normalize(imagenet_stats))
learn = to_fp16(cnn_learner(data, arch, metrics=error_rate, ps=0.1))
learn.load('cl-stage-1')

What can I do to train this way in fastai (first training on most of the classes, then on all of them)?

Edit: I missed exactly what you were doing.
As @ste says, you should reserve room in your model for the extra classes. You can then probably also avoid calling cnn_learner again and instead just update the data inside the learner (or construct a basic Learner with the model and the new data). The model from your first training needs to be identical to the one in your second in order to load the weights, so cnn_learner is not needed the second time.

Create a list of all_classes and use it in the classes parameter of both calls to label_from_func. You’re still passing different data in the two steps, but you reserve room in your model for the future classes (data.c should be the same in the two passes; otherwise you’d have to tweak your weights yourself to make room for the new classes).
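
For example, a minimal sketch of the idea in fastai v1 (building all_classes from the XML labels here is an assumption; derive it however fits your label source):

all_classes = sorted({get_label_from_xml(f) for f in ImageList.from_folder(path_img).items})

# Stage 1: a subset of the data, but the full class list
src_easy = (ImageList.from_folder(path_img)   # item filtering is covered below
            .split_by_rand_pct()
            .label_from_func(get_label_from_xml, classes=all_classes))

# Stage 2: all of the data, the same class list
src_all = (ImageList.from_folder(path_img)
           .split_by_rand_pct()
           .label_from_func(get_label_from_xml, classes=all_classes))

# With identical classes, both databunches get the same data.c, so the final
# linear layer keeps its shape and learn.load('cl-stage-1') succeeds.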


So pass all the classes, but don’t pass the data for the excluded classes? I got the idea from that post in the first place.
Is there any way to easily filter out files with given classes from the dataset?
I’ve tried something like this, but I still can’t overwrite train_ds in the databunch:
train_ds = [item for item in data.train_ds if str(item[1]) not in excluded_classes]

The databunch will have already constructed a data loader from your dataset; just overwriting the dataset won’t change the items used by the dataloader.

There’s ItemList.filter_by_func, which you should be able to use with the get_label_from_xml function. Something like:

all_items = ImageList.from_folder(path_img)                    # the full dataset
easy_items = ImageList.from_folder(path_img).filter_by_func(
    lambda x: get_label_from_xml(x) not in excluded_classes)   # keep only the easy classes

(Note that filter_by_func works in place, so you need to apply it to a new ImageList.)
Construct a DataBunch from both sets of items (split, label, making sure to use all classes for the partial set as well, normalise, etc.). So now you have easy_data and all_data databunches. Now use cnn_learner(easy_data, arch) to create the learner and train it for a bit, here using just the easy data. Then, rather than constructing a new cnn_learner, do learn.data = all_data as in that post, and train some more, now with all the data but the same model.
You might also want to try unfreezing after training on easy_data for a bit, training some more, then switching to all_data and freezing again, then repeating the train-unfreeze-train cycle on all_data. A sketch of the whole flow is below.
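
A minimal sketch of that flow under the same assumptions (fastai v1; arch, tfms, size, bs and get_label_from_xml as in the code above; the test folder and the fp16 wrapper are left out for brevity):

def make_data(items, classes):
    return (items.split_by_rand_pct()
                 .label_from_func(get_label_from_xml, classes=classes)
                 .transform(tfms, size=size)
                 .databunch(bs=bs)
                 .normalize(imagenet_stats))

all_items = ImageList.from_folder(path_img)
easy_items = ImageList.from_folder(path_img).filter_by_func(
    lambda x: get_label_from_xml(x) not in excluded_classes)
all_classes = sorted({get_label_from_xml(f) for f in all_items.items})

easy_data = make_data(easy_items, all_classes)  # fewer items, full class list
all_data = make_data(all_items, all_classes)    # all items, same class list

learn = cnn_learner(easy_data, arch, metrics=error_rate)
learn.fit_one_cycle(2)    # stage 1: easy classes only, frozen body
learn.unfreeze()
learn.fit_one_cycle(2)    # optionally fine-tune the whole model on easy data

learn.freeze()
learn.data = all_data     # same model, new databunch
learn.fit_one_cycle(2)    # stage 2: all classes
learn.unfreeze()
learn.fit_one_cycle(2)    # and the train-unfreeze-train cycle again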


Exactly.