How to attach data/dls after load_learner()

After loading my learner with

learn = load_learner('model.pkl')

I want to train it further. If I immediately try to fit_one_cycle, I get an error since my learn.dls is empty. This makes sense: it’s not going to save my whole dataset when I export the model. My question is how do I ‘attach’ my original dataset to the DataLoaders object in my learner so that I can train it further?

Presumably, I don’t need to re-define my entire dls? Or do I? (my data is a dataframe of image file names and labels) Thanks in advance for any help!

You’d need to build new DataLoaders in some way (the DataBlock API, etc.), and then you’d just do:

learn.dls = new_dls

Thanks but I have a question:

Can I not extract the original DataLoaders from the learner? For example, I can do learn.dls.test_dl(new_data) to apply the same transforms and normalizations to a test dataset, so surely this means the DataLoaders are stored in the learner object?

My dataset has changed since I first trained the model, but I want to keep the original normalize values (from when I first trained the model), which I cannot do if I have to redefine a new dataloaders object with new data. Do you see the problem?

In fact, would something like

learn.dls = learn.dls.test_dl(new_data)

work?

What’s stored is how to build the DataLoaders, not the data itself (i.e., they’re blank DataLoaders).

Not quite, as test_dl builds a single DataLoader. But we can build on that idea:

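from fastai.vision.all import *  # provides DataLoaders
# test_dl reuses the learner's own transforms; with_labels=True also keeps the label pipeline
new_train_dl = learn.dls.test_dl(new_train_data, with_labels=True, shuffle=True, drop_last=True)
new_valid_dl = learn.dls.test_dl(new_valid_data, with_labels=True)
new_dls = DataLoaders(new_train_dl, new_valid_dl)
learn.dls = new_dls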

A very important thing to note here:

We are assuming that the labels are stored in exactly the same way as they were when the model was trained. I.e., if the labels were grabbed via parent_label, the new data must be organized that way as well (for a DataFrame, that means the same columns).
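A cheap sanity check before calling test_dl, assuming a DataFrame-based pipeline: confirm the new data still has the columns the original DataBlock read from, and that every label already exists in the learner's vocab (fastai's Categorize raises a KeyError for labels it never saw during training). The function `check_new_data` and the column names below are hypothetical.

```python
def check_new_data(vocab, new_labels, columns, expected_columns):
    """Hypothetical check: list the problems that would stop the exported
    learner's label pipeline from handling the new data."""
    problems = []
    # Columns the original get_x/get_y read from must still be present
    missing = [c for c in expected_columns if c not in columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    # Labels outside the original vocab cannot be encoded by the old Categorize
    unseen = sorted(set(new_labels) - set(vocab))
    if unseen:
        problems.append(f"labels not in the original vocab: {unseen}")
    return problems
```

For example: `check_new_data(learn.dls.vocab, new_df['label'], new_df.columns, ['fname', 'label'])` should return an empty list before you build the new DataLoaders.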

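I’m not 100% sure this will work perfectly, but I’m pretty sure it should: all the transforms are in both DataLoaders, and which ones are applied (and how) is decided by split_idx. Note that test_dl specifically builds a new validation DataLoader, so even the new training DataLoader will use the validation transforms, meaning random augmentations may not be applied. Make sure to play with it and verify.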


Yes, this makes sense. Thanks a lot for taking the time to help!