Changing/reloading new data with callback on_epoch_begin

Hello,

I’m trying to update my code from an earlier version of fastai to fastai v1.
I’m using an on_epoch_begin callback to reload a new set of training data at the beginning of each epoch (I generate new random patches from my data).

I’m a little lost with all the different classes.
I can easily generate a new LabelList for the training data with my current code. Now I’m trying to figure out what code I should write to replace the train_dl of the DataBunch from within the callback.

Can anyone shed some light please? :slight_smile:

If you do a LearnerCallback, you will have a reference to the Learner. You can then change learn.data or learn.data.train_dl as you wish.
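Something along these lines should work (just a sketch; rebuild_databunch is a placeholder for however you regenerate your random patches):

from fastai.basic_train import LearnerCallback

class ReloadDataCallback(LearnerCallback):
    "Swap in freshly generated training data at the start of each epoch."
    def on_epoch_begin(self, **kwargs):
        new_data = rebuild_databunch()  # hypothetical: build a new DataBunch from new patches
        self.learn.data.train_dl = new_data.train_dl

Then pass ReloadDataCallback(learn) in the callbacks of fit.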

Indeed, but my question is: what’s the easiest and most elegant way to actually update learn.data.train_dl, keeping pretty much everything the same and changing just the data, when I have a LabelList in hand?

For instance, assuming trn is my LabelList:
self.learn.data.train_dl.dl = DataLoader(trn, batch_size=batch_size)

doesn’t work, because the DataLoader doesn’t contain tensors but Item-related objects.

Just create it the same way you created your initial DataBunch, but with your change. I’m sorry I can’t be more helpful, but it’s impossible to talk abstractly without seeing code.

Hello,

I ended up doing something quite hackish IMHO (trn being my newly created training LabelList):

db = LabelLists(path, trn, trn).databunch(bs=batch_size, collate_fn=my_collate_fn)
self.learn.data.train_dl.dl = db.train_dl

I’m calling databunch() to get the automatic conversion from Items to tensors, and I can’t call it on an unsplit list (but since I don’t need to replace the validation data, only the training data, I only create a LabelList for the training data and pass it twice).

Is there a less ugly way of doing that?

If you just change the train LabelList, did you try learn.data.train_ds = trn?
It might complain with a “can’t set attribute” error, in which case it should be learn.data.train_dl.dataset = trn.

Indeed, the first one throws a “can’t set attribute” error.

Doing learn.data.train_dl.dataset = trn doesn’t complain, but it doesn’t seem to actually reflect the change (it just assigns it to a dict within the DeviceDataLoader).

I tried learn.data.train_dl.dl.dataset = trn but I got the following error message:
dataset attribute should not be set after DataLoader is initialized

Ugh, thank you PyTorch… I’m not sure the new method of a DataLoader can change the dataset in v1. It can in v2, but that doesn’t help your problem :wink:

I guess my ugly hackish way will do for now ^^

Hello, me again :slight_smile:

I’m trying to use mixed precision with my network, and it works perfectly fine until I add the reloading on_epoch_begin callback mentioned above, which replaces the dataloader with the following line in the callback:
self.learn.data.train_dl = db.train_dl

When calling fit_one_cycle(), I get the following error message:

    return torch.prelu(input, weight)

RuntimeError: expected scalar type Float but found Half

(I don’t get this error if I don’t switch learn.data.train_dl on the fly.)

I can see that to_fp16() adds a transform to the data, and I can also see in fit() that cb_handler.set_dl(learn.data.train_dl) is called before cb_handler.on_epoch_begin(), but I’m still not sure whether that’s the main issue.

I’m digging into the code, but if someone has a clue in the meantime, that would be helpful.

Fastai’s mixed-precision mode expects the data to have the batch_to_half transform, which converts the DataBunch’s Float tensors to Half tensors. After changing the DataBunch, you’ll need to add batch_to_half to the learner’s new DataBunch.

I’ll give it a try, but batch_to_half is added to the DeviceDataLoader (learn.data, which adds it to learn.data.tfms through add_tfm()), and I’m only switching the DataLoader (learn.data.train_dl), so I would expect this tfm to still be applied to learn.data?

I’ll double check though.

It looks like each DataLoader gets its own batch_to_half transform applied to it after to_fp16. After your change of dataloader, learn.data.train_dl.tfms won’t include batch_to_half anymore, but you can add it back via learn.data.train_dl.add_tfm.
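Concretely, something like this in your callback should do it (a sketch based on your snippet above; I believe batch_to_half lives in fastai.callbacks.fp16 in v1, but double-check the import for your version):

from fastai.callbacks.fp16 import batch_to_half

def on_epoch_begin(self, **kwargs):
    db = LabelLists(path, trn, trn).databunch(bs=batch_size, collate_fn=my_collate_fn)
    self.learn.data.train_dl = db.train_dl
    self.learn.data.train_dl.add_tfm(batch_to_half)  # restore the Float -> Half conversion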

Hi everyone,

I am trying to do the same as ChoJin, but I am a bit lost about how to do it. I will post pieces of my code to be more precise. My aim with this callback is to change the training set in each epoch.
Since I am dealing with an extremely unbalanced dataset, I thought that training on portions of it, each extract balanced 50/50, and changing the training set every epoch, would give me better results while still showing the model the whole dataset in the end.

class CallbackHandler():
    def __init__(self, learn, cbs=None):
        self.learn = learn
        self.cbs = cbs if cbs else []

    def begin_epoch(self):
        self.learn.model.train()
        ### Get a new input
        roq_data = get_dfs(roq_df, WINDOW=3, TRAIN_FLAGS=130, TEST_FLAGS=65)
        kpi_roq_df = roq_data['kpis']['df']  # Gets the df
        kpi_roq_splits = roq_data['kpis']['splits']  # Gets the train/valid splits
        ### Get new DLs
        to = TabularPandas(kpi_roq_df, cont_names=kpi_roq_cont, y_names='WARNING', splits=kpi_roq_splits)
        trn_dl = TabDataLoader(to.train, bs=128, shuffle=False, drop_last=True)
        val_dl = TabDataLoader(to.valid, bs=64)
        ### Try to push the new data into the learner's dataloaders
        self.learn.dls.train.xs = trn_dl.xs
        self.learn.dls.train.ys = trn_dl.ys
        self.learn.dls.valid.xs = val_dl.xs
        self.learn.dls.valid.ys = val_dl.ys

        return True

And for the fit function, I was trying to do something like:

def one_batch(learn, xb, yb, train=True):
    loss = learn.loss_func(learn.model(xb), yb)
    if train:
        loss.backward()
        learn.opt.step()
        learn.opt.zero_grad()

def all_batches(learn, dl, train=True):
    for xb, yb in dl:
        one_batch(learn, xb, yb, train)

def fit(epochs, learn, cbs):
    cb_handler = CallbackHandler(learn, cbs)
    for epoch in range(epochs):
        if not cb_handler.begin_epoch(): continue
        all_batches(learn, learn.dls.train)
        with torch.no_grad(): all_batches(learn, learn.dls.valid, train=False)

The problem is that I don’t really understand how to modify the learn parameters. I’d appreciate any kind of help!
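In case it helps to see what I’m aiming for: my best guess at a more idiomatic fastai v2 version would be a Callback that replaces the learner’s DataLoaders wholesale before each epoch instead of mutating xs/ys in place, something like this (not sure it’s right):

from fastai.tabular.all import *

class ReloadData(Callback):
    "Regenerate balanced training data before every epoch."
    def before_epoch(self):
        roq_data = get_dfs(roq_df, WINDOW=3, TRAIN_FLAGS=130, TEST_FLAGS=65)
        to = TabularPandas(roq_data['kpis']['df'], cont_names=kpi_roq_cont,
                           y_names='WARNING', splits=roq_data['kpis']['splits'])
        trn_dl = TabDataLoader(to.train, bs=128, shuffle=False, drop_last=True)
        val_dl = TabDataLoader(to.valid, bs=64)
        # Swap the dataloaders wholesale rather than assigning xs/ys on them
        self.learn.dls = DataLoaders(trn_dl, val_dl, device=self.learn.dls.device)

which I could then pass as learn.fit(epochs, cbs=ReloadData()) instead of writing my own fit loop. Would that be the right approach?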