Oversampling / class weights for tabular data

Hi
I’m using:

learn = tabular_learner(dls, model_dir="/tmp/model/", metrics=[accuracy], layers=[100, 200]).to_fp16()
learn.fit_one_cycle(n_epoch=35, lr_max=0.01, cbs=[SaveModelCallback(monitor='accuracy', comp=None, min_delta=0.0, fname='best', every_epoch=False, at_end=False, with_opt=False, reset_on_fit=True)])

to train a tabular model (on fastai v2.7.12).

My dataset is imbalanced. How can I apply oversampling or class weights during training?
Thanks
Moran

Class weights are set in your loss function. Here is an example of applying class weights for 4 classes:

class_weights = torch.FloatTensor([1., 64., 8., 1.]).cuda()
loss_fn = CrossEntropyLossFlat(axis=1, weight=class_weights)
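One common way to pick the weight values is inverse class frequency, normalized so the most common class gets weight 1.0. A minimal sketch in plain Python, using made-up class counts (the label distribution here is hypothetical, not from the thread):

```python
from collections import Counter

# Hypothetical label column with four imbalanced classes.
labels = [0] * 400 + [1] * 25 + [2] * 50 + [3] * 400

counts = Counter(labels)
n_classes = len(counts)
total = len(labels)

# Inverse-frequency weights, scaled so the most common class gets 1.0.
raw = {c: total / counts[c] for c in range(n_classes)}
scale = min(raw.values())
weights = [raw[c] / scale for c in range(n_classes)]

print(weights)  # rarest classes get the largest weights
```

The resulting list would then be wrapped as `torch.FloatTensor(weights)` and passed to the loss as above.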

A simple way to oversample the rarer classes is just to duplicate their rows. For tabular data this usually doesn't increase storage size meaningfully.
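A minimal sketch of duplication-based oversampling with pandas, assuming a label column named "target" (the column name and toy data are placeholders):

```python
import pandas as pd

# Toy imbalanced frame; "target" is the assumed label column name.
df = pd.DataFrame({
    "feature": range(10),
    "target":  ["a"] * 8 + ["b"] * 2,
})

# Resample each class (with replacement) up to the majority class size.
max_count = df["target"].value_counts().max()
parts = []
for _, group in df.groupby("target"):
    parts.append(group.sample(max_count, replace=True, random_state=42))

# Shuffle the balanced frame so duplicated rows aren't grouped together.
balanced = pd.concat(parts).sample(frac=1, random_state=42).reset_index(drop=True)

print(balanced["target"].value_counts())
```

You would do this on the training split only, before building your `TabularPandas`/`DataLoaders`, so the validation set keeps its natural distribution.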


Thanks a lot for your response!
I get the following error message:

You named your loss function variable loss_fn when you created it, but loss_func when passing it to your learner constructor.


Or, alternatively, use the weighted dataloader (search for WeightedDL in the fastai docs), which oversamples at the batch level instead of duplicating rows.
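The weighted dataloader approach needs one sampling weight per row. A common choice is the inverse of each row's class frequency, so every class is drawn with roughly equal total probability. A sketch of just the weight computation, again assuming a hypothetical "target" column (passing the weights through fastai's WeightedDL / weighted_dataloaders is covered in its docs):

```python
import pandas as pd

# Hypothetical imbalanced frame; "target" is the assumed label column name.
df = pd.DataFrame({"target": ["a"] * 8 + ["b"] * 2})

# Per-row sampling weight: inverse of that row's class frequency.
freqs = df["target"].value_counts(normalize=True)
wgts = (1.0 / df["target"].map(freqs)).tolist()

# Minority-class rows get proportionally larger sampling weights.
print(wgts[0], wgts[-1])
```

Here each "b" row gets 4x the sampling weight of an "a" row, which evens out the classes seen per epoch without changing the stored data at all.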

Thanks all!