Hey everyone, I am pretty new to fastai and hope someone can help me with a callback.
I have a dataset that is just a bit too small, so I would like to be able to tune feature importance some more under certain conditions. I would like to add a penalty to the loss, and this penalty I would like to calculate based on the predictions of the current model during training.
I made an example using the adult.csv dataset. Basically, what I would like is to run a callback after the loss is calculated, which takes the model and the data from the current batch and then varies a feature which I believe should have no influence under certain conditions. In this example (which I made up), I would like to say that if the workclass is ‘Self-emp’, the column hours-per-week should have no influence.
What I would like the callback to do
In the callback I would look up the range of the values in hours-per-week and predict on that range for rows in which the workclass is Self-emp. Then I take all the predictions and calculate the variance, which is what I add to the loss (later this should probably be multiplied by some weight, which would be another hyperparameter).
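To make sure I am describing the idea clearly, here is a minimal sketch of just the penalty computation in plain PyTorch, outside of fastai (the model, feature index, and `variance_penalty` helper are made up for illustration, not part of my actual code): sweep one continuous feature over its observed range for a matching row, predict on each variant, and use the variance of those predictions as the penalty.

```python
import torch
import torch.nn as nn

def variance_penalty(model, row, feat_idx, f_min, f_max, num=15):
    # Build `num` copies of the row, varying only the chosen feature
    # over a linspace of its observed range.
    sweep = row.repeat(num, 1)
    sweep[:, feat_idx] = torch.linspace(f_min, f_max, num)
    preds = model(sweep)
    # High variance means the feature influences the prediction;
    # adding this to the loss pushes the model toward ignoring it.
    return preds.var()

# Tiny sanity check: a linear model with zero weight on the swept
# feature should give a zero penalty, since all predictions agree.
model = nn.Linear(3, 1, bias=False)
with torch.no_grad():
    model.weight[:] = torch.tensor([[1.0, 0.0, 2.0]])  # feature 1 ignored
row = torch.tensor([[0.5, 10.0, -1.0]])
pen = variance_penalty(model, row, feat_idx=1, f_min=0.0, f_max=40.0)
```

Since `pen` is computed from the model's outputs, gradients flow through it, so it can be added directly to the training loss.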
```python
from fastai.tabular.all import *

path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')
df['workclass'].unique()

dls = TabularDataLoaders.from_df(
    df,  # path,
    procs=[Categorify, FillMissing, Normalize],
    cat_names=['workclass', 'education', 'marital-status', 'occupation',
               'relationship', 'race'],
    cont_names=['age', 'fnlwgt', 'education-num', 'hours-per-week'],
    y_names='salary',
    valid_idx=list(range(1024, 1260)),
    bs=64)

class AddCustomPenalty(Callback):
    def after_loss(self):
        xd = self.dls.train.decode(self.xb).decode()
        f_name = 'hours-per-week'  # just an example
        f_min = xd[f_name].min()
        f_max = xd[f_name].max()
        for i in range(len(xd)):
            row = xd.items.iloc[i].to_dict()
            if row['workclass'] in [' Self-emp-inc', ' Self-emp-not-inc', ' Without-pay']:
                row[f_name] = np.linspace(f_min, f_max, num=15)
                df = pd.DataFrame.from_dict(row)
                dl = dls.test_dl(df)
                preds, targs = learn.get_preds(dl=dl)
                var_penalty = torch.var(torch.stack(preds), dim=0)
            else:
                var_penalty = 0
            self.loss = self.loss + var_penalty

learn = tabular_learner(dls, layers=[200, 100], metrics=accuracy,
                        cbs=AddCustomPenalty())
learn.fit_one_cycle(3)
```
Unfortunately, this produces a very long traceback with errors that I do not know how to fix; any help would be very welcome.