Structured Learner


#82

I have a quick and dirty example notebook created a few weeks ago just to show a working example. I did not actually see @johnri99's notebook (did I miss it?), which I am sure is superior, but here is one for viewing. Also, I had trouble with the is_reg flag, hence the notebook.


(Anders) #83

Update
I fixed it by adding .cuda(), which moves the model's weights onto the GPU to match the cuda.LongTensor indices in the error below:

mixedinputmodel = MixedInputModel(emb_szs, len(df.columns)-len(cat_vars),
                   0.04, 1, [1000,500], [0.001,0.01], y_range=y_range).cuda()

Original:

I’m getting this error as well. Did you find a solution?


Based on the Rossmann data:

    mixedinputmodel = MixedInputModel(emb_szs, len(df.columns)-len(cat_vars), 0.04, 1,
                                      [1000,500], [0.001,0.01], y_range=y_range)
    bm = BasicModel(mixedinputmodel, 'mixedInputRegression')
    md = ColumnarModelData.from_data_frame(PATH, val_idx, df, yl.astype(np.float32),
                                           cat_flds=cat_vars, bs=128, test_df=df_test)
    learn = StructuredLearner(md, bm)
    learn.lr_find()

    ~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/_functions/thnn/sparse.py in forward(cls, ctx, indices, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
         55
         56     if indices.dim() == 1:
    ---> 57         output = torch.index_select(weight, 0, indices)
         58     else:
         59         output = torch.index_select(weight, 0, indices.view(-1))

    TypeError: torch.index_select received an invalid combination of arguments - got (torch.FloatTensor, int, torch.cuda.LongTensor), but expected (torch.FloatTensor source, int dim, torch.LongTensor index)


(Anurag) #85

Do we have to explicitly convert categorical variables to the category dtype before passing them to our model? That conversion occupies a lot of memory. I believe we do it primarily to calculate emb_szs, but we could calculate emb_szs without the conversion, as sketched below. I am not sure how PyTorch would treat these variables if we did not convert them, though. Has anyone tried this before?
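
For reference, the Rossmann notebook derives the embedding sizes from the category cardinalities; the same sizes can be computed straight from nunique() without the dtype conversion. A sketch (note that the model still needs integer category codes for the embedding lookup, which is what the conversion normally provides):

    # usual approach: cardinality from the category dtype (+1 for unknown/NA)
    cat_sz = [(c, len(df[c].cat.categories) + 1) for c in cat_vars]
    emb_szs = [(c, min(50, (c + 1) // 2)) for _, c in cat_sz]

    # same sizes without converting to the category dtype
    cat_sz = [(c, df[c].nunique() + 1) for c in cat_vars]
    emb_szs = [(c, min(50, (c + 1) // 2)) for _, c in cat_sz]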


(Michael) #86

I’ve been reading through this thread, and it seems the answer is somehow contained within; however, I still have had no luck getting binary classification working with a structured data set.

I am trying to identify a rare event (about 1% occurrence) in a time-series data set. To make the dataset more balanced I’ve stripped back most of the non-occurrences so that it’s roughly 50/50 for the event occurring; however, my training set is now only 70,000 rows.

So my prediction should be either true or false. I’m using is_reg=False:

 md = ColumnarModelData.from_data_frame(PATH, val_idx, df, y1, is_reg=False, cat_flds=cat_vars, bs=64)

I initially tried setting my dep var to

 y=[0, 0, 0, 1, 0, 0, 0, 1, 0....]

and out_sz = 1

 m = md.get_learner(emb_szs, n_cont = len(df.columns)-len(cat_vars),
               emb_drop = 0.04, out_sz = 1, szs = [1000, 500], drops = [0.001,0.01], use_bn = True)

but this throws

RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/generic/THCTensorCopy.c:20

I then tried one-hot encoding my dependent variable and set out_sz = 2:

 y = [[0, 1], [0, 1], [1, 0], [0, 1] ..... ]
 m = md.get_learner(emb_szs, n_cont = len(df.columns)-len(cat_vars),
               emb_drop = 0.04, out_sz = 2, szs = [1000, 500], drops = [0.001,0.01], use_bn = True)

however that throws an error on

 df ,y, nas, mapper = proc_df(model_samp, 'pred_label', do_scale=True)
 AttributeError: Can only use .cat accessor with a 'category' dtype

I’ve tried setting my one-hot encodings to both int64 and category dtypes, but both throw this same error.

I’m a little stumped on the correct way to set this up. I’m clearly missing something though.


#87

I think you want the y = [0, 1, 1, 0, 1] format (class indices) and out_sz = 2. I also had to do y.astype('int') before passing it to ColumnarModelData, so you might see if that helps.
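
In other words, keeping your original calls but with integer class indices and out_sz = 2; an untested sketch based on your snippets above:

    y1 = y1.astype('int')  # class indices 0/1, not one-hot
    md = ColumnarModelData.from_data_frame(PATH, val_idx, df, y1, is_reg=False,
                                           cat_flds=cat_vars, bs=64)
    m = md.get_learner(emb_szs, n_cont=len(df.columns)-len(cat_vars),
                       emb_drop=0.04, out_sz=2, szs=[1000, 500],
                       drops=[0.001, 0.01], use_bn=True)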

For imbalance, I duplicated the rarer targets:
    dfx = df[df['RESOLUTION'] < 10]
    dfy = df[df['REJECTED'] < 10]
    frames = [df, dfy, dfx, dfx, dfx, dfx, dfx, dfx, dfx, dfx, dfx, dfx]
    df = pd.concat(frames)
And shuffled afterwards.
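
In pandas the shuffle can be a one-liner, e.g.:

    df = df.sample(frac=1).reset_index(drop=True)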

My multiclass target combines rejected and resolution later in the data processing.


(Dan Goldner) #88

Does this example help?


(Michael) #89

Wow, I don’t know how I managed to miss that combination. Thank you. Just to confirm my understanding of how fastai/PyTorch evaluates this model’s loss:

With:

    is_reg = False
    y = [1, 0, 0, 0, 1]
    out_sz = 2

the last layer of the model outputs a rank-1 tensor of shape [2], one value for class 1 and one for class 2. Because is_reg=False, fastai applies log_softmax to that output (with nll_loss as the criterion):

# column_data.py
if not self.is_reg:
    if self.is_multi:
        x = F.sigmoid(x)
    else:
        x = F.log_softmax(x)

which takes those two values, rescales them into [0, 1] so that they sum to 1, and then takes the log:

     e.g. softmax([1.0, 2.0]) -> [0.2689, 0.7311]
     log([0.2689, 0.7311]) -> [-1.3133, -0.3133]

So if the target value was y = 1, then in the above example is it treating 1 as the index into an array of 2 classes? Does it then compare -0.3133 to 1? Or does it take the log of 1 (the target) and compare -0.3133 to ln(1), which would give a loss of 0.3133?
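
A quick way to check the second interpretation: F.nll_loss just picks out the log-probability at the target index and negates it, so the loss here would be 0.3133:

    import torch
    import torch.nn.functional as F

    log_probs = torch.tensor([[-1.3133, -0.3133]])  # log-softmax output, one sample
    target = torch.tensor([1])                      # target is class index 1
    F.nll_loss(log_probs, target)                   # tensor(0.3133)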

I just want to make sure I really understand what the loss calculation is actually telling me. Given the rarity of the event I’m trying to predict I want to make sure that I maintain a low false positive rate and a high accuracy on the true positive probability even at the sacrifice of increasing the false negative rate. i.e I’d rather miss 9/10 events so long as the one time it does predict an event I can have assurance that it is highly likely to occur.


(Michael) #90

Thanks, this example is helpful. You mention at the end that perhaps this is not the best use of a neural network. What approach would you have used instead?


(Quan Tran) #91

You can define a custom loss function. In this case I used a weighted negative log likelihood loss:

def imbalanced_loss(inp, targ):
    # inp is the model's log-softmax output; the weights up-weight the
    # rare positive class (class 1) relative to class 0
    return F.nll_loss(inp, targ, weight=T([.01, .99]))

learn = StructuredLearner(...)
learn.crit = imbalanced_loss

I have tested this out with the imbalanced TalkingData dataset from Kaggle (notebook). The results are not bad!


(John Richmond) #92

Interesting that you get good results with this approach. I have tried using weighted loss functions with the Kaggle credit card fraud dataset (which is extremely unbalanced) and found that, depending upon the weighting, it either predicts far too many fraudulent transactions or none at all. I guess it might be possible to find an optimum, but I couldn’t do it and had to look at other approaches (in my case, use of an autoencoder).
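
For context, the autoencoder approach trains on the genuine (majority) class only and flags transactions with high reconstruction error as likely fraud. A minimal PyTorch sketch, with layer sizes as illustrative assumptions rather than the actual model used:

    import torch.nn as nn

    class FraudAutoEncoder(nn.Module):
        def __init__(self, n_feat):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(n_feat, 16), nn.ReLU(), nn.Linear(16, 4))
            self.dec = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, n_feat))
        def forward(self, x):
            return self.dec(self.enc(x))

    # at inference, score each transaction by reconstruction error and threshold it:
    # score = ((model(x) - x) ** 2).mean(dim=1)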

For information, I also tried oversampling the smaller class and undersampling the larger class to give even class numbers. Neither approach worked especially well, but the oversampling seemed better; intuitively I think this is because you throw away less data.

Going back to the question raised, I think it is slightly confusing that PyTorch avoids the one-hot encoding for the input to the loss function, even though the output has one column per class. I guess it’s pretty obvious why, but it can be easy to get confused.
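
A minimal illustration of that asymmetry (plain PyTorch, nothing fastai-specific):

    import torch
    import torch.nn.functional as F

    logits = torch.randn(4, 2)             # model output: one column per class
    targets = torch.tensor([0, 1, 1, 0])   # loss target: class indices, not one-hot
    loss = F.cross_entropy(logits, targets)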

Regards

John


(Will Hampson) #93

UPDATE: Turns out the system did not hang; instead, for some reason, after the epoch was complete it took roughly another hour to spit out the output.

Original:
Thank you for the excellent notebook. Always helpful to learn by example.

When running your notebook, the processing hangs when I fit the model. Any idea what’s happening here? I watched it work its way through the epoch, and then after it finished, instead of providing output it just hangs. Low memory? I’m running on a 64 GB system. Let me know if I can help better define the problem on my end.


(Martin) #94

Have you tried data augmentation only on the smaller class to make it bigger? Perhaps that might also have nice results.


(John Richmond) #95

Yes, that was one of the approaches I tried. It produced the best results but still nowhere near as good as the autoencoder. The issue is the trade-off between precision and recall. Oversampling the small class to give the same overall numbers for both classes tends to result in catching almost all of the minority class (the fraudulent transactions), but also in classifying many other transactions as fraudulent. To me, predicting genuine transactions as fraud is not quite as bad as missing genuinely fraudulent activity, but it’s not much better either, since it is likely to result in customer frustration.

At the same time you can also adjust the weights of the two classes to juggle between precision and recall, but it’s difficult to get both to a good value. One alternative is sketched below.
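
A related trick is to leave the model and class weights alone and instead sweep the decision threshold on the predicted probability. A sketch, where log_preds and y_true stand in for the validation log-probabilities and labels (hypothetical names):

    import numpy as np
    from sklearn.metrics import precision_score, recall_score

    probs = np.exp(log_preds[:, 1])  # P(fraud) recovered from log-softmax output
    for thresh in (0.5, 0.9, 0.99):
        pred = (probs > thresh).astype(int)
        print(thresh, precision_score(y_true, pred), recall_score(y_true, pred))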