Problem with loss function for multi-label classification with sparse target matrix

Merlinvt · January 24, 2019, 3:27pm

Hey, I’ve been using the new fastai v1 library without having taken the v3 course yet.

Something is not working for me when I try to use the BCELoss function.
To give some context:
I am trying to classify tags for banking transactions in German. The text might be something like
“Rent January 2019”. I then want to predict which tags apply, such as recurring, household, rent, credit, …
A transaction can have more than one tag. The tags are my targets, encoded as a multi-hot vector (one 0/1 column per tag).
There are 36 different tags. Most of the time, 0-4 tags apply to a transaction.

I have some issues with the BCELoss function.

The following code is working for me:

    learn = text_classifier_learner(data_clas,metrics=[accuracy_thresh,f_score,precision,recall])
    learn.loss_func = nn.BCEWithLogitsLoss()

The predictions are all between 0 and 1.

If I try to change the loss function:

    learn.loss_func = nn.BCELoss()

I get the following error message:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-71-3ea49add0339> in <module>
----> 1 learn.fit_one_cycle(1, 1e-2)

~/anaconda3/envs/kontoml/lib/python3.7/site-packages/fastai/train.py in fit_one_cycle(learn, cyc_len, max_lr, moms, div_factor, pct_start, wd, callbacks, **kwargs)
     20     callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor,
     21                                         pct_start=pct_start, **kwargs))
---> 22     learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
     23 
     24 def lr_find(learn:Learner, start_lr:Floats=1e-7, end_lr:Floats=10, num_it:int=100, stop_div:bool=True, **kwargs:Any):

~/anaconda3/envs/kontoml/lib/python3.7/site-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    172         callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
    173         fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
--> 174             callbacks=self.callbacks+callbacks)
    175 
    176     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

~/anaconda3/envs/kontoml/lib/python3.7/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     94     except Exception as e:
     95         exception = e
---> 96         raise e
     97     finally: cb_handler.on_train_end(exception)
     98 

~/anaconda3/envs/kontoml/lib/python3.7/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     84             for xb,yb in progress_bar(data.train_dl, parent=pbar):
     85                 xb, yb = cb_handler.on_batch_begin(xb, yb)
---> 86                 loss = loss_batch(model, xb, yb, loss_func, opt, cb_handler)
     87                 if cb_handler.on_batch_end(loss): break
     88 

~/anaconda3/envs/kontoml/lib/python3.7/site-packages/fastai/basic_train.py in loss_batch(model, xb, yb, loss_func, opt, cb_handler)
     21 
     22     if not loss_func: return to_detach(out), yb[0].detach()
---> 23     loss = loss_func(out, *yb)
     24 
     25     if opt is not None:

~/anaconda3/envs/kontoml/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

~/anaconda3/envs/kontoml/lib/python3.7/site-packages/torch/nn/modules/loss.py in forward(self, input, target)
    502     @weak_script_method
    503     def forward(self, input, target):
--> 504         return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
    505 
    506 

~/anaconda3/envs/kontoml/lib/python3.7/site-packages/torch/nn/functional.py in binary_cross_entropy(input, target, weight, size_average, reduce, reduction)
   2025 
   2026     return torch._C._nn.binary_cross_entropy(
-> 2027         input, target, weight, reduction_enum)
   2028 
   2029 

RuntimeError: Assertion `x >= 0. && x <= 1.' failed. input value should be between 0~1, but got -0.294379 at /opt/conda/conda-bld/pytorch-cpu_1544218667092/work/aten/src/THNN/generic/BCECriterion.c:62

If I debug the error and look at what is in input, I can see that the values are not between 0 and 1:

tensor([[ 2.0003e-01,  5.1642e-02,  5.1950e-01,  ...,  8.7474e-01,
         -2.8382e-01, -6.0670e-01],
        [-2.6214e-01,  5.6214e-01, -3.6147e-01,  ..., -9.9762e-01,
         -2.8733e-01,  1.8939e-01],
        [-2.0356e-01, -6.9230e-02, -8.2497e-01,  ...,  6.2485e-01,
         -7.7243e-01,  1.1368e-02],
        ...,
        [-4.1342e-01,  3.4119e-01,  7.2984e-02,  ..., -5.0797e-01,
         -2.1738e-01,  3.2233e-01],
        [ 2.1739e-01,  4.6416e-01, -4.4133e-01,  ...,  2.4930e-02,
          1.0479e-01, -9.0387e-02],
        [ 5.6815e-01, -1.0539e-01, -1.2246e-01,  ...,  6.1257e-01,
         -9.7907e-02, -7.5804e-04]], grad_fn=<AddmmBackward>)
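
The values above look like raw logits: the output comes straight from a linear layer (hence `grad_fn=<AddmmBackward>`), with no sigmoid applied afterwards. `nn.BCELoss` expects probabilities in [0, 1], while `nn.BCEWithLogitsLoss` applies the sigmoid itself. A minimal sketch of that relationship, using random tensors as stand-ins for the model output and the targets (shapes and values are made up for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-ins for one batch: 8 transactions, 36 tags (synthetic values).
logits = torch.randn(8, 36)                    # raw model output, can be negative
targets = (torch.rand(8, 36) < 0.05).float()   # sparse multi-hot targets

# BCEWithLogitsLoss takes logits directly and applies the sigmoid internally.
loss_with_logits = nn.BCEWithLogitsLoss()(logits, targets)

# BCELoss needs probabilities, so the sigmoid has to be applied first.
loss_plain = nn.BCELoss()(torch.sigmoid(logits), targets)

print(loss_with_logits.item(), loss_plain.item())  # the two values agree up to rounding
```

So if `nn.BCELoss` is really wanted, the model output would have to go through `torch.sigmoid` first; otherwise `nn.BCEWithLogitsLoss` is the matching choice for a model that emits logits.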

Merlinvt · January 24, 2019, 3:37pm

I am not even sure if I am using the right loss function here … The reason I chose binary cross-entropy loss was that I have a sparse multi-label target matrix. When I tried it with nn.BCEWithLogitsLoss() I got a high accuracy but 0 precision and 0 recall. When I looked into the predictions, they were all 0.

Merlinvt · January 24, 2019, 4:15pm

I really don’t understand this. I am using the right loss function with BCE, am I not? When I run the classifier on only 100 samples I get:

    train_loss  valid_loss  accuracy_thresh  fbeta     precision  recall
    0.719672    0.695731    0.423611         0.069510  0.038095   0.487500

On 10000 samples I get:

    train_loss  valid_loss  accuracy_thresh  fbeta     precision  recall
    0.115259    0.124552    0.971088         0.000000  0.000000   0.000000

And all my predictions are 0. That must mean that my loss function has no incentive to predict anything other than 0. But isn’t that what BCELoss or BCEWithLogitsLoss() is for?
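
One possible explanation for the high accuracy combined with zero precision and recall: with 36 tags and only a few positives per transaction, a model that predicts "no tag" everywhere is already right on most label positions. A small sketch with synthetic targets (the 3% positive rate is an assumption chosen for illustration, not measured on the real data):

```python
import torch

torch.manual_seed(0)

n_samples, n_tags = 10_000, 36
# Synthetic sparse multi-hot targets: each tag is positive with probability 0.03 (assumed).
targets = (torch.rand(n_samples, n_tags) < 0.03).float()

# A degenerate model that never predicts any tag.
preds = torch.zeros_like(targets)

# Element-wise accuracy over all (sample, tag) positions.
accuracy = (preds == targets).float().mean().item()

# Recall: share of the true positives that were found; zero because nothing is predicted.
true_pos = (preds * targets).sum().item()
recall = true_pos / targets.sum().item()

print(f"accuracy={accuracy:.3f} recall={recall:.3f}")
```

With an unweighted loss, the many negative positions dominate the average, so "always 0" can be a cheap local optimum for the model.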

Merlinvt · January 24, 2019, 7:33pm

Just in case I’ve done something else wrong, here are the other important parts of my code.

I am doing classification by character, not by word.

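    from fastai.text import *  # provides BaseTokenizer, BOS, List, Collection, ...

    class LetterTokenizer(BaseTokenizer):
        "Character-level tokenizer that keeps the BOS marker as a single token."
        def __init__(self, lang): pass

        def tokenizer(self, t:str) -> List[str]:
            out = []
            i = 0
            while i < len(t):
                # Keep the special BOS marker intact instead of splitting it into letters
                if t[i:].startswith(BOS):
                    out.append(BOS)
                    i += len(BOS)
                else:
                    out.append(t[i])
                    i += 1
            return out

        def add_special_cases(self, toks:Collection[str]): pass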

    import string
    from functools import partial

    # unique_charakters is defined elsewhere (not shown in this post)
    all_letters = string.ascii_letters + " .,;'" + unique_charakters
    vocab = Vocab.create(all_letters, max_vocab=1000, min_freq=0)

    tokenizer = Tokenizer(LetterTokenizer, pre_rules=[], post_rules=[])

    data_clas = TextClasDataBunch.from_df(path='', train_df=train_df, valid_df=valid_df,
                                          tokenizer=tokenizer, vocab=vocab,
                                          mark_fields=False, text_cols='text',
                                          label_cols=label_cols, classes=tags_to_predict)

    f_score = partial(fbeta, thresh=0.5, beta=1)

    learn = text_classifier_learner(data_clas, metrics=[accuracy_thresh, fbeta, f_score, recall, precision])
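
To make it easier to see what the LetterTokenizer above produces, here is the same loop as a plain function without any fastai dependency. The function name `letter_tokenize` is hypothetical, and `BOS = "xxbos"` is assumed to match the marker fastai v1 uses:

```python
from typing import List

BOS = "xxbos"  # assumed: the beginning-of-stream marker used by fastai v1

def letter_tokenize(t: str) -> List[str]:
    """Split text into single characters, keeping the BOS marker as one token."""
    out = []
    i = 0
    while i < len(t):
        if t.startswith(BOS, i):
            out.append(BOS)
            i += len(BOS)
        else:
            out.append(t[i])
            i += 1
    return out

tokens = letter_tokenize("xxbos Miete")
print(tokens)  # ['xxbos', ' ', 'M', 'i', 'e', 't', 'e']
```

Every character, including spaces, becomes its own token, so the vocabulary has to contain all characters that occur in the transaction texts.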

Merlinvt · January 25, 2019, 11:39am

Found a solution for my loss function.
I ended up using the BCEWithLogitsLoss function,
but with a pos_weight so that the positive labels are weighted higher.

    learn.loss_func = torch.nn.BCEWithLogitsLoss(pos_weight=torch.tensor(10.))

These two links were helpful for me.
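
For anyone landing here later: `pos_weight` multiplies the loss term of the positive targets, so with a value of 10 the loss on a positive label counts ten times as much as the loss on a negative label. Instead of one global value it can also be a tensor with one weight per tag. A sketch that derives per-tag weights as negatives divided by positives (the synthetic `targets` tensor and the cap of 100 are assumptions for illustration, not part of my actual setup):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-in for the multi-hot training targets: 1000 transactions, 36 tags.
targets = (torch.rand(1000, 36) < 0.05).float()

# Per-tag ratio of negatives to positives; rare tags get a larger weight.
pos_counts = targets.sum(dim=0)
neg_counts = targets.shape[0] - pos_counts
pos_weight = (neg_counts / pos_counts.clamp(min=1)).clamp(max=100.0)  # cap is an arbitrary choice

loss_func = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

# Sanity check on random logits: positives now contribute more to the loss.
logits = torch.randn(1000, 36)
weighted = loss_func(logits, targets)
unweighted = nn.BCEWithLogitsLoss()(logits, targets)
print(weighted.item(), unweighted.item())
```

The resulting `loss_func` could then be assigned to `learn.loss_func` in the same way as the scalar version above.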