Custom loss for decision making

etremblay · April 9, 2022, 7:14pm

Hey guys,

I have recently been getting back into learning neural networks.

I am trying to make a neural network that has to make a decision. At any given hour, I want to predict whether the blue number is going to be greater than the orange number.

If the network correctly predicts the sign of the difference, it reaps a reward equal to the absolute difference. If it is wrong, it is penalized by the same amount. For example, if orange = 5 and blue = 10 and you predict Positive, you get +5$ as a reward since blue - orange == 5. If you predicted Negative, you would get -5$. On top of that, there are fixed costs for making a prediction: 0.2$ for a positive prediction, 1.2$ for a negative prediction.

I would also like the network to be able to decide to do nothing if it is not certain enough. The cost of being wrong is pretty steep, and each prediction also carries a fixed cost. So ideally I would prefer not to do anything if the network is not sure enough.

The network must choose between three choices:

Do nothing
Bet negative
Bet positive
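
To make the payoff rules above concrete, here is a minimal sketch of the realized profit for a single hour. The function name `realized_profit` and the string labels for the three choices are illustrative assumptions, not part of any library; the numbers come from the description above.

```python
# Fixed cost per choice, taken from the description above (in $).
FIXED_COST = {"nothing": 0.0, "negative": 1.2, "positive": 0.2}

def realized_profit(choice: str, blue: float, orange: float) -> float:
    """Profit in $ for one hour, given the choice and the two observed numbers."""
    if choice not in FIXED_COST:
        raise ValueError(f"unknown choice: {choice!r}")
    if choice == "nothing":
        return 0.0  # no bet: no reward, no penalty, no fixed cost
    diff = blue - orange
    direction = 1.0 if choice == "positive" else -1.0
    # A correct bet earns |diff|, a wrong one loses |diff|; the fixed cost always applies.
    return direction * diff - FIXED_COST[choice]

# The example from the post: orange = 5, blue = 10.
for choice in FIXED_COST:
    print(choice, realized_profit(choice, blue=10, orange=5))
```

With these numbers, betting positive nets the 5$ difference minus the 0.2$ cost, while betting negative loses the 5$ plus the 1.2$ cost.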

I want to make a custom loss function that takes this into account.

I have a tabular model with 3 outputs. I pass them to softmax, which gives me a tensor of the probabilities the model assigns to each choice…

Here is my first crack at it:

import torch
from torch import nn

class ProfitLoss(nn.Module):
    def forward(self, out, targs):
        # out is a tensor [batch, 3] where [:,0] == do nothing, [:,1] == bet negative, [:,2] == bet positive
        # softmax turns it into a probability for each choice; the probabilities sum to 1
        probs = torch.softmax(out, dim=1)

        # targs is blue - orange, the target in my dataloader; reshape to [batch, 1] so it broadcasts
        targs = targs.reshape(-1, 1).to(probs.dtype)

        # we multiply it by [0, -1, 1], so if target == 500 the payoffs are [0, -500, 500]
        signs = torch.tensor([0.0, -1.0, 1.0], device=probs.device, dtype=probs.dtype)
        # fixed cost per prediction: 0$ for doing nothing, 1.2$ for negative, 0.2$ for positive
        costs = torch.tensor([0.0, 1.2, 0.2], device=probs.device, dtype=probs.dtype)
        # the cost is subtracted before weighting by the probabilities; added afterwards
        # it would be a constant and contribute no gradient
        payoff = targs * signs - costs

        # weight by the probabilities, e.g. [0.2, 0.6, 0.2] * [0, -501.2, 499.8] = [0, -300.72, 99.96]
        # then sum over the choices: -200.76, so I would lose about 200$ here
        profit = (probs * payoff).sum(dim=1)

        # we negate because we are calculating gains, but we want to minimize the loss
        return -profit.mean()

How could I make this better? My gut feeling is that multiplying by the tensor [0, -1, 1] probably causes gradient problems in the first position because we are multiplying by zero…
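
On the multiply-by-zero worry, a quick autograd check can help. This is a minimal sketch, assuming the probabilities [0.2, 0.6, 0.2] and the target of 500 from the comments above, with the fixed costs left out to keep the numbers simple. Because softmax ties the three logits together, the do-nothing logit should still receive a gradient even though its payoff is zero.

```python
import torch

# Logits chosen so that softmax gives the probabilities from the example: [0.2, 0.6, 0.2].
logits = torch.log(torch.tensor([[0.2, 0.6, 0.2]])).requires_grad_()
# Payoff per choice for target == 500: [do nothing, bet negative, bet positive].
payoff = torch.tensor([[0.0, -500.0, 500.0]])

probs = torch.softmax(logits, dim=1)
loss = -(probs * payoff).sum(dim=1).mean()  # negative expected profit
loss.backward()

# d(loss)/d(logit_j) = -p_j * (payoff_j - expected_profit); expected_profit is -200 here.
print(logits.grad)
```

In this example the gradient on the do-nothing logit is nonzero (about -40), so a gradient descent step raises that logit.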

Any pointers to literature that could be interesting to learn more or any tips would be appreciated!

Thanks a lot!

ElisonSherton · June 4, 2022, 11:28am

Hi @etremblay

Really great first formulation of the loss. I would like to know if you were further able to find any literature or do some research related to this.

One thing that might help is normalizing the targets (i.e. blue - orange) to the range [0, 1], or standardizing them, since values on the order of 10^2 might cause problems in backpropagation…
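
A minimal sketch of the scaling idea, under one stated assumption: since the sign of blue - orange carries the meaning, the sketch only divides by a scale (the mean absolute difference) instead of min-max scaling to [0, 1] or subtracting the mean, both of which would move the zero point. The fixed costs are divided by the same scale so the trade-off stays the same. `scale_targets` is a hypothetical helper name, and the target values are made up.

```python
import torch

def scale_targets(diff: torch.Tensor, costs: torch.Tensor, eps: float = 1e-8):
    """Divide targets and fixed costs by the mean absolute target; signs are unchanged."""
    # In practice the scale would be computed on the training set only.
    scale = diff.abs().mean().clamp_min(eps)
    return diff / scale, costs / scale, scale

diff = torch.tensor([500.0, -300.0, 100.0])   # blue - orange, made-up values
costs = torch.tensor([0.0, 1.2, 0.2])         # do nothing, bet negative, bet positive
scaled_diff, scaled_costs, scale = scale_targets(diff, costs)
print(scaled_diff, scaled_costs, scale)
```

Multiplying the resulting loss by `scale` converts it back to dollars for reporting.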

Thanks,
Vinayak.

etremblay · June 7, 2022, 6:28pm

I was not really able to figure out this problem, but found some relevant literature to penalize the neural network more if it predicts something in the wrong sign direction:

Implementation in PyTorch: GitHub - JDE65/CustomLoss: Custom Loss functions for asset return prediction with deep learning regression

etremblay · August 9, 2022, 1:27pm

For anyone interested, I found papers related to this. I didn’t know it, but the right term to search Google for is “learning to abstain”. When the network is uncertain, it sends a signal (using various mechanisms depending on the paper) that it is too uncertain to predict.

Deep Gamblers: You only change the loss function and add another option to the softmax logits, which corresponds to “don’t predict”.
Self-Adaptive Training: It learns to re-weight samples with noisy labels, which can also be used for learning to abstain from predicting
Stop Overcomplicating Selective Classification: Use Max-Logit: A more recent paper which improves on Deep Gamblers and Self-Adaptive Training
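
A rough sketch of the Deep Gamblers loss as I understand it from the paper, not the authors' reference code. It assumes two real classes (negative, positive) plus an abstain output placed last, which differs from the [do nothing, negative, positive] layout earlier in the thread. The `payoff` hyperparameter (called o in the paper) should lie between 1 and the number of classes; the value 1.5 is an arbitrary placeholder.

```python
import torch
import torch.nn.functional as F

def gambler_loss(logits: torch.Tensor, labels: torch.Tensor, payoff: float = 1.5) -> torch.Tensor:
    """logits: [batch, n_classes + 1], last column = abstain; labels: [batch] class indices."""
    probs = F.softmax(logits, dim=1)
    class_probs, abstain_prob = probs[:, :-1], probs[:, -1]
    # Probability the model put on the true class.
    p_true = class_probs.gather(1, labels.unsqueeze(1)).squeeze(1)
    # Abstaining counts as a partial win, discounted by the payoff.
    return -torch.log(p_true + abstain_prob / payoff).mean()

logits = torch.zeros(4, 3)              # uniform over [negative, positive, abstain]
labels = torch.tensor([0, 1, 1, 0])     # 0 = negative, 1 = positive
print(gambler_loss(logits, labels))
```

A lower `payoff` makes abstaining more attractive, so it would need tuning against the fixed costs and the size of the typical difference.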

ReisMacand · May 21, 2024, 6:49pm

One way to mitigate this could be to use a differentiable approximation of the step function, like the sigmoid or tanh function, to map your predictions to probabilities. This could help smooth out the gradients and improve training stability. Also, consider experimenting with different cost structures and weighting schemes to see what works best for your specific problem. As for literature, “Deep Learning” by Goodfellow et al. is a great resource for diving deeper into neural network architectures and loss functions. Keep tinkering and iterating, and you’ll find the right balance! And hey, if you’re ever stuck, you can always roll a d20 and let randomness inspire your next move.
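
One possible reading of the smooth-step suggestion, sketched under assumptions: a single raw score per row is squashed with tanh into a signed position in (-1, 1) (so not a probability), where values near 0 play the role of doing nothing. Charging the fixed costs in proportion to the position size is an assumption made to keep everything differentiable, and `soft_position_loss` is a hypothetical name.

```python
import torch

def soft_position_loss(score, diff, cost_pos=0.2, cost_neg=1.2):
    """score: [batch] raw model output; diff: [batch] blue - orange."""
    position = torch.tanh(score)   # smooth stand-in for the sign/step function
    gain = position * diff         # positive when the position's sign matches diff
    # Fixed costs scaled by how far the position leans positive or negative.
    cost = cost_pos * position.clamp(min=0) + cost_neg * (-position).clamp(min=0)
    return -(gain - cost).mean()   # negative profit, to be minimized

score = torch.tensor([0.0, 10.0, -10.0], requires_grad=True)
diff = torch.tensor([500.0, 500.0, 500.0])
loss = soft_position_loss(score, diff)
loss.backward()
print(loss.item(), score.grad)
```

Unlike the three-way softmax, this version cannot separate "uncertain" from "small bet", so it is a different trade-off and not a drop-in replacement.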