Neural Network getting symmetric probabilities for every epoch in

I have built a neural network from scratch, using cross-entropy as the loss function. After a few epochs, the predictions become nan in every batch of every epoch. I have initialized the weights randomly. Does anyone have any idea what the issue could be?
I have verified my softmax implementation against the built-in softmax function, but I am still facing the issue.

Most likely, I think, the issue is in the backpropagation of the cross-entropy loss through softmax. Can anyone point me to where I can find code for the backpropagation of cross-entropy loss?

What are all those warnings? It looks like your parameters are disappearing and/or exploding.

I think your problem is further back.

Hi Karan,

a naive softmax implementation (exp(x) / exp(x).sum()) is not numerically stable: exp(x) can easily overflow or underflow. Test your implementation with x=[1000.0] to see if that’s the case. To fix the issue, subtract max(x) from x before computation:

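import numpy as np

def naive_softmax(x):
    # exp() overflows to inf for large x and underflows to 0 for very negative x
    s = np.exp(x)
    return s / s.sum()

def stable_softmax(x):
    # shifting by the maximum leaves the result unchanged but keeps exp() in range
    return naive_softmax(x - np.max(x))

>>> x = np.array([1000.0])
>>> naive_softmax(x)
__main__:2: RuntimeWarning: overflow encountered in exp
__main__:3: RuntimeWarning: invalid value encountered in true_divide
array([nan])
>>> naive_softmax(-x)
array([nan])
>>> stable_softmax(x)
array([1.])
>>> stable_softmax(-x)
array([1.])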

Additionally, softmax and cross-entropy can be fused into a single layer to further improve the numerical stability of the computation.
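
A minimal sketch of such a fused layer, assuming integer class labels and a batch of raw logits with shape (N, C). The function name softmax_cross_entropy and its argument names are illustrative, not from any particular library. It computes the log-probabilities directly with the log-sum-exp trick, so log() is never applied to a probability that has underflowed to zero, and it returns the gradient with respect to the logits, which for the mean loss simplifies to (softmax - one_hot) / N:

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy computed from raw logits (hypothetical helper).

    logits: (N, C) float array of unnormalized scores
    labels: (N,) integer array of class indices
    Returns (loss, dlogits), where dlogits is d(mean loss)/d(logits).
    """
    n = logits.shape[0]
    rows = np.arange(n)

    # Shift each row by its maximum so exp() cannot overflow.
    shifted = logits - logits.max(axis=1, keepdims=True)

    # log softmax = shifted - log(sum(exp(shifted))), computed without
    # ever forming a probability and then taking its log.
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Average (not sum) of the per-sample negative log-likelihoods.
    loss = -log_probs[rows, labels].mean()

    # Backward pass: softmax minus one-hot target, divided by batch size.
    dlogits = np.exp(log_probs)
    dlogits[rows, labels] -= 1.0
    dlogits /= n
    return loss, dlogits


# Extreme logits that would break a naive implementation.
logits = np.array([[1000.0, 0.0], [0.0, 0.0]])
labels = np.array([0, 1])
loss, dlogits = softmax_cross_entropy(logits, labels)
```

In a from-scratch network, dlogits would be what gets passed back into the last linear layer, so no separate softmax backward step is needed.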

Hi @mkardas,

I am now using the built-in softmax. The first warning I get now is related to the backpropagation of the cross-entropy loss with softmax.

Hi @joedockrill ,

I am building a simple NN. The first warning I was getting was related to softmax. After switching to the built-in function, the next warning I get is related to the backpropagation of the cross-entropy loss with softmax.

Is the loss improving? It could be simply that your learning rate is too high and the training diverges.

Hi @mkardas,
I have reduced the learning rate, and as a result I can now run around 20 epochs, but after that I get the runtime error.

The Graph of Accuracy and Loss is as follows:

Seems like the accuracy is also not improving.

The magnitude of your loss is very high; it looks like you’re computing a sum over samples instead of taking an average.
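
To illustrate the difference, here is a small sketch, assuming predicted probabilities of shape (N, C) and integer labels. The cross_entropy helper and its reduction argument are hypothetical names used only for this example. With a summed loss, both the loss value and its gradient grow with the batch size, which acts like a larger learning rate and can push training toward divergence:

```python
import numpy as np

def cross_entropy(probs, labels, reduction="mean"):
    """Cross-entropy from predicted probabilities (hypothetical helper)."""
    n = probs.shape[0]
    # Small epsilon guards against log(0); an assumption of this sketch.
    per_sample = -np.log(probs[np.arange(n), labels] + 1e-12)
    if reduction == "sum":
        return per_sample.sum()
    return per_sample.mean()


# An untrained 10-class classifier predicting uniform probabilities.
batch_size = 64
probs = np.full((batch_size, 10), 0.1)
labels = np.zeros(batch_size, dtype=int)

mean_loss = cross_entropy(probs, labels, "mean")  # about ln(10), i.e. roughly 2.3
sum_loss = cross_entropy(probs, labels, "sum")    # batch_size times larger
```

A mean loss near ln(number of classes) at the start of training is a useful sanity check; values far above that suggest a sum rather than an average.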

Hi @mkardas,

I think my implementation of backpropagation has some issues. Can you please help?

The architecture and formulas are posted at: Help with backpropagation Implementation

You can use something like to verify your gradients.
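
One common way to verify gradients is a finite-difference check: perturb each input slightly, measure how the loss changes, and compare that with the analytic gradient. The sketch below is only an illustration of that idea. The names numerical_gradient, relative_error, and loss_and_grad are made up for this example, and the loss here is a stand-in softmax cross-entropy rather than the network from the question:

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-5):
    """Central-difference estimate of df/dx for a scalar-valued f."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        original = x[idx]
        x[idx] = original + eps
        f_plus = f(x)
        x[idx] = original - eps
        f_minus = f(x)
        x[idx] = original  # restore the entry before moving on
        grad[idx] = (f_plus - f_minus) / (2.0 * eps)
    return grad

def relative_error(a, b):
    """Largest elementwise relative difference between two arrays."""
    return np.max(np.abs(a - b) / np.maximum(1e-8, np.abs(a) + np.abs(b)))

def loss_and_grad(logits, labels):
    """Stand-in loss: mean softmax cross-entropy and its analytic gradient."""
    n = logits.shape[0]
    rows = np.arange(n)
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    loss = -np.log(probs[rows, labels]).mean()
    grad = probs.copy()
    grad[rows, labels] -= 1.0
    return loss, grad / n


rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3))
labels = np.array([0, 2, 1, 1])

analytic = loss_and_grad(logits, labels)[1]
numeric = numerical_gradient(lambda z: loss_and_grad(z, labels)[0], logits)
err = relative_error(analytic, numeric)
```

A small relative error suggests the backward pass matches the forward pass; a large one points to a bug in the analytic gradient. The same check can be applied to each weight matrix by treating the full forward pass as f.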