Immediately after getting this, I got stuck on cross entropy + softmax.
My chain-rule derivative and the analytical derivative seem to be different.
I have written up a detailed question here: https://math.stackexchange.com/questions/2843505/derivative-of-softmax-without-cross-entropy
That post contains the question itself along with the corrected derivation.
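
For anyone hitting the same mismatch, a quick numerical check can help show which derivation is off. The sketch below is not taken from the linked post. It assumes the loss is L = -sum_i y_i * log(p_i) with p = softmax(z) and a target y that sums to 1 (e.g. one-hot), and all function names are made up for illustration. It computes the gradient with respect to the logits three ways: the chain rule through the full softmax Jacobian, the shortcut p - y, and central finite differences.

```python
import math


def softmax(z):
    # Subtract the max before exponentiating for numerical stability.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(z, y):
    # L = -sum_i y_i * log(p_i), with p = softmax(z).
    p = softmax(z)
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p))


def grad_chain_rule(z, y):
    # dL/dz_j = sum_i (dL/dp_i) * (dp_i/dz_j)
    # with dL/dp_i = -y_i / p_i and dp_i/dz_j = p_i * (delta_ij - p_j).
    p = softmax(z)
    n = len(z)
    dL_dp = [-y[i] / p[i] for i in range(n)]
    grad = []
    for j in range(n):
        total = 0.0
        for i in range(n):
            delta = 1.0 if i == j else 0.0
            total += dL_dp[i] * p[i] * (delta - p[j])
        grad.append(total)
    return grad


def grad_analytical(z, y):
    # Shortcut p - y; only valid under the assumption that sum(y) == 1.
    p = softmax(z)
    return [pi - yi for pi, yi in zip(p, y)]


def grad_numerical(z, y, eps=1e-6):
    # Central finite differences on the loss, one logit at a time.
    grad = []
    for j in range(len(z)):
        z_plus = list(z)
        z_minus = list(z)
        z_plus[j] += eps
        z_minus[j] -= eps
        diff = cross_entropy(z_plus, y) - cross_entropy(z_minus, y)
        grad.append(diff / (2 * eps))
    return grad


if __name__ == "__main__":
    z = [1.0, 2.0, 0.5]   # arbitrary example logits
    y = [0.0, 1.0, 0.0]   # one-hot target
    print("chain rule:", grad_chain_rule(z, y))
    print("analytical:", grad_analytical(z, y))
    print("numerical: ", grad_numerical(z, y))
```

If the three results agree, the chain rule and the shortcut are consistent for that input. In my experience a disagreement often comes from using only the diagonal term p_i * (1 - p_i) of the Jacobian and dropping the off-diagonal terms -p_i * p_j, though that may not be the issue in every case.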