LogSoftmax vs Softmax!

What is the advantage of log_softmax over softmax, and where is it appropriate to use it? I know log_softmax is log(softmax(x)), but softmax is meant to express each element's share of the group, whereas the output of log_softmax does not have that property (its values do not sum to 1 the way softmax outputs do). So where exactly does it add value?
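For a concrete comparison, here is a small sketch (the tensor values are made up for illustration) showing that log_softmax is simply the log of softmax, and that exponentiating it recovers a distribution that sums to 1:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([1.0, 2.0, 3.0])  # arbitrary example logits

p = F.softmax(x, dim=0)          # probabilities, sum to 1
log_p = F.log_softmax(x, dim=0)  # log-probabilities, exp() of them sums to 1

print(p.sum())                              # tensor(1.)
print(torch.allclose(log_p, p.log()))       # True: log_softmax == log(softmax)
print(torch.allclose(log_p.exp().sum(), torch.tensor(1.0)))  # True
```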


Here is the source code of log_softmax in the PyTorch repo

def log_softmax(input, dim=None, _stacklevel=3):
    r"""Applies a softmax followed by a logarithm.
    While mathematically equivalent to log(softmax(x)), doing these two
    operations separately is slower, and numerically unstable. This function
    uses an alternative formulation to compute the output and gradient correctly.
    See :class:`~torch.nn.LogSoftmax` for more details.
    Arguments:
        input (Variable): input
        dim (int): A dimension along which log_softmax will be computed.
    """

You can also read more about the LogSumExp trick here.
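The trick boils down to log_softmax(x) = x - logsumexp(x), with the maximum subtracted inside the log-sum-exp so the exponentials stay in range. A hand-rolled sketch of that formulation (not the actual PyTorch kernel):

```python
import torch

def manual_log_softmax(x, dim=-1):
    # log_softmax(x) = x - logsumexp(x); subtracting the max keeps exp() from overflowing
    m = x.max(dim=dim, keepdim=True).values
    logsumexp = m + (x - m).exp().sum(dim=dim, keepdim=True).log()
    return x - logsumexp

x = torch.tensor([1000.0, 0.0, -1000.0])
print(manual_log_softmax(x))         # tensor([    0., -1000., -2000.])
print(torch.log_softmax(x, dim=-1))  # matches the built-in
```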
