# Must there be a softmax layer when using cross-entropy?

Generally, when training a multi-class classification model, we use cross-entropy as the loss function.
The inputs of cross-entropy are:

• the true label
• the model prediction (model output)

In my opinion, mathematically the model's prediction values should first be converted to probabilities, so there must be a softmax layer at the end of the model.

My question is:

1. Is my understanding correct that the outputs should be converted to probabilities?
2. Must there be a softmax layer at the end of the model?
Mathematically, can we still train the model if there is no softmax layer?
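To make the question concrete, here is a minimal plain-Python sketch (my own illustration, not from any particular framework) comparing the two formulations for a single example with an integer class label: computing the loss from explicit softmax probabilities, versus computing it directly from the raw logits using a log-sum-exp. The two are mathematically identical, which is why libraries such as PyTorch's `nn.CrossEntropyLoss` accept raw logits and fold the softmax into the loss rather than requiring a softmax layer in the model.

```python
import math

def softmax(logits):
    # Standard softmax with the max subtracted for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ce_from_probs(probs, label):
    # "Explicit softmax layer" version: loss is -log(p_label).
    return -math.log(probs[label])

def ce_from_logits(logits, label):
    # Softmax folded into the loss:
    # -log(softmax(logits)[label]) = logsumexp(logits) - logits[label].
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

logits = [2.0, 1.0, 0.1]
label = 0
loss_a = ce_from_probs(softmax(logits), label)
loss_b = ce_from_logits(logits, label)
print(abs(loss_a - loss_b) < 1e-9)  # the two formulations agree
```

So in practice the probabilities are still involved mathematically, but the softmax does not need to exist as a separate layer in the model; the logits-based form is also more numerically stable.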