Must there be a soft-max layer when using cross-entropy?

Generally, when training a multi-class classification model, we use cross-entropy as the loss function.
The inputs to cross-entropy are:

  • the label
  • the model prediction (the model output)

In my opinion, mathematically the model prediction should be converted to a probability distribution, so there must be a soft-max layer at the end of the model.
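
To make concrete what I mean by "converted to a probability distribution" (my own notation, just for illustration): for raw model outputs $z_1, \dots, z_K$ over $K$ classes, the soft-max gives

$$p_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}},$$

and the cross-entropy against a one-hot label $y$ is

$$L = -\sum_{i=1}^{K} y_i \log p_i.$$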

My questions are:

  1. Is my understanding correct that the prediction should be converted to a probability distribution?
  2. Must there be a soft-max layer at the end of the model?
    Mathematically, can we still train the model if there is no soft-max layer? (See the sketch below.)
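
To make the question concrete, here is a minimal sketch of the two setups I am asking about. I use PyTorch here only as an example, and I assume `nn.CrossEntropyLoss` as the cross-entropy loss; the specific framework is not the point of the question.

```python
import torch
import torch.nn as nn

# Variant A: no soft-max layer at the end; the model outputs raw scores (logits).
model_a = nn.Sequential(
    nn.Linear(10, 32),
    nn.ReLU(),
    nn.Linear(32, 3),           # raw scores for 3 classes
)

# Variant B: an explicit soft-max layer at the end; the model outputs probabilities.
model_b = nn.Sequential(
    nn.Linear(10, 32),
    nn.ReLU(),
    nn.Linear(32, 3),
    nn.Softmax(dim=1),          # probabilities for 3 classes
)

criterion = nn.CrossEntropyLoss()   # the label is given as integer class indices

x = torch.randn(8, 10)              # a batch of 8 samples with 10 features each
y = torch.randint(0, 3, (8,))       # integer class labels in {0, 1, 2}

loss_a = criterion(model_a(x), y)   # cross-entropy on the raw outputs of variant A
loss_b = criterion(model_b(x), y)   # cross-entropy on the soft-maxed outputs of variant B
```

Both variants produce a loss value and can be back-propagated, so my question is really whether variant B is required in mathematical terms, or whether variant A is also valid.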