Every time I call learn.predict("…") I get a different prediction. Is that expected? I thought that in prediction mode there is no dropout, so the result should be deterministic…
We don’t take an argmax in the predict function. Instead, we sample from the probability distribution we get by applying a softmax to the logits, so repeated calls can return different results.
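To make the difference concrete, here is a minimal sketch in plain Python. It is not the fastai source: the logits, the three-token vocabulary, and the helper names (softmax, pick_argmax, pick_sample) are all made up for illustration.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pick_argmax(probs):
    """Deterministic: always the index with the highest probability."""
    return max(range(len(probs)), key=lambda i: probs[i])

def pick_sample(probs, rng):
    """Stochastic: draw an index in proportion to its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.5, 0.3]  # hypothetical scores for a 3-token vocabulary
probs = softmax(logits)
rng = random.Random()     # unseeded, so the draws differ between runs

argmax_picks = [pick_argmax(probs) for _ in range(5)]
sample_picks = [pick_sample(probs, rng) for _ in range(5)]
print("argmax :", argmax_picks)  # the same index every time
print("sampled:", sample_picks)  # varies from run to run
```

The argmax list is identical on every run, while the sampled list changes, which is the same kind of variation you see between calls.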
And what is the reason for this? Could you point me to any sources where it is already explained?
In your Jupyter notebook, run
If you read the source code that it shows, you will see that it confirms my previous statement.
As for why it is done this way, there are a few ways of generating text. Taking an argmax at every time step is a greedy search. What learn.predict does instead is described to some extent starting from minute 15 in the following video.
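Independently of the video, the contrast between greedy search and sampling over several time steps can be sketched with a toy next-word model. Everything here is hypothetical (the vocabulary, the score table, and the generate function), and it only illustrates the idea rather than what the library does internally.

```python
import math
import random

VOCAB = ["the", "cat", "sat", "down"]  # hypothetical vocabulary

def toy_logits(prev_index):
    """Stand-in for a language model: next-word scores given the previous word."""
    # Hypothetical fixed scores; a real model would compute these from its weights.
    table = [
        [0.1, 2.0, 1.8, 0.2],
        [0.3, 0.1, 2.0, 1.7],
        [1.9, 0.2, 0.1, 2.0],
        [2.0, 1.8, 0.3, 0.1],
    ]
    return table[prev_index]

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate(start, n_words, rng=None):
    """Greedy search when rng is None, otherwise sample every next word."""
    idx = VOCAB.index(start)
    out = [start]
    for _ in range(n_words):
        probs = softmax(toy_logits(idx))
        if rng is None:
            # Greedy: take the single most likely next word.
            idx = max(range(len(probs)), key=lambda i: probs[i])
        else:
            # Sampling: draw the next word in proportion to its probability.
            idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        out.append(VOCAB[idx])
    return " ".join(out)

greedy_a = generate("the", 6)
greedy_b = generate("the", 6)
sampled_a = generate("the", 6, random.Random(0))
sampled_b = generate("the", 6, random.Random(0))  # same seed, same draws
sampled_c = generate("the", 6, random.Random())   # unseeded, may differ per run

print(greedy_a)  # the cat sat down the cat sat
print(sampled_c)
```

Greedy search always returns the same sentence, and in this toy model it falls into a loop. Sampling can take a different path on each call, and it only becomes repeatable when the random number generator is seeded.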