I got decent accuracy in the first epoch itself (word error rate 0.33), but it didn't improve further when training for a 2nd, 3rd, … epoch. I want to understand this behaviour.
- Why would the error rate drop to 0.33 in the first epoch itself? And,
- Why wouldn't it decrease any further? Is this generally because of a high learning rate?
Should the learning rate be reduced in the 2nd epoch for the model to learn further from the data, e.g. with a plateau-based schedule like the sketch below?
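
For concreteness, here is a minimal sketch of what I mean by reducing the learning rate on a plateau, assuming a PyTorch setup; `model`, the optimizer settings, and the constant `val_wer` are placeholders standing in for my actual training loop:

```python
import torch

# Placeholders for the real ASR model and optimizer.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Cut the learning rate by 10x whenever the monitored metric
# (validation WER) fails to improve for `patience` consecutive epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=1
)

for epoch in range(5):
    # ... one epoch of training would run here ...
    val_wer = 0.33  # placeholder: in my case the WER stays stuck at 0.33
    scheduler.step(val_wer)  # lowers lr once the WER plateaus
    print(epoch, optimizer.param_groups[0]["lr"])
```

Would this kind of schedule be the right way to address the plateau, or does the stuck WER point to something else entirely?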