Understanding the learn.fit results

I got decent accuracy in the very first epoch (word error rate 0.33), but it did not improve when I kept training for a 2nd, 3rd, and further epochs.

I want to understand this behaviour.

  1. Why would the error rate already reach 0.33 in the very first epoch? And,
  2. Why wouldn’t it decrease further? Is this generally because the learning rate is too high?

Should the learning rate be reduced from the 2nd epoch onward so that the model can keep learning from the data?
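
To make the last question concrete, here is a minimal sketch of what I mean by reducing the learning rate per epoch. The names `decayed_lr`, `base_lr` and `decay`, and the values used, are hypothetical and chosen only for illustration; they are not part of my actual run or of the fastai API. How each value would be passed to learn.fit depends on the library version, so that part is left out.

```python
def decayed_lr(base_lr, epoch, decay=0.5):
    """Return the learning rate for a 0-indexed epoch.

    The rate is multiplied by `decay` once per completed epoch
    (simple exponential decay; an assumption, not a recommendation).
    """
    return base_lr * (decay ** epoch)


base_lr = 1e-2  # hypothetical starting learning rate

# One learning rate per epoch: epoch 0 uses base_lr, later epochs use less.
schedule = [decayed_lr(base_lr, epoch) for epoch in range(3)]

for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch + 1}: lr = {lr:g}")
```

With a schedule like this, the 2nd epoch would train at half the initial rate and the 3rd at a quarter of it. Is that the kind of reduction that usually helps when the error rate stalls?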