This was exactly the same problem I was having. In my case, reducing the learning rate made the model train. See the other thread where I mention this here: