In Chapter 10 (NLP), under “Fine-Tuning the Language Model”, Jeremy’s `learn.fit_one_cycle` is shown taking about 11 minutes.
Mine takes 1 hour 45 minutes, and I had to reduce the batch size to 64 and the sequence length to 40 just to avoid a CUDA out-of-memory (OOM) error.
I’m using a Paperspace machine with a 30 GB GPU.
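For context, here’s my rough back-of-envelope on what that downsizing does, assuming the chapter’s defaults were `bs=128` and `seq_len=80` (I may be misremembering those numbers):

```python
# Back-of-envelope sketch: activation memory per batch scales roughly
# with bs * seq_len, but so does the work done per optimizer step, so
# smaller batches mean more optimizer steps per epoch.
def tokens_per_batch(bs, seq_len):
    return bs * seq_len

# bs=128, seq_len=80 are the chapter defaults as I remember them
# (an assumption, may be off); bs=64, seq_len=40 are my reduced values.
print(tokens_per_batch(128, 80))  # 10240
print(tokens_per_batch(64, 40))   # 2560 -> roughly 4x more steps per epoch
```

So quartering the tokens per batch means roughly 4x as many optimizer steps per epoch, which by itself shouldn’t explain a ~10x slowdown. I did also check `torch.cuda.is_available()` and `next(learn.model.parameters()).device` to confirm training is actually running on the GPU.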
I sense that something is wrong. Does anyone know why it’s taking so long or where I could look for tips on speeding it up?
Any help at all would be much appreciated.