Why does training become slower and slower?

I train a neural network in a Jupyter notebook.
At the beginning it takes 1 min per epoch,
but the time per epoch increases little by little.
After 10 hours it takes 2 min per epoch.

Why is that?
I searched on Google and found that many people have this problem,
and many of them think it is a Jupyter notebook problem.
How can I deal with it?
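To confirm that the slowdown is a steady trend rather than noise, it helps to record the wall-clock time of every epoch. Below is a minimal sketch, assuming a hypothetical `run_epoch` callable that trains for exactly one epoch; `EpochTimer` and its `clock` parameter are illustrative names, not part of fastai.

```python
import time
from typing import Callable, List


class EpochTimer:
    """Records the wall-clock duration of each epoch (hypothetical helper)."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter):
        # The clock is injectable so the timer can be tested with fake ticks.
        self.clock = clock
        self.durations: List[float] = []

    def time_epoch(self, run_epoch: Callable[[], None]) -> float:
        start = self.clock()
        run_epoch()
        elapsed = self.clock() - start
        self.durations.append(elapsed)
        return elapsed

    def slowdown_ratio(self, window: int = 3) -> float:
        """Mean of the last `window` epochs divided by mean of the first `window`."""
        if len(self.durations) < 2 * window:
            raise ValueError("need at least 2 * window recorded epochs")
        first = sum(self.durations[:window]) / window
        last = sum(self.durations[-window:]) / window
        return last / first
```

A ratio that keeps climbing toward 2.0 over the run would match the 1 min to 2 min growth described in the question, while a ratio that stays near 1.0 points to noise rather than a real slowdown.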

  • What platform are you using right now (Windows or Linux)?
  • Local machine or cloud?
  • Which course?
  • Which notebook?
  • Which fastai version?
  • Are you running in a CPU or a GPU environment?
    In case of GPU:
    • Which Nvidia driver?
    • Which CUDA version?
    • Which GPU?
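Most of these questions can be answered from Python itself. The sketch below is a hypothetical helper (`collect_env` is not a fastai or PyTorch function); it uses `platform`, package metadata, and, when PyTorch is importable, `torch.__version__`, `torch.version.cuda`, `torch.backends.cudnn.version()` and `torch.cuda.get_device_name(0)`. The driver version is read from `nvidia-smi --query-gpu=driver_version --format=csv,noheader` when that tool is on the PATH. A fastai 0.7 install made from a git checkout may not be registered as a package, in which case its version comes back as `None`.

```python
import platform
import shutil
import subprocess
from importlib import metadata
from typing import Optional


def _package_version(name: str) -> Optional[str]:
    """Installed version of a package, or None if it is not registered."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def nvidia_driver_version() -> Optional[str]:
    """Ask nvidia-smi for the driver version; None if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    try:
        out = subprocess.run(
            [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=10, check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return None
    lines = out.stdout.strip().splitlines()
    return lines[0].strip() if lines else None


def collect_env() -> dict:
    """Gather the details asked for in the list above (hypothetical helper)."""
    info = {
        "os": f"{platform.system()} {platform.release()}",
        "python": platform.python_version(),
        "fastai": _package_version("fastai"),
        "torch": None, "cuda": None, "cudnn": None, "gpu": None,
        "nvidia_driver": nvidia_driver_version(),
    }
    try:
        import torch
    except ImportError:
        return info  # no PyTorch: leave the GPU-related fields as None
    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda            # CUDA version PyTorch was built with
    info["cudnn"] = torch.backends.cudnn.version()
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```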
  • Windows 10
  • Local machine with a 1080 Ti and CUDA 9.0
  • Training a WGAN on my own data, with some changes to the model
  • fastai 0.7 (the old version)
  • Running on GPU, but other people I found on Google said it also slows down on CPU

Which Nvidia driver are you using? Your question is too generic!

It's difficult to help you when you hold back information. I don't see anything in WGAN itself that would slow down your training process.

Maybe it is a bad implementation of the network, or your data is badly distributed, or your learning rate is not good enough. Also, my questions were important, even if you may not think so: in fact, Nvidia driver 390 had a memory leak that slowed applications down over time.
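A driver leak lives outside Python, but a leak in the training code itself (for example, a list that keeps every epoch's outputs alive) can be spotted by reading memory after each epoch and checking whether it only ever grows. Below is a minimal sketch using the standard-library `tracemalloc`, which sees Python-heap allocations only, not GPU or driver memory; with PyTorch, `torch.cuda.memory_allocated()` read after each epoch would be the analogous check for GPU tensors. The `run_epoch` callable and the 1 MB growth threshold are hypothetical choices.

```python
import tracemalloc
from typing import Callable, List, Sequence


def memory_per_epoch(run_epoch: Callable[[int], None], n_epochs: int) -> List[int]:
    """Traced Python-heap bytes still allocated after each epoch."""
    tracemalloc.start()
    readings: List[int] = []
    try:
        for epoch in range(n_epochs):
            run_epoch(epoch)
            current, _peak = tracemalloc.get_traced_memory()
            readings.append(current)
    finally:
        tracemalloc.stop()
    return readings


def is_steadily_growing(readings: Sequence[int], min_growth_bytes: int = 1_000_000) -> bool:
    """True if memory never drops between epochs and grows by at least min_growth_bytes overall."""
    if len(readings) < 2:
        return False
    never_drops = all(b >= a for a, b in zip(readings, readings[1:]))
    return never_drops and readings[-1] - readings[0] >= min_growth_bytes


if __name__ == "__main__":
    history = []

    def leaky_epoch(epoch: int) -> None:
        # Bug pattern: every epoch's data is kept alive in a global list.
        history.append([0.0] * 100_000)

    print(is_steadily_growing(memory_per_epoch(leaky_epoch, 5)))
```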

Without any clues from your side it is very difficult to discuss anything with you, because I know nothing about your model, your implementation, or your data…

Man, I am cool… my question is… do you really want some help?

Thanks for your patient answers.

Sorry about the above, I am not familiar with GPU stuff. I checked, and the Nvidia driver is 391.35.

Hi,

Since your driver is newer than version 390, I am sure the problem is not related to your driver.
You are probably using conda, right? You can try upgrading CUDA to 9.2 together with cuDNN, because cuDNN is the library that provides the accelerated deep-learning primitives on top of CUDA. Also update your PyTorch to version 1.0 or 0.4.1.
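Checking installed versions against the targets in this reply is a plain version comparison. Below is a minimal sketch; `version_tuple`, `at_least`, `report` and the `TARGETS` table are hypothetical names, and the thresholds simply encode this reply's advice (driver newer than 390, CUDA at least 9.2, PyTorch at least 0.4.1), not an official compatibility matrix. Only the leading numeric part of a version string is compared, so suffixes such as `.post2` or `+cu92` are ignored.

```python
import re
from typing import Optional, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric part: '391.35' -> (391, 35), '0.4.1.post2' -> (0, 4, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unparseable version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_least(installed: Optional[str], required: str) -> bool:
    """True if installed >= required; a missing (None) version counts as too old."""
    if installed is None:
        return False
    have, need = version_tuple(installed), version_tuple(required)
    width = max(len(have), len(need))
    # Pad with zeros so that '9.2' and '9.2.148' compare sensibly.
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


# Hypothetical minimums taken from the advice in this thread.
TARGETS = {"cuda": "9.2", "torch": "0.4.1"}


def report(installed: dict) -> dict:
    result = {name: at_least(installed.get(name), req) for name, req in TARGETS.items()}
    driver = installed.get("nvidia_driver")
    # The reply's rule of thumb: driver 390 leaked, so want a major version above 390.
    result["nvidia_driver"] = driver is not None and version_tuple(driver)[0] > 390
    return result


if __name__ == "__main__":
    # Driver and CUDA versions from this thread; the torch version is a made-up example.
    print(report({"nvidia_driver": "391.35", "cuda": "9.0", "torch": "0.3.1"}))
```

With the driver and CUDA versions reported in this thread, the driver passes the rule of thumb while CUDA 9.0 falls short of the suggested 9.2.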

If that doesn't work, then the last thing left to blame is the code itself, but I wanted to rule out driver-related problems first.
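To narrow down which part of the code grows slower, one option is to profile an early epoch and a late epoch and compare the two reports. Below is a minimal sketch with the standard-library `cProfile` and `pstats`; `profile_epoch` and `run_epoch` are hypothetical names. One caveat for GPU code: CUDA kernels run asynchronously, so their cost tends to show up at synchronization points (for example, when a loss value is copied back to the CPU) rather than at the call that launched them.

```python
import cProfile
import io
import pstats
from typing import Callable


def profile_epoch(run_epoch: Callable[[], None], top: int = 10) -> str:
    """Run one epoch under cProfile and return the `top` entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        run_epoch()
    finally:
        profiler.disable()
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    def slow_part() -> int:
        return sum(range(100_000))

    def fake_epoch() -> None:
        slow_part()

    print(profile_epoch(fake_epoch))
```

A function whose cumulative time roughly doubles between the early and the late report is a good candidate for the slowdown.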


Thanks a lot.