I used to mine cryptocurrency, so I had a spare GTX 1080 sitting around and built a rig around it (Ubuntu 16.04, 32 GB RAM, Intel i5). I made it through lesson 2 and submitted to Cats and Dogs Redux, but I keep getting memory errors.
When I start a notebook, I see what I think is my most likely problem:
On the first run, it says cuDNN can't find the correct tmp files. The error goes away on the second run, so I keep assuming it has been resolved.
Later, however, I run into memory errors if the batch size is above 16, and further on in the notebook if it is above 4. Here is an example with a batch size of 64. I have tried all of the suggested fixes (messing with optimizer=X) without success.
With a batch size of 16, it works fine: the run completes, and GPU utilization stays at 99% throughout.
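A rough way to see why larger batches run out of memory is that activation memory grows linearly with batch size. The sketch below computes the size of a single activation tensor, assuming float32 values (4 bytes each) and a hypothetical first conv layer with 64 filters on 224x224 inputs; these numbers are illustrative assumptions, not taken from the lesson notebook. A real network keeps many such tensors alive at once, plus weights, gradients, optimizer state, and cuDNN workspace, so the total on the 8 GB card is far larger than any one tensor.

```python
def tensor_bytes(shape, bytes_per_elem=4):
    """Bytes needed for one dense tensor; float32 (4 bytes) is assumed by default."""
    n = 1
    for d in shape:
        n *= d
    return n * bytes_per_elem


# Hypothetical example: output of a 3x3 conv layer with 64 filters on 224x224
# inputs, in (batch, channels, height, width) order. Memory scales with batch.
for batch in (4, 16, 64):
    mib = tensor_bytes((batch, 64, 224, 224)) / 2**20
    print(f"batch={batch:>2}: {mib:.0f} MiB")
```

This prints 49 MiB, 196 MiB, and 784 MiB for batch sizes 4, 16, and 64: quadrupling the batch quadruples every activation tensor, which is why a batch that fits at 16 can fail at 64.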
Other times I get a memory error completely out of the blue.
I have unsuccessfully tried:
- upgrading/reinstalling cuDNN (the most popular suggestion when I google the error)
- updating/reinstalling the NVIDIA drivers
For now, I am considering just moving on and keeping the batch size under 16. Is there anything else I should try?