I set up a separate partition on my desktop PC to create a DL environment. I have 16 GB of RAM and a GTX 1060 6 GB. I created a 100 GB partition and installed Ubuntu 17.10. I used the Paperspace bash script to set up my environment and it all worked seamlessly.
I opened the lesson 1 notebook and it all seemed okay: CUDA and cuDNN were both enabled and I was able to load the images. However, as soon as it hits the line learn = ConvLearner.pretrained(arch, data, precompute=True), the kernel restarts itself. Jupyter gives me a pop-up saying "The kernel appears to have died. It will restart automatically." There’s no stack trace to go by either.
I’ve run ‘conda env update’, tried reducing the batch size, restarted my PC, and tried running Jupyter with the debug option, but that gave no extra insight. Googling turned up no obvious solutions. I’ve double-checked that everything was installed and configured correctly.
I’m out of ideas! Does anyone have any clue what might be causing this crashing?
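One way to get more than the "kernel appears to have died" pop-up is to run the failing step in a separate Python process and look at how that process exits. The following is a minimal sketch, not something from the course material: the helper names describe_exit and run_snippet are made up for illustration, and it assumes a Linux/POSIX system, where a negative return code means the child was killed by a signal (for example SIGILL for an illegal CPU instruction, or SIGSEGV for a segfault).

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a readable description (hypothetical helper)."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown signal"
        return "killed by signal {} ({})".format(-returncode, name)
    return "exited with status {}".format(returncode)


def run_snippet(code):
    """Run Python source in a fresh interpreter so a crash cannot take down the caller."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    # Assumption: importing torch and doing a small tensor operation is enough
    # to hit the crash. Swap in the real failing code if it is not.
    returncode, stderr = run_snippet("import torch; print(torch.ones(2, 2).sum())")
    print(describe_exit(returncode))
    if stderr:
        print(stderr)
```

If this reports a signal rather than a normal Python traceback, the crash is happening in native code and not in the notebook itself, which narrows down where to look.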
I have the same problem on the same line of code. I’m running CentOS 7.4.1708, Python 3.6.4 and PyTorch 0.3.1. I also have a GTX 1060 6 GB. CentOS and conda are up to date on all packages. I tried installing the latest PyTorch from pytorch.org but it didn’t make a difference.
That leads me to believe the problem is in PyTorch, possibly an incompatibility with my version of Python. I’m going to try downloading the PyTorch source and compiling it myself.
It seems PyTorch requires SSE4, which my older CPU doesn’t support. Recompiling should fix it, but I ran into some known issues with compiling PyTorch on Ubuntu 17.10, so I ended up downgrading to Ubuntu 16.04. Finally got it working.
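For anyone else hitting this, here is a rough way to check whether a CPU advertises SSE4 on Linux by reading the flags line in /proc/cpuinfo. This is only a sketch: the function names are made up for illustration, and since the post above just says "SSE4", checking for both the sse4_1 and sse4_2 flags is an assumption about which variant matters.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # Every x86 core lists the same flags, so the first match is enough.
            return set(value.split())
    return set()


def missing_flags(flags, required=("sse4_1", "sse4_2")):
    """List the required flags that the CPU does not advertise."""
    return [name for name in required if name not in flags]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            flags = parse_cpu_flags(handle.read())
    except OSError:
        flags = set()
        print("Could not read /proc/cpuinfo (is this a Linux system?)")
    if flags:
        missing = missing_flags(flags)
        if missing:
            print("CPU does not advertise: " + ", ".join(missing))
        else:
            print("CPU advertises SSE4.1 and SSE4.2")
```

If either flag is missing, a binary built with those instructions can die with an illegal instruction error, which would match the silent kernel restart described above.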