Kernel keeps crashing on call to 'ConvLearner.pretrained'

funkdafied · February 22, 2018, 11:22am

I set up a separate partition on my desktop PC to create a DL environment. I have 16 GB of RAM and a GTX 1060 6 GB. I created a 100 GB partition and installed Ubuntu 17.10. I used the Paperspace bash script to set up my environment and it all worked seamlessly.

I opened up the lesson 1 notebook and it all seemed okay: CUDA and cuDNN were both enabled and I was able to load the images. However, as soon as it hits the line learn = ConvLearner.pretrained(arch, data, precompute=True), the kernel restarts itself. Jupyter gives me a pop-up saying “The kernel appears to have died. It will restart automatically.” There’s no additional stack trace to go by either.

I’ve run ‘conda env update’, reduced the batch size, restarted my PC, and run Jupyter with the debug option, but none of it gave any extra insight. Googling turned up no obvious solutions. I’ve double-checked that everything was installed and configured correctly.

I’m out of ideas! Does anyone have any clue what might be causing this crashing?

Any help is much appreciated!

sjjohns · February 22, 2018, 5:30pm

I have the same problem on the same line of code. I’m running CentOS 7.4.1708, Python 3.6.4 and PyTorch 0.3.1. I also have a GTX 1060 6GB. CentOS and conda are up to date on all packages. I tried installing the latest PyTorch from pytorch.org but it didn’t make a difference.

In /var/log/messages, I’m seeing errors like:

kernel: traps: python[1254] trap invalid opcode ip:7f6c8a0e1df2 sp:7ffedfc4fa48 error:0 in libTH.so.1[7f6c89d49000+28ba000]

This leads me to believe the problem is in PyTorch, possibly an incompatibility with my version of Python. I’m going to try downloading the PyTorch source and compiling it myself.
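
For anyone reading a trap line like the one above, here is a rough sketch of how it could be picked apart in Python. The regular expression and the parse_trap helper are illustrative assumptions based only on the single log line quoted in this post, not on any kernel documentation, so other kernels may format the message differently. The useful part is the offset: the instruction pointer minus the base address printed in the brackets, which gives the position of the faulting instruction inside libTH.so.1.

```python
import re

# Trap line copied from the post above.
LINE = (
    "kernel: traps: python[1254] trap invalid opcode ip:7f6c8a0e1df2 "
    "sp:7ffedfc4fa48 error:0 in libTH.so.1[7f6c89d49000+28ba000]"
)

# Hypothetical pattern, written to match the line above only.
TRAP_RE = re.compile(
    r"trap (?P<reason>.+?) ip:(?P<ip>[0-9a-f]+) .* in "
    r"(?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_trap(line):
    """Return the trap reason, library name and offset into the library,
    or None if the line does not look like the example above."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    size = int(m.group("size"), 16)
    return {
        "reason": m.group("reason"),
        "library": m.group("lib"),
        "offset": ip - base,           # position of the faulting instruction
        "in_range": 0 <= ip - base < size,
    }


if __name__ == "__main__":
    info = parse_trap(LINE)
    print(info["reason"], "in", info["library"], "at offset", hex(info["offset"]))
```

An "invalid opcode" reason means the CPU was asked to execute an instruction it does not recognise, so the fault sitting inside libTH.so.1 points at how the library was built rather than at the notebook code.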

sjjohns · February 22, 2018, 7:05pm

Recompiling from source solved the problem.

funkdafied · February 22, 2018, 8:55pm

Brilliant, I’ll give it a try! Thanks

funkdafied · February 24, 2018, 4:52am

It seems the prebuilt PyTorch requires SSE4, which my older CPU doesn’t support. Recompiling should fix it, but I ran into some known issues with compiling PyTorch on Ubuntu 17.10, so I ended up downgrading to Ubuntu 16.04. Finally got it working.
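
In case it helps anyone else check their CPU before going down the recompile route, here is a minimal sketch that reads the flags Linux reports in /proc/cpuinfo. It assumes an x86 Linux machine, and it assumes that "SSE4" here means the sse4_1 and sse4_2 flags; the thread doesn’t say which of the two the binaries actually need. The helper names cpu_flags and missing_flags are made up for this example.

```python
import os

# Assumption: "SSE4" in this thread covers both SSE4.1 and SSE4.2.
WANTED = ("sse4_1", "sse4_2")


def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def missing_flags(cpuinfo_text, wanted=WANTED):
    """Return the wanted flags that the CPU does not report."""
    present = cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in present]


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        missing = missing_flags(f.read())
    if missing:
        print("CPU does not report:", ", ".join(missing))
    else:
        print("CPU reports all of:", ", ".join(WANTED))
```

If either flag comes back as missing, a crash with "invalid opcode" in a prebuilt library is at least consistent with what happened here.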

sjjohns · February 26, 2018, 5:06pm

Ah ha. That was my problem too. I have an older Intel CPU that supports SSE3 but not SSE4.

Glad to hear you’re working now!