@dillon it seems it wasn’t really frozen forever: it was just taking a really long time to create the PyTorch model for some reason. I’ve seen other mentions of this on the forum relating to Paperspace. Eventually it finishes and works fine, but restarting the notebook brings back the long delay on model creation.
@dillon, any news about this problem? I would like to get used to Paperspace during the course, and I really like the environment… it’s a pity that course weeks are passing without my being able to use it.
Hi @A_TF57, I read your post as trying to run conda env update on Crestle after a git pull. But weirdly, Anaconda doesn’t seem to be installed. Did you install it yourself?
And more generally, have you been successfully using Crestle for the lesson notebooks so far?
I did, but it seems I didn’t have to. According to the FAQs, you can install extra packages with pip3 if you want to; everything else is already set up for you.
Hi guys, I’m not able to reproduce the issue 100% of the time, but it does seem to be some weird interaction with import order. I also haven’t been able to reliably reproduce the freezing during model creation.
I’m having the team here build out a new template to see if we can get a fresh base to work from.
OK, so I spent the last few days building out different versions of the template with different drivers, base OSes, etc.
It seems that the "dlopen: cannot load any more object with static TLS" error only occurs on Ubuntu 14.04; when I switched to an Ubuntu 16.04 template, that issue went away entirely.
That said, the new 16.04 template now freezes consistently on learn.fit(), just as you mentioned. What I notice is that the CPU is getting hit really hard, so I’m going to have to dig into what this method is doing. From the outside it almost looks like there is a memory leak in the method: virtual memory use explodes as the CPU starts to really heat up.
@jeremy just as you mentioned, it eventually works, but something doesn’t look right and it can take 5 minutes to get going. This is on a brand-new Ubuntu 16.04 template where I installed new drivers, did a fresh Conda install, etc.
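To put numbers on a slow start-up like the one described here, one option is to time the slow call and record the process’s peak memory. This is a minimal, Unix-only sketch using the standard library; `measure` is a hypothetical helper, and the function you pass in (for example whatever builds the learner in your notebook) is an assumption. Note that `ru_maxrss` is peak resident memory, not the virtual memory figure mentioned above, and it is a high-water mark for the whole process rather than a per-call number.

```python
import resource
import time


def measure(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds, peak_rss).

    Hypothetical helper, Unix-only (the resource module is not available
    on Windows). peak_rss is this process's resident-set high-water mark
    so far: kilobytes on Linux, bytes on macOS. Memory used by worker
    processes is not included (resource.RUSAGE_CHILDREN covers only
    terminated, waited-for children).
    """
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    peak_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return result, elapsed, peak_rss


if __name__ == "__main__":
    # Stand-in workload; in a notebook you would pass the slow call instead.
    total, secs, peak = measure(sum, range(1_000_000))
    print(f"result={total} took {secs:.3f}s, peak RSS={peak}")
```

Running it once on a fresh kernel and again after a restart would show whether the delay really sits in that one call.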
Sure. I can give it a try, most likely later this evening. I’m not using Paperspace at the moment, but I used it in the past and will most likely return to it in a few months. So ping me if I can help.
@dillon I was seeing freezes on constructing learn, not on fitting. It might be worth doing a git pull and trying again, because I switched from ProcessPool to ThreadPool last night, so if it was a multiprocessing issue this might well fix it.
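For readers unfamiliar with the ProcessPool to ThreadPool change, the difference can be sketched with Python’s standard `concurrent.futures` module. This is not the fastai source: `parallel_map` and `load_item` are hypothetical names, and the assumption is that the per-item work is something like reading one file. A thread pool runs its workers inside the existing process, so it avoids starting and feeding separate worker processes; the trade-off is that pure-Python CPU-bound work is limited by the GIL.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def load_item(x):
    # Hypothetical stand-in for per-item work such as opening one image file.
    return x * x


def parallel_map(func, items, use_threads=True, max_workers=4):
    """Apply func to every item in parallel, preserving input order.

    use_threads=True  -> ThreadPoolExecutor: workers share this process.
    use_threads=False -> ProcessPoolExecutor: separate worker processes,
                         so func and items must be picklable.
    """
    executor_cls = ThreadPoolExecutor if use_threads else ProcessPoolExecutor
    with executor_cls(max_workers=max_workers) as executor:
        return list(executor.map(func, items))


if __name__ == "__main__":
    print(parallel_map(load_item, range(5)))  # [0, 1, 4, 9, 16]
```

Because only the executor class changes, swapping between the two is a one-line decision, which makes it a cheap experiment when multiprocessing is the suspect.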
We’re not seeing the problem on any other platform BTW, so it’s something specific to your config. Are you using docker? There are various threads around about docker and pytorch IIRC…
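If it is unclear whether a machine is running inside Docker, a quick check can help narrow things down. This is a heuristic sketch, not an official detection method: `probably_in_docker` and `shm_size_mb` are hypothetical helpers. The shared-memory check is included because the PyTorch README notes that a container’s default shared-memory segment can be too small when torch multiprocessing is used (for example multi-worker data loaders), and suggests `--ipc=host` or `--shm-size`.

```python
import os


def probably_in_docker():
    """Best-effort guess at whether this process runs in a Docker container.

    Heuristic only: Docker normally creates /.dockerenv, and on cgroup v1
    hosts /proc/1/cgroup usually mentions "docker". Neither is guaranteed.
    """
    if os.path.exists("/.dockerenv"):
        return True
    try:
        with open("/proc/1/cgroup") as f:
            return "docker" in f.read()
    except OSError:
        # No /proc (e.g. macOS) or no permission: assume not Docker.
        return False


def shm_size_mb(path="/dev/shm"):
    """Return the size of the shared-memory mount in MiB, or None if unavailable."""
    try:
        st = os.statvfs(path)
    except (OSError, AttributeError):
        # OSError: path missing; AttributeError: os.statvfs absent on Windows.
        return None
    return st.f_frsize * st.f_blocks / (1024 * 1024)


if __name__ == "__main__":
    print("Docker (heuristic):", probably_in_docker())
    print("/dev/shm size (MiB):", shm_size_mb())
```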