queue.Full Error When Running Lesson1

I ran across this problem on an AWS p2.xlarge (Oregon) with resnet50 and the default bs=28, but I did not try reducing it to 14 and don't know whether that would solve the problem …

@neovaldivia Your data got corrupted, so it failed the check that x and y are the same length (line 167).

Removing the tmp folder fixed that issue. Thanks.
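
For anyone who wants to do the same cleanup from inside the notebook, here is a minimal sketch. The helper name `clear_tmp` and the example path are made up for illustration; point it at whatever PATH your notebook defines.

```python
import shutil
from pathlib import Path

def clear_tmp(data_path):
    """Delete the `tmp` folder under data_path if it exists.

    Returns True if a folder was removed, False if there was nothing to remove.
    """
    tmp_dir = Path(data_path) / "tmp"
    if tmp_dir.is_dir():
        shutil.rmtree(tmp_dir)  # removes the folder and everything inside it
        return True
    return False

# Hypothetical usage (the path is an assumption, not taken from this thread):
# clear_tmp("data/dogscats/")
```

Anything cached in that folder is deleted for good, so expect the next run to take longer while it is rebuilt.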

Although with this new code push I’m getting an error when importing fastai libraries.
ImportError: dlopen: cannot load any more object with static TLS

Apparently it's the order in which the libs are imported… but I'm not sure.

Once you have SSHed in, try this to see if it helps: @neovaldivia
$ cd fastai
$ git pull
$ conda env update
$ source activate fastai
$ jupyter notebook

I also first assumed it was the batch size, but later realized that since this parameter is passed to tfms_from_model, it must be related to the transformations in some way. Thanks for clarifying. More documentation for the fastai library would definitely help with such questions. I will check out GitHub this weekend.

I am getting a similar queue.Full error. I'm going to try smaller batch sizes and the num_workers option, because maybe I am limited by memory. Here's the post
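
For context, queue.Full is the standard-library exception raised when a bounded queue cannot accept another item. The sketch below is not fastai's data loader and says nothing about why fastai hits the error; it only shows what the exception means, using a made-up helper `fill_until_full`.

```python
import queue

def fill_until_full(maxsize, items):
    """Put items into a bounded queue without blocking.

    Returns how many items were accepted before queue.Full was raised
    (or the total number of items if the queue never filled up).
    """
    q = queue.Queue(maxsize=maxsize)
    accepted = 0
    for item in items:
        try:
            q.put(item, block=False)  # raises queue.Full when no slot is free
        except queue.Full:
            break
        accepted += 1
    return accepted
```

With nothing consuming from the queue, `fill_until_full(2, range(5))` stops after two items. This is the general producer-outpaces-consumer situation that the exception signals.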

Going to try these parameters too, to cope with the memory

@jeremy
Guys, happy to report that the latest changes are stable on my local machine, and I noticed that memory is now being managed better, without having to use swap space.
A snapshot of CPU and memory usage halfway through running lesson1:
[screenshot]

I didn't need to set any hyperparameters other than sz for the latest test.
Thanks for fixing this @jeremy !

And today I learned from @naruto79 about nmon :slight_smile:

Did another test with the hyperparameter bs=128. It seems to be stable and improves the execution time! Of course, it also uses more GPU memory.

no luck…but thank you.

I actually switched to AWS, I was using Paperspace but couldn’t invest more time in that bug anymore. =)

No issues in AWS with num_channels = 4

I am getting the same ImportError on PAPERSPACE console

I did git pull and conda env -f environment.yml, but I am still getting the error.
See the error below, which appears when I import fastai.

The error occurs on the command below in the cell.
from fastai.transforms import *
Error:
ImportError Traceback (most recent call last)
in <module>()

~/fastai/courses/dl1/fastai/torch_imports.py in <module>()
1 import os
----> 2 import torch, torchvision, torchtext
3 from torch import nn, cuda, backends, FloatTensor, LongTensor, optim
4 import torch.nn.functional as F
5 from torch.autograd import Variable

~/anaconda3/lib/python3.6/site-packages/torch/__init__.py in <module>()
51 sys.setdlopenflags(_dl_flags.RTLD_GLOBAL | _dl_flags.RTLD_NOW)
52
---> 53 from torch._C import *
54
55 __all__ += [name for name in dir(_C)

ImportError: dlopen: cannot load any more object with static TLS

I also did the following, as suggested by @wgpubs:
git pull
conda env update -f environment.yml
restart your terminal
I still can't get rid of it.

Did you source activate fastai? Does it work OK for you on AWS using fastai AMI?

Whatever I did, even source activate fastai, this bug wouldn't go away, so I abandoned Paperspace instead of wasting even more time on it.

Yes, I did source activate fastai, but the error still does not go away. I have not tried AWS with the fastai AMI.

I am also running into a permission error when running the cell below.
os.makedirs('/cache/tmp', exist_ok=True)
!ln -fs /cache/tmp {PATH}

[Errno 13] Permission denied: '/cache'
Lots of issues. I guess I should also move to AWS or Crestle.

This is working for me in Crestle. Looks like I am good now :slight_smile:

The notebook mentions that this step is only for Crestle.
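
Outside Crestle the step can simply be skipped. If you would rather have the cell run anywhere, one option is to fall back to a writable temporary directory when /cache cannot be created. This is only a sketch; `make_cache_dir` is a made-up helper, not part of the notebook or of fastai.

```python
import os
import tempfile

def make_cache_dir(preferred="/cache/tmp"):
    """Create `preferred` if possible; otherwise return a fresh temp directory.

    Only PermissionError (Errno 13, as in the post above) triggers the fallback.
    """
    try:
        os.makedirs(preferred, exist_ok=True)
        return preferred
    except PermissionError:
        # No rights to create the preferred location, so use a temp dir instead
        return tempfile.mkdtemp(prefix="fastai_tmp_")
```

The returned path can then be used in place of the hard-coded /cache/tmp when creating the symlink.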

For people hitting
ImportError: dlopen: cannot load any more object with static TLS

I fixed it by importing torch as the very first thing in the notebook (i.e. import torch in the very first cell). It seems that torch 0.3 doesn't like cv2 being imported first (through the fastai imports).
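
If you are not sure which of the two got loaded first in a running kernel, a rough check is to look at the order of entries in sys.modules. This is a sketch with a made-up helper `first_imported`. It relies on dicts keeping insertion order, which is guaranteed from Python 3.7 and is an implementation detail of CPython 3.6, and the order reflects when each import started.

```python
import sys

def first_imported(name_a, name_b, modules=None):
    """Return whichever of the two module names appears first in `modules`.

    `modules` defaults to sys.modules. Returns None if neither is loaded.
    """
    modules = sys.modules if modules is None else modules
    for name in list(modules):  # copy the keys so iteration is safe
        if name in (name_a, name_b):
            return name
    return None

# Usage in a notebook cell, after your imports have run:
# first_imported("torch", "cv2")
```

If it reports cv2, restart the kernel and put import torch in the first cell before any fastai import.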

I’ve only seen that error when using Docker, FYI.

I wonder why! This is so frustrating, I wasted half an hour debugging this :expressionless: