queue.Full Error When Running Lesson1

Actually it’s happening in Paperspace as well.
Trying num_workers=4 and/or bs=32 hasn’t helped so far.
I keep getting AssertionErrors.

I did a pull yesterday, but will try again…


AssertionError                            Traceback (most recent call last)
<ipython-input> in <module>()
      1 arch=resnet34
      2 data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz), num_workers=4)
----> 3 learn = ConvLearner.pretrained(arch, data, precompute=True)
      4 learn.fit(0.01, 3)

~/fastai/courses/dl1/fastai/conv_learner.py in pretrained(self, f, data, ps, xtra_fc, xtra_cut, **kwargs)
     90     def pretrained(self, f, data, ps=None, xtra_fc=None, xtra_cut=0, **kwargs):
     91         models = ConvnetBuilder(f, data.c, data.is_multi, data.is_reg, ps=ps, xtra_fc=xtra_fc, xtra_cut=xtra_cut)
---> 92         return self(data, models, **kwargs)
     93
     94     @property

~/fastai/courses/dl1/fastai/conv_learner.py in __init__(self, data, models, precompute, **kwargs)
     83         elif self.metrics is None:
     84             self.metrics = [accuracy_multi] if self.data.is_multi else [accuracy]
---> 85         if precompute: self.save_fc1()
     86         self.freeze()
     87         self.precompute = precompute

~/fastai/courses/dl1/fastai/conv_learner.py in save_fc1(self)
    130         self.fc_data = ImageClassifierData.from_arrays(self.data.path,
    131             (act, self.data.trn_y), (val_act, self.data.val_y), self.data.bs, classes=self.data.classes,
--> 132             test=test_act if self.data.test_dl else None, num_workers=8)
    133
    134     def freeze(self): self.freeze_to(-self.models.n_fc)

~/fastai/courses/dl1/fastai/dataset.py in from_arrays(self, path, trn, val, bs, tfms, classes, num_workers, test)
    289     @classmethod
    290     def from_arrays(self, path, trn, val, bs=64, tfms=(None,None), classes=None, num_workers=4, test=None):
--> 291         datasets = self.get_ds(ArraysIndexDataset, trn, val, tfms, test=test)
    292         return self(path, datasets, bs, num_workers, classes=classes)
    293

~/fastai/courses/dl1/fastai/dataset.py in get_ds(self, fn, trn, val, tfms, test, **kwargs)
    274         res = [
    275             fn(trn[0], trn[1], tfms[0], **kwargs), # train
--> 276             fn(val[0], val[1], tfms[1], **kwargs), # val
    277             fn(trn[0], trn[1], tfms[1], **kwargs), # fix
    278             fn(val[0], val[1], tfms[0], **kwargs)  # aug

~/fastai/courses/dl1/fastai/dataset.py in __init__(self, x, y, transform)
    165     def __init__(self, x, y, transform):
    166         self.x,self.y = x,y
--> 167         assert(len(x)==len(y))
    168         super().__init__(transform)
    169     def get_x(self, i):

AssertionError:

That’s an unrelated issue. Looks like you need to delete the data/dogscats/tmp folder.
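For anyone unsure what deleting that folder involves, here is a minimal sketch (the path below is hypothetical; substitute wherever your dataset actually lives — the point is just to remove the tmp subfolder so fastai regenerates the precomputed activations):

```python
import os
import shutil
import tempfile

# Hypothetical dataset location; replace with your own, e.g. data/dogscats.
path = os.path.join(tempfile.gettempdir(), "data", "dogscats")
tmp_dir = os.path.join(path, "tmp")
os.makedirs(tmp_dir, exist_ok=True)  # stand-in for an existing activation cache

# Delete the cached activations; fastai will rebuild them on the next run.
shutil.rmtree(tmp_dir, ignore_errors=True)
print(os.path.isdir(tmp_dir))  # False once the cache is gone
```

The same thing from a shell would just be `rm -rf data/dogscats/tmp`.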


Thanks, removing the /tmp folder got me back to the original error:

OSError: [Errno 12] Cannot allocate memory

Current parameters in Paperspace, basic tier:

data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz), num_workers=4, bs=16)

You may need to restart jupyter. Also, do a git pull since I just totally changed all this code.

Did run across this problem on an AWS p2.xlarge (Oregon) with resnet50 at the default bs=28, but did not try reducing it to 14, so I don’t know if that will solve the problem…

@neovaldivia Your data got corrupted, so it failed the check that x and y are the same length (line 167).
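To illustrate why that assertion fires (a hypothetical sketch, not fastai’s actual caching code): the tmp folder holds precomputed activations, and if they were written during an earlier or interrupted run, their length no longer matches the labels of the current dataset:

```python
# Stale cached activations, e.g. from an interrupted precompute run...
cached_activations = [[0.0] * 512 for _ in range(900)]   # 900 rows on disk
# ...paired with labels from the current, full dataset.
labels = [0] * 1000                                       # 1000 labels

try:
    # Same check as dataset.py line 167: assert(len(x)==len(y))
    assert len(cached_activations) == len(labels)
    mismatched = False
except AssertionError:
    mismatched = True

print(mismatched)  # True: stale cache and fresh labels disagree
```

Deleting the tmp folder forces the activations to be recomputed from the current data, which restores the invariant.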

removing the tmp folder got that issue fixed. thanks.

Although with this new code push I’m getting an error when importing fastai libraries.
ImportError: dlopen: cannot load any more object with static TLS

Apparently, it’s the order in which the lib’s are imported…but not sure

Once you’ve SSHed in, try this to see if it is better, @neovaldivia:
$ cd fastai
$ git pull
$ conda env update
$ source activate fastai
$ jupyter notebook

I also first assumed it was the batch size, but later realized that since this parameter is passed to tfms_from_model, it must be related to the transformations in some way. Thanks for clarifying. More documentation for the fastai library would definitely help with questions like this. I will check out GitHub this weekend.

I am getting similar errors with the queue.Full error; going to try smaller batch sizes and the num_workers option, since maybe I am limited by memory. Here’s the post
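For context on what queue.Full means (a generic stdlib sketch, not fastai’s loader code): data-loading workers push batches into a bounded queue, and if the producers outrun the consumer the queue overflows:

```python
import queue

# A small bounded buffer, like a data loader's batch queue.
batch_queue = queue.Queue(maxsize=2)
batch_queue.put("batch0")
batch_queue.put("batch1")

try:
    # A third worker tries to enqueue without waiting for free space.
    batch_queue.put("batch2", block=False)
    overflowed = False
except queue.Full:
    overflowed = True

print(overflowed)  # True: producers outpaced the consumer
```

Lowering num_workers means fewer producers, and a smaller bs means smaller items, which is why both reduce pressure on this queue and on memory.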

Going to try these parameters too, to cope with the memory

@jeremy
Guys, happy to report the latest changes are stable on my local machine, and I noticed memory is now being managed better, without having to use the swap space.
A snapshot of the CPU and memory resources halfway through running lesson1:

I didn’t need to use any other hyperparameters except sz for the latest test.
Thanks for fixing this @jeremy !


And today I learned from @naruto79 about nmon :slight_smile:


Did another test with hyperparameter bs=128; it seems to be stable and improves the execution time! And of course it will use more GPU memory.

no luck…but thank you.

I actually switched to AWS. I was using Paperspace but couldn’t invest any more time in that bug. =)

No issues in AWS with num_channels = 4

I am getting the same ImportError on the Paperspace console.

I did git pull and conda env update -f environment.yml but am still getting the error.
See the error below when importing fastai.

The error occurs on this command in the cell:
from fastai.transforms import *
Error:
ImportError                               Traceback (most recent call last)
<ipython-input> in <module>()

~/fastai/courses/dl1/fastai/torch_imports.py in <module>()
      1 import os
----> 2 import torch, torchvision, torchtext
      3 from torch import nn, cuda, backends, FloatTensor, LongTensor, optim
      4 import torch.nn.functional as F
      5 from torch.autograd import Variable

~/anaconda3/lib/python3.6/site-packages/torch/__init__.py in <module>()
     51 sys.setdlopenflags(_dl_flags.RTLD_GLOBAL | _dl_flags.RTLD_NOW)
     52
---> 53 from torch._C import *
     54
     55 __all__ += [name for name in dir(_C)

I also did the below, as per @wgpubs:
git pull
conda env update -f environment.yml
restart your terminal
Still not getting rid of it.


Did you source activate fastai? Does it work OK for you on AWS using the fastai AMI?

Nothing I did, even source activate fastai, made this bug go away, so I abandoned Paperspace instead of wasting even more time on it.

Yes, I did source activate fastai and the error still does not go away. I have not tried AWS using the fastai AMI.

I am also running into a permission error while running the below:
os.makedirs('/cache/tmp', exist_ok=True)
!ln -fs /cache/tmp {PATH}

[Errno 13] Permission denied: '/cache'
Lots of issues; I guess I should also move to AWS or Crestle.
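One possible workaround (my assumption, not an official fix): the PermissionError happens because /cache is owned by root, so point the cache at a directory you own instead and symlink that. The location below is hypothetical:

```python
import os
import tempfile

# Hypothetical user-writable cache location replacing '/cache/tmp'.
cache_dir = os.path.join(tempfile.gettempdir(), "fastai_cache", "tmp")
os.makedirs(cache_dir, exist_ok=True)  # no PermissionError in a dir you own

print(os.path.isdir(cache_dir))  # True
```

Then run the same !ln -fs symlink command against this path instead of /cache/tmp.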

This is working for me in Crestle; looks like I am good now :slight_smile: