RuntimeError: Set changed size during iteration


(WG) #1

I’m occasionally getting this warning/error when I call fit() or run lr_find() on my learner:

Exception in thread Thread-25:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/fastai/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/ubuntu/anaconda3/envs/fastai/lib/python3.6/site-packages/tqdm/_tqdm.py", line 144, in run
    for instance in self.tqdm_cls._instances:
  File "/home/ubuntu/anaconda3/envs/fastai/lib/python3.6/_weakrefset.py", line 60, in __iter__
    for itemref in self.data:
RuntimeError: Set changed size during iteration


(john v) #2

I get this error too, although it doesn’t seem to actually cause the training to stop.


(WG) #3

Yah same here. Or I just stop running the cell in jupyter and re-run it.


(Jeremy Howard (Admin)) #4

I used to see that, but not recently. Try updating all your conda packages.


(john v) #5

For anyone who is searching, the command to update all your conda packages is

conda update --all

(Lucas Goulart Vazquez) #6

Still getting this error after updating, any more suggestions?


(WG) #7

Yah I still get it periodically too, even after updating everything.

I notice it pops up if I run lr_find() back-to-back with different arguments.


(Jeremy Howard (Admin)) #8

It’s just a warning - you can safely ignore it.


(Cedric Chee) #9

I encountered this problem too today.

My Jupyter environment:

  • Google Cloud Platform setup using Paperspace script
  • Python 3.6
  • PyTorch 0.3.0.post4
  • Conda packages updated to the latest
  • fastai git repo updated to the latest
  • tqdm 4.19.4

It seems to be related to the tqdm package.
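
For anyone comparing environments, here is a small sketch for printing the installed tqdm version from inside the notebook. The helper name package_version is made up for illustration, and it assumes the package exposes a __version__ attribute (tqdm does):

```python
import importlib
import sys


def package_version(name):
    """Return the version string of an importable package, or None if unavailable.

    `package_version` is a hypothetical helper, not part of fastai or tqdm.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None  # package is not installed in this environment
    return getattr(module, "__version__", None)


print("python", sys.version.split()[0])
print("tqdm", package_version("tqdm"))
```

Running this in the same kernel as the notebook makes sure the reported version is the one the learner actually uses.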


(Jeremy Howard (Admin)) #10

You can safely ignore that. It just means that you interrupted an earlier command while it was running.


(Chris Palmer) #11

Regarding this error - I am also getting it in notebook 8. I think this is new behaviour and maybe something can be done to remove it?

I noticed that it happened during the learning rate finder processing. Maybe it’s because the finder does something to “interrupt an earlier command”, because I don’t remember doing anything…

Then again, immediately after I start learn.fit, I get the same:


#12

I am also consistently hitting this issue in notebook 8, also on the line:

learn.fit(lrs/5, 1, cycle_len=1)

It always happens the very first time I run my notebook. I’m using the AWS fastai AMI on a p2.xlarge. I’ve run git pull, conda env update, and conda update --all, and none of them has fixed the issue.

I thought there was an issue with my environment so I created a completely fresh AMI, re-downloaded all the data files, yet I’m still hitting this issue.

I can get the error to disappear after I run it a few times, however I’m worried this is not a complete solution because there can be unintended side-effects when running commands multiple times.


(Bart Fish) #13

I have been experiencing and ignoring this error, as my LRs seem to work out correctly. But I’m guessing a deep dive into the code will find that at some point the code is not waiting on a global lock before updating the underlying set.
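
For anyone wondering what the message itself means: Python raises it whenever a set gains or loses members while a for loop is walking over it. Below is a minimal single-threaded sketch. The names instances, iterate_unsafely and iterate_over_snapshot are made up for illustration and are not tqdm’s code. In the traceback at the top of this thread the change presumably comes from another thread, but the error raised is the same one.

```python
def iterate_unsafely(instances):
    """Mutate the set while looping over it; Python detects the size change."""
    try:
        for item in instances:
            instances.add(item + 100)  # the set grows during iteration
    except RuntimeError as exc:
        return str(exc)
    return None


def iterate_over_snapshot(instances):
    """Loop over a copy, so changes to the original set do not affect the loop."""
    seen = []
    for item in list(instances):  # list() copies the current members
        instances.add(item + 100)
        seen.append(item)
    return sorted(seen)


print(iterate_unsafely({1, 2, 3}))
print(iterate_over_snapshot({1, 2, 3}))
```

Iterating over a snapshot is one common way to sidestep the error, and a lock around both the loop and the updates is another. Which of these, if either, tqdm needs is not something this thread establishes.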


(Kwc Coin) #14

I got the same problem on my personal setup (1080 Ti). The problem is that if you ignore it and run again, it may get through the remaining steps, but you have to run them one by one. (I tried to run all of them to keep a record, but was stopped by this error.)

It seems you have to stop the kernel, then run one or two steps, and then try to run “all steps below”.