Runtime Error (801) in Intro Notebook When Using Jupyter Notebook

og1 · September 6, 2020, 1:50pm

Hi all:

I’m trying to get the latest version of fastai installed on my Windows 10 dev laptop (using miniconda) so I can run the latest Jupyter Notebook version of the course. The laptop has an Nvidia 940M GPU, an i7-6500U CPU and 12 GB of RAM. I keep running into runtime errors when I try to execute the code examples from the course notebooks I downloaded locally from GitHub. I’ve verified that the 940M GPU is working properly using the CUDA deviceQuery sample. I’m using CUDA 11.

Here’s the error message from the first example from the Jupyter Notebook.
Note: I have a much more powerful Windows 10 workstation that I use for production activities, but I don’t want to install fastai on that system until I’m sure I can get fastai working on Windows 10 in a stable manner. So I want to get fastai stable and the install procedure for Windows 10 confirmed before attempting to install on the production workstation. I’ll look at the Linux CentOS 7 and CentOS 8 virtual machine installs later on; right now I’m specifically interested in the Windows 10 workstations, since they are our best machines.

epoch	train_loss	valid_loss	error_rate	time

0 nan 00:00

RuntimeError Traceback (most recent call last)
in
11
12 learn = cnn_learner(dls, resnet34, metrics=error_rate)
—> 13 learn.fine_tune(1)

~\miniconda3\lib\site-packages\fastcore\utils.py in _f(*args, **kwargs)
470 init_args.update(log)
471 setattr(inst, ‘init_args’, init_args)
→ 472 return inst if to_return else f(*args, **kwargs)
473 return _f
474

~\miniconda3\lib\site-packages\fastai\callback\schedule.py in fine_tune(self, epochs, base_lr, freeze_epochs, lr_mult, pct_start, div, **kwargs)
159 “Fine tune with freeze for freeze_epochs then with unfreeze from epochs using discriminative LR”
160 self.freeze()
→ 161 self.fit_one_cycle(freeze_epochs, slice(base_lr), pct_start=0.99, **kwargs)
162 base_lr /= 2
163 self.unfreeze()

~\miniconda3\lib\site-packages\fastcore\utils.py in _f(*args, **kwargs)
470 init_args.update(log)
471 setattr(inst, ‘init_args’, init_args)
→ 472 return inst if to_return else f(*args, **kwargs)
473 return _f
474

~\miniconda3\lib\site-packages\fastai\callback\schedule.py in fit_one_cycle(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt)
111 scheds = {‘lr’: combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),
112 ‘mom’: combined_cos(pct_start, *(self.moms if moms is None else moms))}
→ 113 self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd)
114
115 # Cell

~\miniconda3\lib\site-packages\fastcore\utils.py in _f(*args, **kwargs)
470 init_args.update(log)
471 setattr(inst, ‘init_args’, init_args)
→ 472 return inst if to_return else f(*args, **kwargs)
473 return _f
474

~\miniconda3\lib\site-packages\fastai\learner.py in fit(self, n_epoch, lr, wd, cbs, reset_opt)
205 self.opt.set_hypers(lr=self.lr if lr is None else lr)
206 self.n_epoch,self.loss = n_epoch,tensor(0.)
→ 207 self._with_events(self._do_fit, ‘fit’, CancelFitException, self._end_cleanup)
208
209 def _end_cleanup(self): self.dl,self.xb,self.yb,self.pred,self.loss = None,(None,),(None,),None,None

~\miniconda3\lib\site-packages\fastai\learner.py in _with_events(self, f, event_type, ex, final)
153
154 def _with_events(self, f, event_type, ex, final=noop):
→ 155 try: self(f'before_{event_type}') ;f()
156 except ex: self(f'after_cancel_{event_type}')
157 finally: self(f'after_{event_type}') ;final()

~\miniconda3\lib\site-packages\fastai\learner.py in _do_fit(self)
195 for epoch in range(self.n_epoch):
196 self.epoch=epoch
→ 197 self._with_events(self._do_epoch, ‘epoch’, CancelEpochException)
198
199 @log_args(but=‘cbs’)

~\miniconda3\lib\site-packages\fastai\learner.py in _with_events(self, f, event_type, ex, final)
153
154 def _with_events(self, f, event_type, ex, final=noop):
→ 155 try: self(f'before_{event_type}') ;f()
156 except ex: self(f'after_cancel_{event_type}')
157 finally: self(f'after_{event_type}') ;final()

~\miniconda3\lib\site-packages\fastai\learner.py in _do_epoch(self)
189
190 def _do_epoch(self):
→ 191 self._do_epoch_train()
192 self._do_epoch_validate()
193

~\miniconda3\lib\site-packages\fastai\learner.py in _do_epoch_train(self)
181 def _do_epoch_train(self):
182 self.dl = self.dls.train
→ 183 self._with_events(self.all_batches, ‘train’, CancelTrainException)
184
185 def _do_epoch_validate(self, ds_idx=1, dl=None):

~\miniconda3\lib\site-packages\fastai\learner.py in _with_events(self, f, event_type, ex, final)
153
154 def _with_events(self, f, event_type, ex, final=noop):
→ 155 try: self(f'before_{event_type}') ;f()
156 except ex: self(f'after_cancel_{event_type}')
157 finally: self(f'after_{event_type}') ;final()

~\miniconda3\lib\site-packages\fastai\learner.py in all_batches(self)
159 def all_batches(self):
160 self.n_iter = len(self.dl)
→ 161 for o in enumerate(self.dl): self.one_batch(*o)
162
163 def _do_one_batch(self):

~\miniconda3\lib\site-packages\fastai\data\load.py in __iter__(self)
101 self.randomize()
102 self.before_iter()
→ 103 for b in _loaders[self.fake_l.num_workers==0](self.fake_l):
104 if self.device is not None: b = to_device(b, self.device)
105 yield self.after_batch(b)

~\miniconda3\lib\site-packages\torch\utils\data\dataloader.py in __init__(self, loader)
735 # before it starts, and __del__ tries to join but will get:
736 # AssertionError: can only join a started process.
→ 737 w.start()
738 self._index_queues.append(index_queue)
739 self._workers.append(w)

~\miniconda3\lib\multiprocessing\process.py in start(self)
119 ‘daemonic processes are not allowed to have children’
120 _cleanup()
→ 121 self._popen = self._Popen(self)
122 self._sentinel = self._popen.sentinel
123 # Avoid a refcycle if the target function holds an indirect

~\miniconda3\lib\multiprocessing\context.py in _Popen(process_obj)
222 @staticmethod
223 def _Popen(process_obj):
→ 224 return _default_context.get_context().Process._Popen(process_obj)
225
226 class DefaultContext(BaseContext):

~\miniconda3\lib\multiprocessing\context.py in _Popen(process_obj)
325 def _Popen(process_obj):
326 from .popen_spawn_win32 import Popen
→ 327 return Popen(process_obj)
328
329 class SpawnContext(BaseContext):

~\miniconda3\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
91 try:
92 reduction.dump(prep_data, to_child)
—> 93 reduction.dump(process_obj, to_child)
94 finally:
95 set_spawning_popen(None)

~\miniconda3\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
58 def dump(obj, file, protocol=None):
59 '''Replacement for pickle.dump() using ForkingPickler.'''
—> 60 ForkingPickler(file, protocol).dump(obj)
61
62 #

~\miniconda3\lib\site-packages\torch\multiprocessing\reductions.py in reduce_tensor(tensor)
238 ref_counter_offset,
239 event_handle,
→ 240 event_sync_required) = storage.share_cuda()
241 tensor_offset = tensor.storage_offset()
242 shared_cache[handle] = StorageWeakRef(storage)

RuntimeError: cuda runtime error (801) : operation not supported at …\torch/csrc/generic/StorageSharing.cpp:247

These runtime errors are most likely related to how fastai is installed on this Windows 10 system. I have read several different versions of how to install fastai on Windows 10 using miniconda and they vary a great deal.

I’ve seen the other suggested fixes for this CUDA runtime error 801, which involve setting something on the “dataloader” to “0”, but I don’t understand what people mean.

This is not the only runtime error, but I figured I’d start with this first bit of code from the course, which I can’t get to execute.

Thanks for your help and time.

ObSkewer · September 6, 2020, 10:31pm

Just got this myself. Found the answer on https://github.com/fastai/fastbook/issues/85

Basically, set this variable here:

Fixes it for me.
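
The variable in question is `num_workers`, as the later replies confirm. A minimal sketch of what the change looks like, assuming the standard cat/dog example from the intro notebook; the `build_dls` wrapper is a hypothetical helper for illustration and not part of fastai:

```python
def is_cat(filename):
    # In the pets dataset used by the intro notebook, cat image
    # file names start with an uppercase letter.
    return filename[0].isupper()


def build_dls(path, **kwargs):
    """Hypothetical wrapper around the intro notebook's data-loading call."""
    # Imported lazily so the sketch can be loaded without fastai installed.
    from fastai.vision.all import ImageDataLoaders, Resize, get_image_files

    return ImageDataLoaders.from_name_func(
        path, get_image_files(path), valid_pct=0.2, seed=42,
        label_func=is_cat, item_tfms=Resize(224),
        num_workers=0,  # load batches in the main process; no workers are spawned
        **kwargs)
```

With the loaders built this way, `cnn_learner(dls, resnet34, metrics=error_rate)` and `learn.fine_tune(1)` are called exactly as in the notebook.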

ObSkewer · September 6, 2020, 10:42pm

Although, I do now get the same error in the first cell of Results…

Too late at night for me to figure this one out just now, so it’s a problem for after work tomorrow

og1 · September 6, 2020, 11:42pm

Thanks for all the info and help.

I implemented the num_workers=0 syntax for the ImageDataLoaders as I’ve shown in the below screen capture.

That syntax fixed the original runtime error. Greatly appreciated.
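
For anyone else puzzled by why this setting matters: the traceback above shows `multiprocessing` on Windows starting each DataLoader worker through `popen_spawn_win32`, which pickles the process object, and that pickling fails inside `storage.share_cuda()`. With `num_workers=0` no worker process is started, so nothing has to be pickled. The toy sketch below imitates this using only the standard library; `FakeCudaStorage` is a made-up stand-in, not a PyTorch class.

```python
from multiprocessing.reduction import ForkingPickler


class FakeCudaStorage:
    """Made-up stand-in for data that lives on the GPU (not a PyTorch class)."""

    def __reduce__(self):
        # Imitates the failure at the bottom of the traceback.
        raise RuntimeError("cuda runtime error (801) : operation not supported")


def can_send_to_worker(obj):
    """Return True if obj survives the pickling step used to start a spawned worker."""
    try:
        ForkingPickler.dumps(obj)
        return True
    except RuntimeError:
        return False


cpu_only = {"items": [1, 2, 3]}
holds_gpu_data = {"items": [FakeCudaStorage()]}

print(can_send_to_worker(cpu_only))        # plain Python data pickles fine
print(can_send_to_worker(holds_gpu_data))  # the GPU stand-in refuses to pickle
```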

Though, unfortunately, I encountered another, but separate runtime error.

It looks like the 940M discrete GPU on my laptop, with 2 GB of memory, is not going to be enough to run the course notebooks in this environment on my Windows 10 laptop.

See below runtime error about not enough GPU memory.

I will need to either break out the external USB-based GPUs (I have some old external GPUs) to use with the laptop, or set things up to run Jupyter Notebook as a dev environment on one of the high-end Windows 10 workstations with the old 1070s or one of the old Pascal Quadro GPUs. I will set up directly on the Windows 10 workstation host. I will also set up an environment within a Linux virtual machine running on the Windows 10 host.

I can always access either environment from the laptop over remote desktop or the VPN. I will let you all know how those setups go.

Please no one spend any more time on this. I will duplicate my Windows 10 laptop setup and dev environment on the more powerful workstations (as close as I can to the laptop), and let you all know how things work.

A bit disappointing the Nvidia 940M discrete GPU on the laptop can’t handle things.

I need to do some experiments to see if the newer discrete Nvidia laptop GPUs can handle the processing and memory requirements for the FastAI 2020 course and other more demanding FastAI training requirements under Windows 10.

og1 · September 7, 2020, 1:14am

Hi all:

I got the Nvidia 940M with the 2G memory to work on the Windows 10 laptop.

In addition to the required num_workers=0 parameter on the ImageDataLoaders call for my Windows 10 environment, I also needed to set the batch size much lower, to bs=4 (see below syntax), because of the limited memory on the 940M. Apparently the default batch size for ImageDataLoaders is much higher.
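
The batch size is passed as the `bs` keyword argument alongside `num_workers`. A rough sketch of the settings, and of why a small batch means many more steps per epoch; it compares `bs=4` with 64, which I believe is fastai's default. The image count is a made-up round number for illustration, not the real dataset size, and `batches_per_epoch` is a hypothetical helper:

```python
import math

# Keyword arguments to add to the ImageDataLoaders.from_name_func(...) call.
loader_kwargs = {
    "num_workers": 0,  # avoid spawning worker processes on Windows
    "bs": 4,           # small batches so activations fit in 2 GB of GPU memory
}


def batches_per_epoch(n_train_items, bs):
    """Approximate number of batches needed to see every training item once."""
    return math.ceil(n_train_items / bs)


n_train_items = 6000  # hypothetical training-set size
for bs in (64, 4):
    print(f"bs={bs}: {batches_per_epoch(n_train_items, bs)} batches per epoch")
```

Sixteen times as many batches keeps memory use down but means far more optimizer steps per epoch, which is consistent with the longer training time.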

The training took almost 25 mins vs the original example that looks like it finished orders of magnitude faster.

It seems GPU memory size and management is a big requirement with FastAI/Pytorch.
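
One way to check how much memory the GPU offers before picking a batch size is to ask PyTorch directly. A small sketch, assuming PyTorch is installed with CUDA support; `gpu_total_memory_gib` is a hypothetical helper name, and it returns `None` when PyTorch or a CUDA device is unavailable:

```python
def gpu_total_memory_gib(device_index=0):
    """Total memory of a CUDA device in GiB, or None if it cannot be queried."""
    try:
        import torch
    except ImportError:
        return None  # PyTorch is not installed
    if not torch.cuda.is_available():
        return None  # no usable CUDA device
    props = torch.cuda.get_device_properties(device_index)
    return props.total_memory / 1024**3


total = gpu_total_memory_gib()
print("no CUDA device found" if total is None else f"{total:.1f} GiB of GPU memory")
```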

We skipped the Nvidia 2000 series GPUs (at least we never bought them new), as it was obviously a big Nvidia money grab vs what the 1000 series GPUs could do. And the outsized pricing on those new 2000 series GPUs in turn kept the used market and new retail pricing on the previous 1000 series GPUs higher than it should have been for a very long time. These Ampere 3000 series GPUs look interesting. We’ll see how the lab validation on those new Nvidia GPUs goes.

The notebook is behaving much better now, but I notice GPU memory still seems to be an issue on other cells in the notebook. I get an index error on running another cell in the intro notebook. See below.

So, I’m not going to push it with the 940M discrete GPU on the laptop anymore with FastAI and Jupyter Notebook, given what I’ve seen so far with FastAI on this GPU.

I’m going to close this topic out after I let you all know how my FastAI setup runs on one of the high end Windows 10 workstations with the old 1000 series GPU.

Thanks for all the help.

og1 · September 7, 2020, 4:24pm

Just FYI!

The index error above was my fault. The image was not loaded to the server from the previous cell. That’s why the error showed up. Still getting used to this Jupyter Notebook environment. The 940M GPU is slow, but it seems to be working as it should on the laptop running on Windows 10.

Thanks again.

pomegranate · September 8, 2020, 4:40am

Hey, after implementing the num_workers=0 fix my learner.predict on the cat test still gives the same 801 runtime error, did you ever encounter this?

pomegranate · September 8, 2020, 4:50am

Knowing your versions of PyTorch and fastai would probably help me solve this issue too. There is a thread of people with the same issue here:
Thread