BrokenPipeError using Jupyter Notebook, Lesson 1

In lesson1-pets, using the Jupyter notebook, on this line:

data.show_batch(rows=3, figsize=(7,6))

I get this error message:

BrokenPipeError                           Traceback (most recent call last)
in <module>
----> 1 data.show_batch(rows=3, figsize=(7,6))

~\Anaconda3\lib\site-packages\fastai\basic_data.py in show_batch(self, rows, ds_type, reverse, **kwargs)
    183     def show_batch(self, rows:int=5, ds_type:DatasetType=DatasetType.Train, reverse:bool=False, **kwargs)->None:
    184         "Show a batch of data in `ds_type` on a few `rows`."
--> 185         x,y = self.one_batch(ds_type, True, True)
    186         if reverse: x,y = x.flip(0),y.flip(0)
    187         n_items = rows **2 if self.train_ds.x._square_show else rows

~\Anaconda3\lib\site-packages\fastai\basic_data.py in one_batch(self, ds_type, detach, denorm, cpu)
    166         w = self.num_workers
    167         self.num_workers = 0
--> 168         try:     x,y = next(iter(dl))
    169         finally: self.num_workers = w
    170         if detach: x,y = to_detach(x,cpu=cpu),to_detach(y,cpu=cpu)

~\Anaconda3\lib\site-packages\fastai\basic_data.py in __iter__(self)
     73     def __iter__(self):
     74         "Process and returns items from `DataLoader`."
---> 75         for b in self.dl: yield self.proc_batch(b)
     76 
     77     @classmethod

~\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py in __iter__(self)
    817 
    818     def __iter__(self):
--> 819         return _DataLoaderIter(self)
    820 
    821     def __len__(self):

~\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py in __init__(self, loader)
    558                 #     before it starts, and __del__ tries to join but will get:
    559                 #     AssertionError: can only join a started process.
--> 560                 w.start()
    561                 self.index_queues.append(index_queue)
    562                 self.workers.append(w)

~\Anaconda3\lib\multiprocessing\process.py in start(self)
    110                'daemonic processes are not allowed to have children'
    111         _cleanup()
--> 112         self._popen = self._Popen(self)
    113         self._sentinel = self._popen.sentinel
    114         # Avoid a refcycle if the target function holds an indirect

~\Anaconda3\lib\multiprocessing\context.py in _Popen(process_obj)
    221     @staticmethod
    222     def _Popen(process_obj):
--> 223         return _default_context.get_context().Process._Popen(process_obj)
    224 
    225 class DefaultContext(BaseContext):

~\Anaconda3\lib\multiprocessing\context.py in _Popen(process_obj)
    320         def _Popen(process_obj):
    321             from .popen_spawn_win32 import Popen
--> 322             return Popen(process_obj)
    323 
    324     class SpawnContext(BaseContext):

~\Anaconda3\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
     63             try:
     64                 reduction.dump(prep_data, to_child)
---> 65                 reduction.dump(process_obj, to_child)
     66             finally:
     67                 set_spawning_popen(None)

~\Anaconda3\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
     58 def dump(obj, file, protocol=None):
     59     '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60     ForkingPickler(file, protocol).dump(obj)
     61 
     62 #

BrokenPipeError: [Errno 32] Broken pipe

I have installed fastai according to the instructions.

conda install -c pytorch -c fastai fastai
conda install jupyter notebook
conda install -c conda-forge jupyter_contrib_nbextensions

I am on Windows 10; the GPU is a GeForce GTX 1060.

Same here

I'm facing this issue as well. Did anyone resolve this?

As far as I know it's an issue with multiprocessing. The easiest way to fix it is to set num_workers=0 when you set up your dataset/DataLoader.

See -> Custom ItemList, getting ForkingPickler broken pipe

But I don't know why it became a problem all of a sudden; on older fastai versions it worked fine for me (a PyTorch or fastai issue?).


@SBecker I was able to fix it with your suggestion. What would be the impact of this? I am completely new to PyTorch and don't have an idea of what it means. Thanks for the reply.

Certain operations will take significantly longer, but because of this issue I can't give you a direct comparison, and I'm not familiar enough with multiprocessing/threading to give you valid insight into how much this will impact your training speed. :sweat_smile:

I'm currently doing the Kaggle histopathologic cancer detection challenge on my own machine (GTX 1080 Ti) with num_workers=0, and the training time is absolutely fine. TTA takes quite long though; I guess that's because of the missing multiprocessing?


I tracked the error to the update from 1.0.48 (works) to 1.0.49 (doesn't work). Nonetheless, cnn_learner gets stuck on Win10 on version 1.0.48 unless you set torch.set_num_threads(1) (which also impacts performance). I don't know which is worse (num_workers=0 or torch.set_num_threads(1)), though.


It seems that I am even newer to this than you. Can you let me know where to put num_workers=0?


Thanks for your help. Can you give a bit more information about "when you set up your dataset/loader"? Where exactly do I add num_workers=0?

Sorry, I now know what you mean.

Hi everyone, this is my problem... please can you explain how to fix it? I have been trying for a day now.

Everyone is saying to set num_workers=0, and you can see in the highlighted code where it is set to 0, so please help me, guys.
This is the complete error output:


BrokenPipeError                           Traceback (most recent call last)
in <module>
----> 1 learn.fit_one_cycle(4)

~\Anaconda3\envs\fastai_v1\lib\site-packages\fastai\train.py in fit_one_cycle(learn, cyc_len, max_lr, moms, div_factor, pct_start, final_div, wd, callbacks, tot_epochs, start_epoch)
     20     callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor, pct_start=pct_start,
     21                                        final_div=final_div, tot_epochs=tot_epochs, start_epoch=start_epoch))
---> 22     learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
     23 
     24 def lr_find(learn:Learner, start_lr:Floats=1e-7, end_lr:Floats=10, num_it:int=100, stop_div:bool=True, wd:float=None):

~\Anaconda3\envs\fastai_v1\lib\site-packages\fastai\basic_train.py in fit(self, epochs, lr, wd, callbacks)
    194         callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
    195         if defaults.extra_callbacks is not None: callbacks += defaults.extra_callbacks
--> 196         fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
    197 
    198     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

~\Anaconda3\envs\fastai_v1\lib\site-packages\fastai\basic_train.py in fit(epochs, learn, callbacks, metrics)
     96             cb_handler.set_dl(learn.data.train_dl)
     97             cb_handler.on_epoch_begin()
---> 98             for xb,yb in progress_bar(learn.data.train_dl, parent=pbar):
     99                 xb, yb = cb_handler.on_batch_begin(xb, yb)
    100                 loss = loss_batch(learn.model, xb, yb, learn.loss_func, learn.opt, cb_handler)

~\Anaconda3\envs\fastai_v1\lib\site-packages\fastprogress\fastprogress.py in __iter__(self)
     64         self.update(0)
     65         try:
---> 66             for i,o in enumerate(self._gen):
     67                 yield o
     68                 if self.auto_update: self.update(i+1)

~\Anaconda3\envs\fastai_v1\lib\site-packages\fastai\basic_data.py in __iter__(self)
     73     def __iter__(self):
     74         "Process and returns items from `DataLoader`."
---> 75         for b in self.dl: yield self.proc_batch(b)
     76 
     77     @classmethod

~\Anaconda3\envs\fastai_v1\lib\site-packages\torch\utils\data\dataloader.py in __iter__(self)
    817 
    818     def __iter__(self):
--> 819         return _DataLoaderIter(self)
    820 
    821     def __len__(self):

~\Anaconda3\envs\fastai_v1\lib\site-packages\torch\utils\data\dataloader.py in __init__(self, loader)
    558                 #     before it starts, and __del__ tries to join but will get:
    559                 #     AssertionError: can only join a started process.
--> 560                 w.start()
    561                 self.index_queues.append(index_queue)
    562                 self.workers.append(w)

~\Anaconda3\envs\fastai_v1\lib\multiprocessing\process.py in start(self)
    103                'daemonic processes are not allowed to have children'
    104         _cleanup()
--> 105         self._popen = self._Popen(self)
    106         self._sentinel = self._popen.sentinel
    107         # Avoid a refcycle if the target function holds an indirect

~\Anaconda3\envs\fastai_v1\lib\multiprocessing\context.py in _Popen(process_obj)
    221     @staticmethod
    222     def _Popen(process_obj):
--> 223         return _default_context.get_context().Process._Popen(process_obj)
    224 
    225 class DefaultContext(BaseContext):

~\Anaconda3\envs\fastai_v1\lib\multiprocessing\context.py in _Popen(process_obj)
    320         def _Popen(process_obj):
    321             from .popen_spawn_win32 import Popen
--> 322             return Popen(process_obj)
    323 
    324     class SpawnContext(BaseContext):

~\Anaconda3\envs\fastai_v1\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
     63             try:
     64                 reduction.dump(prep_data, to_child)
---> 65                 reduction.dump(process_obj, to_child)
     66             finally:
     67                 set_spawning_popen(None)

~\Anaconda3\envs\fastai_v1\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
     58 def dump(obj, file, protocol=None):
     59     '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60     ForkingPickler(file, protocol).dump(obj)
     61 
     62 #

BrokenPipeError: [Errno 32] Broken pipe

For others not sure where to add the num_workers part, it's here:

data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(),
                                   size=224, bs=bs, num_workers=0).normalize(imagenet_stats)

Thanks a lot, it worked! But I would like to know more details about the problem and why it is caused. Really, thanks @peterwalkley; it would be of great help if you could elaborate.

My understanding is that the underlying problem is in the PyTorch layer on Windows, and that it is not a trivial thing to fix. See: https://github.com/pytorch/pytorch/issues/12831

Setting num_workers=0 has fixed it every time for me, so I've not felt any burning desire to cut over to Linux.

Thanks for your help. With your link I learned some new things:

1. num_workers: how many subprocesses to use for data loading.
2. A value of 0 means that the data will be loaded in the main process (default: 0).

Thanks again.

I had the same issue, but a fix was posted in the following thread:

You will have to make 1 or 2 changes to the ipython.py file that is used by fastai.

Afterwards you will have to wrap your Jupyter notebook code from lesson 1 in:

if __name__ == '__main__':

After doing these changes I didn't need num_workers=0 anymore, and the broken pipe error was gone.

Hope it helps


Just add num_workers=0 when using the DataBunch function, and this will solve the issue.

Thanks a lot! Your solution solved my problem.