BrokenPipeError using Jupyter Notebook, Lesson 1

comparativelyDog · March 16, 2019, 3:55pm

In the lesson1-pets Jupyter notebook, this line:

data.show_batch(rows=3, figsize=(7,6))

gives me this error message:

BrokenPipeError Traceback (most recent call last)
in
----> 1 data.show_batch(rows=3, figsize=(7,6))

~\Anaconda3\lib\site-packages\fastai\basic_data.py in show_batch(self, rows, ds_type, reverse, **kwargs)
183 def show_batch(self, rows:int=5, ds_type:DatasetType=DatasetType.Train, reverse:bool=False, **kwargs)->None:
184 "Show a batch of data in ds_type on a few rows."
--> 185 x,y = self.one_batch(ds_type, True, True)
186 if reverse: x,y = x.flip(0),y.flip(0)
187 n_items = rows **2 if self.train_ds.x._square_show else rows

~\Anaconda3\lib\site-packages\fastai\basic_data.py in one_batch(self, ds_type, detach, denorm, cpu)
166 w = self.num_workers
167 self.num_workers = 0
--> 168 try: x,y = next(iter(dl))
169 finally: self.num_workers = w
170 if detach: x,y = to_detach(x,cpu=cpu),to_detach(y,cpu=cpu)

~\Anaconda3\lib\site-packages\fastai\basic_data.py in __iter__(self)
73 def __iter__(self):
74 "Process and returns items from DataLoader."
---> 75 for b in self.dl: yield self.proc_batch(b)
76
77 @classmethod

~\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py in __iter__(self)
817
818 def __iter__(self):
--> 819 return _DataLoaderIter(self)
820
821 def __len__(self):

~\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py in __init__(self, loader)
558 # before it starts, and __del__ tries to join but will get:
559 # AssertionError: can only join a started process.
--> 560 w.start()
561 self.index_queues.append(index_queue)
562 self.workers.append(w)

~\Anaconda3\lib\multiprocessing\process.py in start(self)
110 'daemonic processes are not allowed to have children'
111 _cleanup()
--> 112 self._popen = self._Popen(self)
113 self._sentinel = self._popen.sentinel
114 # Avoid a refcycle if the target function holds an indirect

~\Anaconda3\lib\multiprocessing\context.py in _Popen(process_obj)
221 @staticmethod
222 def _Popen(process_obj):
--> 223 return _default_context.get_context().Process._Popen(process_obj)
224
225 class DefaultContext(BaseContext):

~\Anaconda3\lib\multiprocessing\context.py in _Popen(process_obj)
320 def _Popen(process_obj):
321 from .popen_spawn_win32 import Popen
--> 322 return Popen(process_obj)
323
324 class SpawnContext(BaseContext):

~\Anaconda3\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
63 try:
64 reduction.dump(prep_data, to_child)
---> 65 reduction.dump(process_obj, to_child)
66 finally:
67 set_spawning_popen(None)

~\Anaconda3\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
58 def dump(obj, file, protocol=None):
59 '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60 ForkingPickler(file, protocol).dump(obj)
61
62 #

BrokenPipeError: [Errno 32] Broken pipe

I have installed fastai according to the instructions.

conda install -c pytorch -c fastai fastai
conda install jupyter notebook
conda install -c conda-forge jupyter_contrib_nbextensions

I am on Windows 10; the GPU is a GeForce GTX 1060.

treyqi · March 16, 2019, 9:59pm

Same here

kireeti · March 17, 2019, 8:46am

I face this issue as well. Did anyone resolve this?

SBecker · March 17, 2019, 10:58am

As far as I know it’s an issue regarding multiprocessing. The easiest way to fix this is to set num_workers=0 when you set up your dataset/-loader.

See -> Custom ItemList, getting ForkingPickler broken pipe

But I don’t know why it became a problem all of a sudden; on older fastai versions it worked quite fine for me (PyTorch or fastai issue?).

kireeti · March 18, 2019, 12:14am

@SBecker I could fix it with your suggestion. What would be the impact of this? I am completely new to PyTorch and don’t have an idea of what it means. Thanks for the reply.

SBecker · March 18, 2019, 12:57pm

Certain operations will take significantly longer. Unfortunately, because of this issue I can’t give you a direct comparison, and I’m not familiar enough with multiprocessing/threading to give you valid insight into how much this will impact your training speed.

I’m currently doing the Kaggle histopathologic cancer detection challenge on my own machine (GTX 1080 Ti) with num_workers=0 and the training time is absolutely fine. TTA takes quite long tho, I guess that’s because of no multiprocessing?!
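
To measure the impact on a given machine, one option is to time a single pass over the data loader with each num_workers setting. A minimal sketch, assuming any iterable of batches; time_one_pass is a hypothetical helper name, not part of fastai or PyTorch, and the range below only stands in for a real loader:

```python
import time

def time_one_pass(batches):
    """Iterate once over `batches`; return (seconds elapsed, number of batches)."""
    start = time.perf_counter()
    n_batches = 0
    for _ in batches:
        n_batches += 1
    return time.perf_counter() - start, n_batches

# Stand-in iterable so the sketch runs on its own. In the notebook you would
# pass a real loader instead, e.g. data.train_dl built with num_workers=0.
elapsed, n = time_one_pass(range(1000))
print(f"{n} batches in {elapsed:.4f}s")
```

The first pass can include one-off start-up costs, so a second timed pass is usually a fairer comparison.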

apereiral · March 19, 2019, 10:00pm

I tracked the error to the update from 1.0.48 (works) to 1.0.49 (doesn’t work). Nonetheless, cnn_learner gets stuck on Win10 on version 1.0.48, unless you set torch.set_num_threads(1) (which also impacts on performance). I don’t know which one is worse (num_workers=0 or torch.set_num_threads(1)) though.

treyqi · March 22, 2019, 2:47am

It seems that I am newer than you. Can you let me know where to put num_workers=0?

treyqi · March 22, 2019, 2:56am

Thanks for your help. Can you give a little more information about “when you set up your dataset/-loader”? Where do I add num_workers=0?

treyqi · March 22, 2019, 2:58am

Sorry, I now know what you mean.

naveen_v · March 22, 2019, 4:42pm

Hi everyone, this is my problem. Can you please explain how to fix this? I have been trying for a day now.

Everyone says to set num_workers=0, and you can see in the highlighted code that it is set to 0.
So please help me.
This is the complete error message:

BrokenPipeError Traceback (most recent call last)
in
----> 1 learn.fit_one_cycle(4)

~\Anaconda3\envs\fastai_v1\lib\site-packages\fastai\train.py in fit_one_cycle(learn, cyc_len, max_lr, moms, div_factor, pct_start, final_div, wd, callbacks, tot_epochs, start_epoch)
20 callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor, pct_start=pct_start,
21 final_div=final_div, tot_epochs=tot_epochs, start_epoch=start_epoch))
---> 22 learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
23
24 def lr_find(learn:Learner, start_lr:Floats=1e-7, end_lr:Floats=10, num_it:int=100, stop_div:bool=True, wd:float=None):

~\Anaconda3\envs\fastai_v1\lib\site-packages\fastai\basic_train.py in fit(self, epochs, lr, wd, callbacks)
194 callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
195 if defaults.extra_callbacks is not None: callbacks += defaults.extra_callbacks
--> 196 fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
197
198 def create_opt(self, lr:Floats, wd:Floats=0.)->None:

~\Anaconda3\envs\fastai_v1\lib\site-packages\fastai\basic_train.py in fit(epochs, learn, callbacks, metrics)
96 cb_handler.set_dl(learn.data.train_dl)
97 cb_handler.on_epoch_begin()
---> 98 for xb,yb in progress_bar(learn.data.train_dl, parent=pbar):
99 xb, yb = cb_handler.on_batch_begin(xb, yb)
100 loss = loss_batch(learn.model, xb, yb, learn.loss_func, learn.opt, cb_handler)

~\Anaconda3\envs\fastai_v1\lib\site-packages\fastprogress\fastprogress.py in __iter__(self)
64 self.update(0)
65 try:
---> 66 for i,o in enumerate(self._gen):
67 yield o
68 if self.auto_update: self.update(i+1)

~\Anaconda3\envs\fastai_v1\lib\site-packages\fastai\basic_data.py in __iter__(self)
73 def __iter__(self):
74 "Process and returns items from DataLoader."
---> 75 for b in self.dl: yield self.proc_batch(b)
76
77 @classmethod

~\Anaconda3\envs\fastai_v1\lib\site-packages\torch\utils\data\dataloader.py in __iter__(self)
817
818 def __iter__(self):
--> 819 return _DataLoaderIter(self)
820
821 def __len__(self):

~\Anaconda3\envs\fastai_v1\lib\site-packages\torch\utils\data\dataloader.py in __init__(self, loader)
558 # before it starts, and __del__ tries to join but will get:
559 # AssertionError: can only join a started process.
--> 560 w.start()
561 self.index_queues.append(index_queue)
562 self.workers.append(w)

~\Anaconda3\envs\fastai_v1\lib\multiprocessing\process.py in start(self)
103 'daemonic processes are not allowed to have children'
104 _cleanup()
--> 105 self._popen = self._Popen(self)
106 self._sentinel = self._popen.sentinel
107 # Avoid a refcycle if the target function holds an indirect

~\Anaconda3\envs\fastai_v1\lib\multiprocessing\context.py in _Popen(process_obj)
221 @staticmethod
222 def _Popen(process_obj):
--> 223 return _default_context.get_context().Process._Popen(process_obj)
224
225 class DefaultContext(BaseContext):

~\Anaconda3\envs\fastai_v1\lib\multiprocessing\context.py in _Popen(process_obj)
320 def _Popen(process_obj):
321 from .popen_spawn_win32 import Popen
--> 322 return Popen(process_obj)
323
324 class SpawnContext(BaseContext):

~\Anaconda3\envs\fastai_v1\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
63 try:
64 reduction.dump(prep_data, to_child)
---> 65 reduction.dump(process_obj, to_child)
66 finally:
67 set_spawning_popen(None)

~\Anaconda3\envs\fastai_v1\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
58 def dump(obj, file, protocol=None):
59 '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60 ForkingPickler(file, protocol).dump(obj)
61
62 #

BrokenPipeError: [Errno 32] Broken pipe

peterwalkley · March 23, 2019, 11:16am

For others not sure where to add the num_workers part, it’s here:

data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=224, bs=bs, num_workers=0
                                  ).normalize(imagenet_stats)
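
If the same notebook also runs on Linux or macOS, the setting can be made conditional so that only Windows gives up the worker processes. A small sketch under that assumption; pick_num_workers is a hypothetical helper name, and 4 is an arbitrary example value:

```python
import platform

def pick_num_workers(requested, system=None):
    """Return 0 on Windows (load data in the main process), else `requested`."""
    system = platform.system() if system is None else system
    return 0 if system == "Windows" else requested

num_workers = pick_num_workers(4)
print(num_workers)

# Then pass it through in the call shown above:
#   ImageDataBunch.from_name_re(..., bs=bs, num_workers=num_workers)
```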

naveen_v · March 23, 2019, 12:28pm

Thanks very much, it worked. But I would like to know more details about the problem and why it is caused. Thanks again @peterwalkley; it would be a great help if you could elaborate.

peterwalkley · March 23, 2019, 12:59pm

My understanding is that the underlying problem is in the pytorch layer on Windows and also that it is not a trivial thing to fix. See: https://github.com/pytorch/pytorch/issues/12831

Setting num_workers=0 has fixed it every time for me, so I’ve not felt any burning desire to cut over to Linux.
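
For context on the last frames of the traceback: on Windows, multiprocessing starts each worker through popen_spawn_win32, so the process object is pickled and written to the child through a pipe, and that dump call is where the error surfaces. The sketch below only illustrates that serialization step, using plain pickle as a stand-in for ForkingPickler. It is not a diagnosis of this particular error, and survives_pickling is a made-up helper name:

```python
import pickle

def survives_pickling(obj):
    """Return True if `obj` can be serialized with pickle.dumps."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

# A builtin function pickles by reference to its name; a lambda has no
# importable name, so pickle refuses it.
print(survives_pickling(len))
print(survives_pickling(lambda x: x * 2))
```

With num_workers=0 no worker process is started, so this step is skipped entirely.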

naveen_v · March 23, 2019, 3:15pm

Thanks for your help.
With your link I learned some new things:
1. num_workers: how many subprocesses to use for data loading.
2. A value of 0 means that the data will be loaded in the main process (default: 0).
Thanks again.

DeShet · March 25, 2019, 9:03am

I had the same issue, but a fix was posted in the following thread:

You will have to make 1 or 2 changes to your ipython.py file that is used by fastai

Afterwards you will have to wrap your Jupyter notebook code from lesson 1 in:

if __name__ == '__main__':

After doing these changes I didn’t need num_workers=0 anymore and the broken pipe error was gone

Hope it helps

shahid · January 5, 2020, 4:47am

Just add num_workers=0 when using the databunch function and this will solve the issue

rgh · November 7, 2022, 9:13am

Thanks a lot! Your solution solves my problem.