Same issue for me when initializing TextLMDataBunch.from_df().
Most likely you need to add num_workers=0 (required on Windows):
from fastai.vision.all import *

# num_workers=0 avoids the Windows multiprocessing issue
dls = ImageDataLoaders.from_df(df, PATH_DATA,
                               fn_col="FILENAME",
                               label_col="LABEL",
                               num_workers=0,
                               bs=4)
Hi, I am facing an issue and have a basic (and most probably stupid) question. I am a beginner with FastAI and am trying to run it on Windows 10. I installed it with Anaconda, following the instructions at https://docs.fast.ai/. When trying to train the model on the GPU, I get an out-of-memory error (probably because the GPU has only 2 GB of memory). Is there a workaround for this?
Otherwise, is it possible to use FastAI without a GPU?
Welcome @shobhit - reducing the batch size (bs) will reduce the amount of GPU memory used, so I’d try that first. Using the CPU will be pretty slow - and I can’t recall how to force this but maybe change the default device?
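Something along these lines should force the CPU. This is only a sketch assuming the defaults.use_cuda / default_device helpers behave as in recent fastai 2.x releases, so check the docs for your version:

import torch
from fastai.torch_core import defaults, default_device

defaults.use_cuda = False        # ask fastai to place everything on the CPU
print(default_device())          # should now report device(type='cpu')

# Or move an existing Learner and its data over explicitly:
# learn.dls = learn.dls.cpu()
# learn.model = learn.model.cpu()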
Thank you @brismith , reducing the batch size worked.
I am running the example script and am getting an error.
RuntimeError: cuda runtime error (801) : operation not supported at ..\torch/csrc/generic/StorageSharing.cpp:258
I’ve tried lots of googling but have not been able to find a solution. Any help is appreciated.
Example Script: fastai/dataloader_spawn.py at master · fastai/fastai · GitHub
EDIT: Using Conda and a GTX 1050. I’ve tried installing the CUDA drivers linked here, and also not using them, since there was no mention of them on the GitHub page. Are they necessary?
EDIT 2: num_workers=0 fixes this, but I can’t figure out how to get multiple workers working, even though I’m not using Jupyter notebooks (see the sketch after the traceback below).
(ml) C:\Users\Ben\Desktop\Python Scripts\BiomeRecigonition>python Example.py
C:\Users\Ben\anaconda3\envs\ml\lib\site-packages\torch\_tensor.py:1023: UserWarning: torch.solve is deprecated in favor of torch.linalg.solveand will be removed in a future PyTorch release.
torch.linalg.solve has its arguments reversed and does not return the LU factorization.
To get the LU factorization see torch.lu, which can be used with torch.lu_solve or torch.lu_unpack.
X = torch.solve(B, A).solution
should be replaced with
X = torch.linalg.solve(A, B) (Triggered internally at ..\aten\src\ATen\native\BatchLinearAlgebra.cpp:760.)
ret = func(*args, **kwargs)
THCudaCheck FAIL file=..\torch/csrc/generic/StorageSharing.cpp line=258 error=801 : operation not supported
Traceback (most recent call last):
File "C:\Users\Ben\Desktop\Python Scripts\BiomeRecigonition\Example.py", line 37, in <module>
learn.lr_find()
File "C:\Users\Ben\anaconda3\envs\ml\lib\site-packages\fastai\callback\schedule.py", line 282, in lr_find
with self.no_logging(): self.fit(n_epoch, cbs=cb)
File "C:\Users\Ben\anaconda3\envs\ml\lib\site-packages\fastai\learner.py", line 221, in fit
self._with_events(self._do_fit, 'fit', CancelFitException, self._end_cleanup)
File "C:\Users\Ben\anaconda3\envs\ml\lib\site-packages\fastai\learner.py", line 163, in _with_events
try: self(f'before_{event_type}'); f()
File "C:\Users\Ben\anaconda3\envs\ml\lib\site-packages\fastai\learner.py", line 212, in _do_fit
self._with_events(self._do_epoch, 'epoch', CancelEpochException)
File "C:\Users\Ben\anaconda3\envs\ml\lib\site-packages\fastai\learner.py", line 163, in _with_events
try: self(f'before_{event_type}'); f()
File "C:\Users\Ben\anaconda3\envs\ml\lib\site-packages\fastai\learner.py", line 206, in _do_epoch
self._do_epoch_train()
File "C:\Users\Ben\anaconda3\envs\ml\lib\site-packages\fastai\learner.py", line 198, in _do_epoch_train
self._with_events(self.all_batches, 'train', CancelTrainException)
File "C:\Users\Ben\anaconda3\envs\ml\lib\site-packages\fastai\learner.py", line 163, in _with_events
try: self(f'before_{event_type}'); f()
File "C:\Users\Ben\anaconda3\envs\ml\lib\site-packages\fastai\learner.py", line 169, in all_batches
for o in enumerate(self.dl): self.one_batch(*o)
File "C:\Users\Ben\anaconda3\envs\ml\lib\site-packages\fastai\data\load.py", line 109, in __iter__
for b in _loaders[self.fake_l.num_workers==0](self.fake_l):
File "C:\Users\Ben\anaconda3\envs\ml\lib\site-packages\torch\utils\data\dataloader.py", line 918, in __init__
w.start()
File "C:\Users\Ben\anaconda3\envs\ml\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\Users\Ben\anaconda3\envs\ml\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\Ben\anaconda3\envs\ml\lib\multiprocessing\context.py", line 327, in _Popen
return Popen(process_obj)
File "C:\Users\Ben\anaconda3\envs\ml\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
reduction.dump(process_obj, to_child)
File "C:\Users\Ben\anaconda3\envs\ml\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
File "C:\Users\Ben\anaconda3\envs\ml\lib\site-packages\torch\multiprocessing\reductions.py", line 247, in reduce_tensor
event_sync_required) = storage._share_cuda_()
RuntimeError: cuda runtime error (801) : operation not supported at ..\torch/csrc/generic/StorageSharing.cpp:258
(ml) C:\Users\Ben\Desktop\Python Scripts\BiomeRecigonition>Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\Ben\anaconda3\envs\ml\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "C:\Users\Ben\anaconda3\envs\ml\lib\multiprocessing\spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
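For reference, Windows starts DataLoader workers with the "spawn" method, which re-imports the script in every worker process, so any training code has to sit behind an if __name__ == '__main__': guard before num_workers > 0 can work at all. Below is a minimal sketch of that layout; the dataset, label function, and hyperparameters are placeholders, not the actual Example.py. Note that even with the guard, error 801 can persist, because Windows does not support sharing CUDA storage between processes:

from fastai.vision.all import *

def is_cat(f):
    # toy labelling rule for the Oxford-IIIT Pets filenames
    return f.name[0].isupper()

def main():
    path = untar_data(URLs.PETS)/'images'            # placeholder dataset
    dls = ImageDataLoaders.from_name_func(
        path, get_image_files(path), valid_pct=0.2,
        label_func=is_cat, item_tfms=Resize(224),
        num_workers=2)                               # multiple workers
    learn = cnn_learner(dls, resnet18, metrics=error_rate)
    learn.lr_find()

if __name__ == '__main__':   # required on Windows: spawned workers re-import this file
    main()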
Without the NVIDIA CUDA drivers, it’s likely that when you installed PyTorch you got the CPU-only build rather than the GPU build. See Start Locally | PyTorch for the correct install command.
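A quick way to check which build is installed (plain PyTorch calls, nothing fastai-specific):

import torch

print(torch.__version__)            # pip CPU-only builds are often tagged "+cpu"
print(torch.cuda.is_available())    # False means PyTorch cannot see a CUDA device
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))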
Hi all,
I’m testing my workflow on a Windows machine after previously working solely on Ubuntu, and am getting a weird pause when using functions like learn.get_preds(), learn.fit_one_cycle(), or learn.lr_find().
The pause is consistently 25 seconds every time a function like that is called. I did not have this issue on Ubuntu. The results are fine, but the pause is driving me crazy! Any idea what could be causing this?
Thanks!
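One way to narrow down where a pause like that is spent is the standard-library profiler (not a fastai feature); this sketch assumes learn is defined at module level. If most of the 25 seconds shows up around DataLoader iteration and process startup, trying num_workers=0, as discussed earlier in this thread, would confirm that the Windows worker-spawn overhead is the cause:

import cProfile, pstats

# Profile one call that exhibits the pause and print the slowest steps.
cProfile.run('learn.lr_find()', 'lr_find.prof')
pstats.Stats('lr_find.prof').sort_stats('cumulative').print_stats(20)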