Problem: the dataloader is not using all CPUs, and because of that training is very slow.
macOS, Python 3.6, fastai==2.1.5, torch==1.7.0
I am creating a dataloader like this:
```python
textblock = TextBlock.from_df(
    '_VALUE',               # Which dataframe column to read
    is_lm=True,             # We only have X and no Y for the language model
    tok=RulesTokenizer(),
    rules=[],               # Disable default fastai rules
)
datablock = DataBlock(
    blocks=textblock,              # That's how we read, tokenize and get X
    get_x=ColReader('text'),       # After going through TextBlock, tokens are in the column `text`
    splitter=RandomSplitter(0.2),  # Splitting to train/validation
)
dataloader = datablock.dataloaders(
    subset,                 # Source of data
    bs=256,                 # Batch size
    num_workers=8,
    pin_memory=True,
)
```
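For context, here is a minimal sketch of how one could check whether extra workers actually speed things up, by timing a few batches for different `num_workers` values (it assumes `datablock` and `subset` from the snippet above are defined; the loop and timing code are my own illustration, not part of the original setup):

```python
import time

# Rough sketch: time how long it takes to fetch a few batches
# for different worker counts, to see if parallel loading helps.
for n in (0, 4, 8):
    dls = datablock.dataloaders(subset, bs=256, num_workers=n)
    it = iter(dls.train)
    start = time.perf_counter()
    for _ in range(5):               # pull 5 batches
        next(it)
    elapsed = time.perf_counter() - start
    print(f"num_workers={n}: {elapsed:.2f}s for 5 batches")
```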
When I attempt to train, the same warning is printed many times:
```
[W ParallelNative.cpp:206] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
```
From this article, it appears that PyTorch 1.7 has a bug related to parallel processing.
Setting an environment variable to 1 makes the warning go away.
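The question does not name the variable; my assumption is that it is `OMP_NUM_THREADS`, which is the workaround usually suggested for this warning. A sketch of that workaround, set before torch is imported:

```python
import os

# Assumption: the variable meant above is OMP_NUM_THREADS, the usual
# workaround for the ParallelNative warning. It must be set before
# torch is imported so the OpenMP runtime picks it up.
os.environ["OMP_NUM_THREADS"] = "1"

import torch
from fastai.text.all import *
```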
Possibly as a result of this, CPU load is very low (no parallel computing?) and training is slow. I am training on a CPU-only Mac, so I would expect all cores to run at 100% to get reasonable speed, but it seems the dataloader is the bottleneck because the work is not being parallelized.
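To tell whether the missing parallelism is on the intra-op side (the thread count the workaround above reduces to 1) or on the dataloader side, it might help to print the relevant counts; this is just a diagnostic sketch using standard torch and stdlib calls:

```python
import multiprocessing
import torch

# How many cores the machine has vs. how many threads torch will use
# for intra-op parallelism. If the second number is 1 while the machine
# has many cores, low CPU load during the forward/backward pass is expected.
print("CPU cores:        ", multiprocessing.cpu_count())
print("torch num_threads:", torch.get_num_threads())
```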
Has anyone run into the same problem?