I was also having the same pickling problem on Windows 10, in my case when trying to create a DataBlock that contains a TextBlock. I tracked down the issue and found a solution, but it involves changes to torch_core. I will share it here for anyone who might be interested.
The root of the issue is that torch_core.parallel_gen defines two functions inside its body, and nested functions cannot be pickled (it breaks on Windows, but AFAIK nested functions are not picklable in general). On Windows, multiprocessing starts workers with the "spawn" method, which pickles the function and arguments given to each new process; the default "fork" method on Linux does not need to, which is likely why the error only shows up on Windows. Because fastai uses parallel_gen wherever it parallelizes work, the problem can appear in different contexts. The workaround I found is to move the nested functions out to module level and use functools.partial inside parallel_gen to bind the values they need. So, if we change parallel_gen from this:
```python
def parallel_gen(cls, items, n_workers=defaults.cpus, as_gen=False, **kwargs):
    "Instantiate `cls` in `n_workers` procs & call each on a subset of `items` in parallel."
    batches = np.array_split(items, n_workers)
    idx = np.cumsum(0 + L(batches).map(len))
    queue = Queue()
    def f(batch, start_idx):
        for i,b in enumerate(cls(**kwargs)(batch)): queue.put((start_idx+i,b))
    def done(): return (queue.get() for _ in progress_bar(items, leave=False))
    yield from run_procs(f, done, L(batches,idx).zip())
```
to this:
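```python
import functools

# Module-level helpers: unlike functions nested inside parallel_gen, these are
# pickled by reference, which "spawn"-based multiprocessing (the Windows default) needs
def f_pg(clse, queue, batch, start_idx):
    # Apply the already-built object to one batch and send (index, result) pairs back
    for i,b in enumerate(clse(batch)): queue.put((start_idx+i,b))

def done_pg(queue, items): return (queue.get() for _ in progress_bar(items, leave=False))

def parallel_gen(cls, items, n_workers=defaults.cpus, as_gen=False, **kwargs):
    "Instantiate `cls` in `n_workers` procs & call each on a subset of `items` in parallel."
    batches = np.array_split(items, n_workers)
    idx = np.cumsum(0 + L(batches).map(len))
    queue = Queue()
    # Note: cls(**kwargs) is now built once in the parent and sent to each worker
    # (pickled under spawn), so the instance itself must be picklable
    f = functools.partial(f_pg, cls(**kwargs), queue)
    done = functools.partial(done_pg, queue, items)
    yield from run_procs(f, done, L(batches,idx).zip())
```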
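The difference between nested and module-level functions can be checked with plain `pickle`, no fastai needed. This is a minimal sketch; `worker`, `make_nested` and `is_picklable` are hypothetical names for illustration, and a string stands in for the queue because a real `multiprocessing.Queue` refuses to be pickled outside of process start-up.

```python
import functools
import pickle

def worker(queue, batch, start_idx):
    # Module-level function: pickle stores it as a reference like "module.worker"
    for i, b in enumerate(batch):
        queue.put((start_idx + i, b))

def make_nested():
    # inner is a local object; pickle cannot look it up by name, so it fails
    def inner(batch, start_idx):
        return batch, start_idx
    return inner

def is_picklable(obj):
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

nested = make_nested()
# partial of a module-level function with picklable arguments is itself picklable
bound = functools.partial(worker, "queue placeholder")

print(is_picklable(nested))  # False
print(is_picklable(bound))   # True
```

This is the same reason the module-level `f_pg` wrapped in `functools.partial` survives the trip into a spawned process while the nested `f` does not.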
With the modified parallel_gen, the problem is solved. It feels a bit slow though, so I am not sure whether this is a good solution or just a hack to make it work. I tested it with this simple example that loads data into a DataBlock:
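```python
import numpy as np
import pandas as pd
from fastai2.text.all import DataBlock, TextBlock, CategoryBlock, ColReader

# Toy data: 100 short random strings made of letters (label 0) or digits (label 1)
t = [list('abcdefghijklmnopqrstuvwxyz'), list('0123456789')]
t_l = np.random.randint(0, 2, 100)
t_d = [''.join(np.random.choice(t[i], np.random.randint(1, 10))) for i in t_l]
df = pd.DataFrame.from_dict({'text': t_d, 'label': t_l})

db = DataBlock(blocks=(TextBlock.from_df('text'), CategoryBlock),
               get_x=ColReader('text'),
               get_y=ColReader('label'))

# When run as a script on Windows, spawned workers re-import this file,
# so the call that starts processes must sit behind the __main__ guard
if __name__ == '__main__':
    dls = db.dataloaders(df)
```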
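To try the same pattern outside fastai, here is a rough standalone sketch that forces the "spawn" start method (the Windows default) on any OS. `Upper` and `run_parallel` are hypothetical stand-ins for `cls(**kwargs)` and for `parallel_gen`/`run_procs`, not fastai code; `f_pg` mirrors the module-level helper above.

```python
import functools
import multiprocessing as mp

def f_pg(clse, queue, batch, start_idx):
    # Apply clse to one batch and report (global index, result) pairs to the parent
    for i, b in enumerate(clse(batch)):
        queue.put((start_idx + i, b))

class Upper:
    # Hypothetical picklable callable standing in for cls(**kwargs), e.g. a tokenizer
    def __call__(self, batch):
        return [s.upper() for s in batch]

def run_parallel(items, n_workers=2, timeout=60):
    ctx = mp.get_context("spawn")  # Windows-style process start, on any OS
    q = ctx.Queue()
    size = max(1, -(-len(items) // n_workers))  # ceiling division, at least 1
    # partial of a module-level function with picklable args survives spawn's pickling
    f = functools.partial(f_pg, Upper(), q)
    procs = [ctx.Process(target=f, args=(items[s:s + size], s))
             for s in range(0, len(items), size)]
    for p in procs:
        p.start()
    # Drain the queue before joining (joining first can deadlock);
    # the timeout raises queue.Empty instead of hanging if a worker dies
    results = [q.get(timeout=timeout) for _ in items]
    for p in procs:
        p.join()
    return [b for _, b in sorted(results)]

if __name__ == "__main__":
    print(run_parallel(["a", "b", "c", "d", "e"]))  # ['A', 'B', 'C', 'D', 'E']
```

Swapping `functools.partial(f_pg, ...)` for a function defined inside `run_parallel` should reproduce the pickling error at `p.start()`.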
I am new to fastai, so I am not sure if this is the right way of doing things, but I thought it might be helpful, so I decided to share it.