I have this piece of code in my prediction pipeline, and my text is only 4 words long.
# tokenize using the fastai wrapper around spacy
import time
from fastai.text import Tokenizer, partition_by_cores

tic = time.perf_counter()  # time.clock() is deprecated (removed in Python 3.8)
tok = Tokenizer().proc_all_mp(partition_by_cores(texts))
toc = time.perf_counter()
This function creates multiple processes in Python, so there should be overhead for spawning them, plus message passing between processes, plus the tokenizer time itself.
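A minimal, library-free sketch of that overhead (plain Python, no fastai; the whitespace `tokenize` is a toy stand-in for the real tokenizer): for a handful of words, starting a worker pool and pickling data costs far more than the tokenization itself.

```python
import time
from multiprocessing import Pool

def tokenize(text):
    # toy stand-in for a real tokenizer: whitespace split
    return text.split()

def time_direct(texts):
    """Tokenize in the current process and return (tokens, elapsed seconds)."""
    tic = time.perf_counter()
    toks = [tokenize(t) for t in texts]
    return toks, time.perf_counter() - tic

def time_multiproc(texts):
    """Tokenize via a 2-worker pool; pays process-startup and IPC costs."""
    tic = time.perf_counter()
    with Pool(2) as pool:
        toks = pool.map(tokenize, texts)
    return toks, time.perf_counter() - tic
```

Both paths return identical tokens, but on a 4-word input the pooled version is typically orders of magnitude slower because the elapsed time is almost entirely process startup and message passing.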
I assume you mean 29 seconds or 2.9 seconds. Also, len(texts) = 4?
I doubt it. I used a larger corpus and it took less time than that. Can you confirm the output of tok is word-based? I suspect it might be character-based (not sure).
I ran into the same issue. I think whatever magic happens when the work is partitioned across cores can take a bit of time.
Try using Tokenizer().proc_text(s) instead. I have found this works faster for processing small strings (i.e. for prediction), where the input is small enough that using multiple cores doesn't really make sense.
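A generic sketch of that dispatch logic (plain Python; `smart_tokenize`, `threshold`, and the toy tokenizers are hypothetical names, not fastai API):

```python
def smart_tokenize(texts, single_fn, multi_fn, threshold=100):
    """Route small batches (e.g. a single prediction string) to the
    single-process tokenizer; use the multiprocess path only for
    large corpora where parallelism pays off."""
    if len(texts) < threshold:
        # e.g. [Tokenizer().proc_text(t) for t in texts] in old fastai
        return [single_fn(t) for t in texts]
    # e.g. Tokenizer().proc_all_mp(partition_by_cores(texts))
    return multi_fn(texts)

# toy stand-ins so the sketch runs without fastai
single = lambda t: t.split()
multi = lambda ts: [t.split() for t in ts]

print(smart_tokenize(["only four words here"], single, multi))
# → [['only', 'four', 'words', 'here']]
```

The threshold is a judgment call; the point is simply that the multiprocessing path should never be the default for one-off prediction inputs.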