I have a large amount of data spread across multiple text files. I run the spaCy tokenizer over this data and use pickle.dump to write the resulting list of tokens (it is a single list that I'm dumping, essentially) to a file.
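For reference, a minimal sketch of what I'm doing now (the file contents are placeholders, and I've swapped the spaCy tokenizer for a simple split just to keep the snippet self-contained):

```python
import pickle

def tokenize(text):
    # stand-in for the spaCy tokenizer I actually use
    return text.split()

tokens = []
for text in ["first file contents", "second file contents"]:
    tokens.extend(tokenize(text))

# one big dump of the whole token list
with open("tokens.pkl", "wb") as f:
    pickle.dump(tokens, f)
```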
After this, my code has to read the dumped data back, split it into batches, and dump those batches into bs separate files. How can I serially load the list of tokens with pickle so I can batch them without loading the whole list into memory at once? In other words, I'd like pickle to give me chunks of my list rather than the entire thing.
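Something like the following is what I have in mind: dumping the list in chunks up front (pickle allows several dump calls into the same file), then reading them back one chunk at a time with repeated pickle.load calls until EOFError. The chunk size and data here are just toy values. Is this a reasonable approach, or is there a better way?

```python
import pickle

tokens = list(range(10))  # toy stand-in for my token list
chunk_size = 4

# write one pickle record per chunk into the same file
with open("tokens_chunked.pkl", "wb") as f:
    for i in range(0, len(tokens), chunk_size):
        pickle.dump(tokens[i:i + chunk_size], f)

# read the chunks back one at a time; only one chunk
# is ever held in memory here
loaded = []
with open("tokens_chunked.pkl", "rb") as f:
    while True:
        try:
            chunk = pickle.load(f)  # reads exactly one dumped chunk
        except EOFError:
            break
        loaded.append(chunk)
```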
Thanks a lot