I have a large amount of data spread across multiple text files. I run the spaCy tokenizer over this data and use pickle.dump to dump the resulting token list (it is essentially one big list that I'm dumping) into a single file.
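For context, here is a minimal sketch of what I'm doing now (the whitespace tokenizer, the inline texts, and the file name `tokens.pkl` are stand-ins; the real code uses spaCy on my text files):

```python
import pickle

def tokenize(text):
    # stand-in for the spaCy tokenizer
    return text.split()

# hypothetical inline texts standing in for the real text files
texts = ["the quick brown fox", "jumps over the lazy dog"]

tokens = []
for text in texts:
    tokens.extend(tokenize(text))

# dump the whole token list into one pickle file
with open("tokens.pkl", "wb") as f:
    pickle.dump(tokens, f)
```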
After this, my code has to read the dumped data back, split it into batches, and dump those batches into bs files. How do I serially load the list of tokens with pickle so I can batch them without overloading my memory? In other words, I'd like pickle to hand me chunks of the list rather than the whole thing at once.
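If it helps, the kind of pattern I'm imagining is repeated pickle.dump calls when writing and repeated pickle.load calls when reading, so only one chunk is ever in memory at a time (a sketch; the chunk size, data, and file name are made up):

```python
import pickle

CHUNK_SIZE = 3  # tokens per pickled chunk; the real value would be much larger

tokens = ["the", "quick", "brown", "fox",
          "jumps", "over", "the", "lazy", "dog"]

# write the list as a sequence of independently pickled chunks in one file
with open("tokens_chunked.pkl", "wb") as f:
    for i in range(0, len(tokens), CHUNK_SIZE):
        pickle.dump(tokens[i:i + CHUNK_SIZE], f)

def iter_chunks(path):
    # yield one pickled chunk at a time; pickle.load raises EOFError
    # once every object in the file has been consumed
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return

# only one CHUNK_SIZE-sized list is resident in memory per iteration
for chunk in iter_chunks("tokens_chunked.pkl"):
    print(chunk)
```

Is something like this the right approach, or is there a better way to get pickle to stream the list back in pieces?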
Thanks a lot