Michael,
Thanks for the reply.
I’m doing that already.
The issue here is loading the 60 GB corpus, which is split across 120K files.
I need to load that corpus to tokenize it and convert it to training data.
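Roughly what that step looks like, just as a sketch (the corpus path and tokenizer here are placeholders for my actual setup):

    import os

    def iter_corpus(corpus_dir):
        # Yield one file's text at a time so the full 60 GB
        # never has to sit in RAM at once.
        for name in sorted(os.listdir(corpus_dir)):
            with open(os.path.join(corpus_dir, name), encoding="utf-8") as f:
                yield f.read()

    def tokenize(text):
        # Stand-in for the real tokenizer.
        return text.split()

    for doc in iter_corpus("corpus/"):
        ids = tokenize(doc)
        # ...append ids to the on-disk training-data store...

Even streaming like this, I still need somewhere to put the resulting arrays.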
That’s just the corpus.
After that I will need to do the other calculations.
There should be a way to have a RAM/disk object (like a database) but with numpy/pandas capabilities and accessibility.
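For illustration, numpy's memmap is the kind of thing I mean: a disk-backed array with ndarray semantics (the filename, dtype, and shape here are arbitrary):

    import numpy as np

    # Disk-backed array that can be indexed and sliced like a normal
    # ndarray; the OS pages in data only for the slices you touch.
    tokens = np.memmap("tokens.dat", dtype=np.int32, mode="w+",
                       shape=(1_000_000,))

    tokens[:1024] = 0                # writes go to the file on disk
    chunk = tokens[500_000:501_024]  # reads load only this slice

That's the access pattern I'm after, but ideally with pandas-level convenience across the whole pipeline.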