Larger-than-RAM numpy datasets

I’ve often found that some of my datasets don’t fit in memory. I normally work with time series data stored in numpy arrays. In those cases you can either find a way to increase your RAM, or find a way to read your data from disk ‘on the fly’ (similar to how image datasets are usually loaded).

I’ve investigated a bit and found something that works great for me: np.memmap. This is an np.ndarray subclass that creates a memory map to data stored on disk, so you can use it almost as if the data were in memory.
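To give an idea of what this looks like in practice, here is a minimal sketch (not necessarily what the notebook does). The file name `X_on_disk.dat`, the array shape and the chunk size are made up for illustration; only `np.memmap` itself comes from numpy. It writes an array to disk in chunks, reopens it read-only, and pulls a small batch into memory:

```python
import os
import tempfile

import numpy as np

# Hypothetical dataset: 1,000 series, 3 channels, 500 time steps each
shape = (1000, 3, 500)
dtype = np.float32
path = os.path.join(tempfile.mkdtemp(), "X_on_disk.dat")  # illustrative file name

# mode="w+" creates (or overwrites) the file and maps it for reading and writing
X = np.memmap(path, dtype=dtype, mode="w+", shape=shape)

# Fill the array chunk by chunk, so the full dataset never has to sit in RAM.
# Random numbers stand in for data you would normally load from your own files.
rng = np.random.default_rng(0)
chunk = 100
for start in range(0, shape[0], chunk):
    X[start:start + chunk] = rng.random((chunk, *shape[1:]), dtype=np.float32)

X.flush()  # make sure everything is written to disk
del X      # drop the writable map

# Reopen read-only. dtype and shape are not stored in the file,
# so they have to be passed again.
X = np.memmap(path, dtype=dtype, mode="r", shape=shape)

# Index it like a normal array; only the requested rows are read from disk.
# np.array(...) copies them into a regular in-memory array.
batch = np.array(X[[5, 42, 7]])
print(batch.shape, batch.dtype)
```

Note that the raw file stores no metadata, so you need to keep track of the dtype and shape yourself (or use np.save together with np.load(..., mmap_mode="r"), which stores them in the .npy header).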

If you are interested in learning more about this, you may take a look at this notebook. I hope you find it useful.
