Hi *,
I would like to share some experiments I did while taking part in the Kaggle Quick, Draw! competition (by the way, fast.ai gives 82% accuracy with the default approach right after building a model on just 1% of the data, awesome!).
I’ve tried to keep this short, so please don’t hesitate to ask about details if I omitted too much.
The main challenge is the amount of data. To summarize, the train data contains about 50M images (given as vector drawings). I was thinking about converting all those drawings to regular images and then applying the fast.ai learner for image files.
The first question: How long will it take just to convert all vector files to png files?
Just for reference, the function I’ve used for conversion looks like this:
import io
import cv2
import numpy as np
import matplotlib.pyplot as plt

def drawing_to_np_prepare_data_raw(drawing):
    # evaluate the stringified stroke array from the CSV
    drawing = eval(drawing)
    fig, ax = plt.subplots()  # figsize=(6.,4.), dpi=72
    # close the figure so it won't get displayed while transforming the set
    plt.close(fig)
    for x, y in drawing:
        ax.plot(x, y, marker='.')
    ax.axis('off')
    fig.canvas.draw()
    # convert the rendered canvas to a numpy array
    np_drawing = np.array(fig.canvas.renderer._renderer)
    return cv2.cvtColor(np_drawing.astype(np.uint8), cv2.COLOR_BGR2RGB)
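For a sense of the input format, here is a hypothetical call (the strokes are made up; in the competition CSV the drawing column stores stringified [[x...], [y...]] stroke lists):

# a made-up two-stroke drawing in the competition's stroke format
drawing = '[[[0, 50, 100], [0, 80, 20]], [[20, 60], [50, 50]]]'
np_drawing = drawing_to_np_prepare_data_raw(drawing)
cv2.imwrite('sample.png', np_drawing)  # the per-file "saving" step timed below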
To measure time, I randomly picked about 1% of the data (372,576 keys) and generated the PNG files.
Timing results (on p2.xlarge AWS instance):
Converting files took 8603.43 seconds.
Saving files took 3083.26 seconds.
Summary: 0.031367 seconds per file. Converting all the data would therefore require an astonishing 325 hours (this could be sped up using compute-optimized instances). Worse, I could only use 4 concurrent threads to parallelize it because of the compute/save ratio (I/O operations do not parallelize well); a rough sketch of the loop follows.
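For context, this is roughly how the conversion loop was parallelized (my reconstruction, not the original code; drawings and out_paths are assumed to be preloaded lists):

from concurrent.futures import ThreadPoolExecutor

def convert_and_save(args):
    drawing, out_path = args
    # the imwrite call is the I/O-bound part that caps useful parallelism
    cv2.imwrite(out_path, drawing_to_np_prepare_data_raw(drawing))

# beyond ~4 workers the disk writes dominate and extra threads stop helping
with ThreadPoolExecutor(4) as ex:
    for _ in ex.map(convert_and_save, zip(drawings, out_paths)):
        pass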
One straightforward solution is to take a large compute-optimized EC2 instance, attach multiple drives, convert the files using multiple concurrent processes, then re-attach those drives to a p2 instance and glue all the class folders into the same ‘train’ folder using mhddfs. But I became curious to try other alternatives.
Alternative 1: Redis (ElastiCache)
Just to recap, Redis is an in-memory data store, which makes it very fast (read access measured in 1-3 milliseconds) and lets many clients read and write in parallel.
Pros:
- should be very fast!
- converting files is easily parallelizable
- allows performing multiple experiments without introducing additional latency.
Cons:
- price (I am leaving this out of the scope here)
- Strictly speaking, it is not a persistent database: it could crash and lose the data. However, a backup would help in our case.
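Before wiring this into fast.ai, here is a minimal redis-py round trip (hypothetical local host/port), just to show the API the classes below are built on:

import redis

r = redis.Redis('localhost', 6379)
r.set('train:orig:0', b'...')  # values are stored as raw bytes
value = r.get('train:orig:0')  # a single read takes ~1-3 ms on ElastiCache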
How to allow fast.ai to use Redis database as a source?
After digging through the fast.ai code it became quite clear to me: I need to introduce new dataset classes.
import redis
import json
from concurrent.futures import ProcessPoolExecutor
from tqdm import tqdm

def resize_imgs_redis(redis_conn, targ, key_prefix, resume=True, fn=None):
    """
    Enlarge or shrink a set of images stored under the same key prefix, such that
    the smaller of the height or width dimension is equal to targ.
    Note:
    -- This function is multithreaded for efficiency.
    -- When the destination key already exists, the function exits without raising an error.
    """
    s = key_prefix.split(':')
    new_key_prefix = ':'.join([s[0], 'sz' + str(targ)])
    keys = redis_conn.keys(key_prefix)  # todo - change to SCAN
    keys = [k.decode('utf-8') for k in keys]
    new_keys = [new_key(new_key_prefix, k) for k in keys]  # new_key: key-renaming helper
    errors = {}

    def safely_process(key, new_key):
        try:
            # when resuming, skip keys that have already been resized
            if resume and redis_conn.get(new_key) is not None:
                return
            resize_img(redis_conn, targ, key, new_key, fn=fn)  # Redis-adapted resize_img
        except Exception as ex:
            errors[key] = str(ex)

    if len(keys) > 0:
        with ProcessPoolExecutor(num_cpus()) as e:  # num_cpus: fast.ai helper
            ims = e.map(lambda p: safely_process(*p), zip(keys, new_keys))
            for _ in tqdm(ims, total=len(keys), leave=False):
                pass
    if errors:
        print('Some images failed to process:')
        print(json.dumps(errors, indent=2))
    return new_key_prefix
class RedisImageDataset(BaseDataset):
    def __init__(self, redis_host, redis_port, key_prefix, keys_count, transform):
        self.redis_conn = redis.Redis(redis_host, redis_port)
        self.keys_count = keys_count
        self.key_prefix = key_prefix
        self.host, self.port = redis_host, redis_port
        super().__init__(transform)

    def get_sz(self):
        return self.transform.sz

    def get_x(self, i):
        value = self.redis_conn.get(':'.join([self.key_prefix, str(i)]))
        value = norm(decompress_array(value))
        return value

    def get_n(self):
        return self.keys_count

    def resize_imgs(self, targ, new_path, resume=True, fn=None):
        new_key_prefix = resize_imgs_redis(self.redis_conn, targ, self.key_prefix, resume, fn)
        return self.__class__(self.host, self.port, new_key_prefix, self.keys_count, self.transform)

    def denorm(self, arr):
        if type(arr) is not np.ndarray:
            arr = to_np(arr)
        if len(arr.shape) == 3:
            arr = arr[None]
        return self.transform.denorm(np.rollaxis(arr, 1, 4))
class RedisNumpy(object):
    def __init__(self, redis_host, redis_port, key_prefix, length):
        self.redis_conn = redis.Redis(redis_host, redis_port)
        self.key_prefix = key_prefix
        self.length = length

    def __getitem__(self, item):
        return int(self.redis_conn.get(self.key_prefix + ':' + str(item)))

    def __len__(self):
        return self.length

    def max(self):
        # hard-coded for Quick, Draw!'s 340 classes (max label index is 339)
        return 339

    @property
    def shape(self):
        return (self.length,)
class RedisImageArrayDataset(RedisImageDataset):
    def __init__(self, redis_host, redis_port, key_prefix, keys_count, pred_prefix, classes_count, transform):
        self.pred_prefix = pred_prefix
        self.classes_count = classes_count
        self.keys_count = keys_count
        self.y = RedisNumpy(redis_host, redis_port, pred_prefix, keys_count)
        super().__init__(redis_host, redis_port, key_prefix, keys_count, transform)

    def get_y(self, i):
        val = self.redis_conn.get(':'.join([self.pred_prefix, str(i)]))
        return int(val) if val is not None else 0

    def get_c(self):
        return 0  # todo: enhance to have multiple columns for predictions

    def get_index(self, i):
        a = self.key_prefix.split(':')
        return self.redis_conn.get(':'.join([a[0], 'index', str(i)]))
class RedisImageIndexArrayDataset(RedisImageArrayDataset):
    def get_c(self):
        return self.classes_count
- The resize_imgs_redis function above is just an adaptation of fast.ai’s resize_imgs.
- The RedisNumpy class above is a workaround for the requirement that the ‘y’ (label) vector be loaded into memory. There is a key-naming convention implemented, which is basically: train:orig:number and test:orig:number, where train and test are configurable.
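To make the key convention concrete, here is a sketch of how one image/label pair could be written under it (the pred key layout and the helper itself are my assumptions, not code from the experiments):

def store_example(redis_conn, prefix, i, np_image, label):
    buf = io.BytesIO()
    np.savez_compressed(buf, np_image)  # compression is explained further below
    buf.seek(0)
    redis_conn.set(prefix + ':orig:' + str(i), buf.read())  # e.g. train:orig:42
    redis_conn.set(prefix + ':pred:' + str(i), int(label))  # label keys; layout assumed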
After adding a from_redis class method to the ImageClassifierData class, I could use any fast.ai code by calling this method instead of from_paths:
@classmethod
def from_redis(cls, path, redis_host, redis_port, train_key_prefix, valid_key_prefix,
               train_pred_prefix, valid_pred_prefix, train_keys_count, valid_keys_count,
               classes, test_key_prefix=None, test_pred_prefix=None, test_keys_count=0,
               bs=64, tfms=(None, None), num_workers=8):
    assert not (tfms[0] is None or tfms[1] is None), \
        "please provide transformations for your train and validation sets"
    # todo: change to kwargs
    datasets = [
        RedisImageIndexArrayDataset(redis_host, redis_port, train_key_prefix,
                                    train_keys_count, train_pred_prefix, len(classes), tfms[0]),  # train
        RedisImageIndexArrayDataset(redis_host, redis_port, valid_key_prefix,
                                    valid_keys_count, valid_pred_prefix, len(classes), tfms[1]),  # val
        RedisImageIndexArrayDataset(redis_host, redis_port, train_key_prefix,
                                    train_keys_count, train_pred_prefix, len(classes), tfms[1]),  # fix
        RedisImageIndexArrayDataset(redis_host, redis_port, valid_key_prefix,
                                    valid_keys_count, valid_pred_prefix, len(classes), tfms[0]),  # aux
    ]
    if test_key_prefix is not None:
        datasets += [
            RedisImageIndexArrayDataset(redis_host, redis_port, test_key_prefix,
                                        test_keys_count, test_pred_prefix, len(classes), tfms[1]),  # test
            RedisImageIndexArrayDataset(redis_host, redis_port, test_key_prefix,
                                        test_keys_count, test_pred_prefix, len(classes), tfms[0]),  # test_aux
        ]
    else:
        datasets += [None, None]
    return cls(path, datasets, bs, num_workers, classes=classes)
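With all of this in place, a hypothetical training run could look roughly like this (host, key prefixes, counts, and sz are placeholders, not values from the experiments):

sz = 64  # placeholder image size
tfms = tfms_from_model(resnet34, sz)
data = ImageClassifierData.from_redis(
    PATH, 'my-redis-host', 6379,
    'train:orig', 'valid:orig', 'train:pred', 'valid:pred',
    train_keys_count, valid_keys_count, classes, bs=64, tfms=tfms)
learn = ConvLearner.pretrained(resnet34, data)
learn.fit(1e-2, 1)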
And what about timing?
Converting files took 9796 seconds.
Saving files to Redis took an astonishing 9.5 seconds (!), which allows almost infinite parallelism.
An additional step is required after the concurrent processing to renumber the keys so that they are consecutive: 0.00085 seconds per image, which is quite negligible (a sketch follows below).
So, if we spin up a 64-core instance, we could potentially convert all the data in about 5 hours.
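The renumbering step could look roughly like this (my reconstruction, not code from the experiments; the separate source and destination prefixes are an assumption to avoid rename collisions):

def renumber_keys(redis_conn, src_prefix, dst_prefix):
    # concurrent workers leave gaps/offsets in the numbering;
    # rename every key into a dense 0..N-1 range under a fresh prefix
    keys = sorted(redis_conn.keys(src_prefix + ':*'))
    for new_i, old_key in enumerate(keys):
        redis_conn.rename(old_key, dst_prefix + ':' + str(new_i))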
How about read timing?
To measure time, I just modified the get_x functions in both the Redis and file-based classes and got the following results:
File: 0.02 seconds per image
Redis (with decompression): 0.013 seconds per image
If you wonder what decompression means and why converting takes longer for Redis: in order to save space, I had to compress the numpy arrays after applying drawing_to_np_prepare_data_raw:
compressed_image = io.BytesIO()
np.savez_compressed(compressed_image, drawing_to_np_prepare_data_raw(drawing))
compressed_image.seek(0)
This made the numpy arrays 109 times (!) smaller. However, it comes at the price of decompressing each time get_x is called:
def decompress_array(array):
    result = io.BytesIO(array)
    result.seek(0)
    return np.load(result)['arr_0']
This compression allows fitting all 50M images into about 110 GB of space in Redis; without it, the approach would not be practical.
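A quick round-trip sanity check of the compress/decompress pair (hypothetical data; random noise compresses far worse than the sparse line drawings, so don’t expect the 109x ratio here):

arr = np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
buf = io.BytesIO()
np.savez_compressed(buf, arr)
buf.seek(0)
assert np.array_equal(arr, decompress_array(buf.read()))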
However, in the old fast.ai version, the data is transformed into an ArrayDataset in learn.precompute(…) and resides in memory afterwards. I didn’t find a way to avoid this and use the original data, but it would not be hard to implement.
A few other thoughts:
- Reading data from Redis (with decompression) takes 4.4 times longer than from memory (this could be optimized to 3.8 times by fetching minibatches instead of single images; see the MGET sketch after this list). However, Redis gives us almost infinite, easily scalable memory storage.
- If you wonder what Alternative 2 is: using pandas dataframes directly, which gives a couple of interesting insights as well, but it would make this post too long in my opinion.
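For the minibatch idea, here is a sketch of fetching a whole batch in one round trip with MGET (my assumption of how it could be done, not code from the experiments):

def get_batch(redis_conn, key_prefix, indices):
    keys = [key_prefix + ':' + str(i) for i in indices]
    values = redis_conn.mget(keys)  # one network round trip instead of len(indices)
    return [decompress_array(v) for v in values]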
I would appreciate any thoughts on this, and in particular:
- Could it be useful in general?
- Which use cases/benefits do you see in the Redis approach for you?
- Do you think it makes sense to add this functionality to the fast.ai mainline?
Thanks for reading!