Help: Reading an LMDB dataset in fastai

Founty · March 18, 2020, 7:27am

I’d like to read a dataset stored in LMDB format. Can anybody tell me how to use an LMDB dataset in fastai, the way Caffe does? Should I write a custom dataset, or does fastai already provide an interface for this? Thank you!

marii · March 18, 2020, 7:35am

There are lessons specifically on this dataset. Have you been through one of the previous lesson 1s?

Founty · March 18, 2020, 7:48am

Sorry, I have read the lessons on GitHub and the documentation at https://docs.fast.ai/, but I cannot find anything about LMDB datasets. I also searched Google, but it only turns up IMDb rather than LMDB.

marii · March 18, 2020, 7:50am

Oh, sorry, I just assumed you had made a typo. Do you have a link to the LMDB dataset?

Founty · March 18, 2020, 7:52am

It’s the LMDB database (https://lmdb.readthedocs.io/en/release/), which is widely used in Caffe.

marii · March 18, 2020, 7:58am

Oh! Database, not dataset. See "Using a custom Dataset in fastai": https://docs.fast.ai/basic_data.html#Using-a-custom-Dataset-in-fastai

As far as I understand, there is no easy built-in support for LMDB.
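Following the custom-Dataset route linked above, here is a minimal sketch of a map-style dataset backed by the `lmdb` Python package. It rests on assumptions not stated in this thread: keys are zero-padded indices such as `b"00000000"`, and each value is a pickled `(x, y)` pair. Caffe’s own LMDBs store serialized `Datum` protobufs instead, so for those you would pass a different `decode` function. `LMDBDataset`, `_open_lmdb_readonly` and the `opener` parameter are hypothetical names for illustration, not part of fastai or lmdb.

```python
import pickle


def _open_lmdb_readonly(path):
    # Deferred import so this module can be defined without the third-party
    # `lmdb` package installed (pip install lmdb) until a database is opened.
    import lmdb
    # readonly: training never writes. lock=False: only safe if nothing writes
    # to the database concurrently. readahead=False: random access by index
    # does not benefit from OS read-ahead.
    return lmdb.open(path, readonly=True, lock=False, readahead=False)


class LMDBDataset:
    """Map-style dataset over an LMDB database (hypothetical key/value layout).

    Assumes keys are zero-padded ASCII indices (b"00000000", b"00000001", ...)
    and each value is a pickled (x, y) pair; swap `decode` for other formats.
    """

    def __init__(self, path, decode=pickle.loads, opener=_open_lmdb_readonly):
        self.path = path
        self.decode = decode
        self.opener = opener
        self.env = None  # opened lazily, so each worker process gets its own handle
        # Read the entry count once, then close this handle so it is not
        # inherited by forked DataLoader workers.
        env = opener(path)
        self.length = env.stat()["entries"]
        env.close()

    def _env(self):
        if self.env is None:
            self.env = self.opener(self.path)
        return self.env

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        # Negative indices are not supported; out-of-range raises IndexError.
        if not 0 <= idx < self.length:
            raise IndexError(idx)
        key = f"{idx:08d}".encode("ascii")
        with self._env().begin(write=False) as txn:
            value = txn.get(key)  # bytes copy, still valid after the txn ends
        if value is None:
            raise KeyError(key)
        return self.decode(value)
```

Usage would look like `ds = LMDBDataset("/path/to/train_lmdb")` (placeholder path); `len(ds)` and `ds[i]` then behave like any PyTorch map-style dataset, so the object can be handed to a `torch.utils.data.DataLoader`. Opening the environment lazily is a deliberate choice: LMDB handles should not be shared across `fork`, so each worker opens its own on first access.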

Founty · March 18, 2020, 7:59am

thanks a loooot.

Margolis · March 18, 2020, 10:43am

It looks like you’ll need fastai’s data_block API https://docs.fast.ai/data_block.html along with LMDB’s Python wrapper.
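One way to connect LMDB to a data_block-style pipeline is to use the database keys themselves as the items and fetch each value only when an item is loaded. The sketch below, assuming the `lmdb` Python package, lists every key and makes a reproducible train/validation split. `list_lmdb_keys`, `split_keys` and the `opener` parameter are hypothetical helpers, not fastai functions; how the keys are then wrapped into fastai items depends on the fastai version you use.

```python
import random


def _open_lmdb_readonly(path):
    import lmdb  # third-party package: pip install lmdb
    return lmdb.open(path, readonly=True, lock=False)


def list_lmdb_keys(path, opener=_open_lmdb_readonly):
    """Return every key in the database, in LMDB's sorted key order."""
    env = opener(path)
    try:
        with env.begin(write=False) as txn:
            # values=False yields keys only, avoiding copies of large values.
            return list(txn.cursor().iternext(keys=True, values=False))
    finally:
        env.close()


def split_keys(keys, valid_pct=0.2, seed=42):
    """Shuffle keys reproducibly and split them into (train, valid) lists."""
    keys = list(keys)
    random.Random(seed).shuffle(keys)
    n_valid = int(len(keys) * valid_pct)
    return keys[n_valid:], keys[:n_valid]
```

Keeping only the keys in memory means the item list stays small even when the values (for example encoded images) are large.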

I’d say the worst-case scenario is that you run each training step by pulling a batch of data from LMDB, converting it into a DataFrame, and feeding that into fastai to train the model. Run enough steps and you’ve covered an epoch; run enough epochs and, in theory, you’ll have a trained model.
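The batch-by-batch approach described above can be sketched as a generator that walks the database once in key order and yields fixed-size batches; one full pass is one epoch. This assumes the `lmdb` Python package and pickled values; `iter_lmdb_batches` and its `opener` parameter are hypothetical names. It yields plain Python lists, so building a DataFrame from each batch would be a further step.

```python
import pickle


def _open_lmdb_readonly(path):
    import lmdb  # third-party package: pip install lmdb
    return lmdb.open(path, readonly=True, lock=False)


def iter_lmdb_batches(path, batch_size, decode=pickle.loads, opener=_open_lmdb_readonly):
    """Yield lists of decoded records, reading the database sequentially once."""
    env = opener(path)
    try:
        with env.begin(write=False) as txn:
            batch = []
            # A fresh cursor starts at the first key and iterates (key, value) pairs.
            for _key, value in txn.cursor():
                batch.append(decode(value))
                if len(batch) == batch_size:
                    yield batch
                    batch = []
            if batch:  # final, possibly smaller batch
                yield batch
    finally:
        env.close()
```

A training loop would call `iter_lmdb_batches("/path/to/train_lmdb", batch_size=64)` (placeholder path) once per epoch and run one step per yielded batch. The trade-off is that records arrive in key order with no shuffling, which is cheap for disk access but may hurt training unless the data was shuffled when the database was written.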

Margolis · March 29, 2020, 8:46pm

If there’s an academic faculty member interested in this problem, I recently saw a Facebook research award that would be well suited to it. fastai is built on top of Facebook’s PyTorch, and given the amount of data Facebook has, I suspect they would be very interested in machine learning with high-performance databases. https://research.fb.com/programs/research-awards/proposals/2020-networking-request-for-proposals/