Creating ImageDataBunch from numpy arrays

I have just completed the first 3 computer vision lectures of the latest deep learning part 1 v3 course and am trying to use the fastai library for a competition. I did some extensive feature extraction on the dataset and have x_train, y_train, x_val, y_val as numpy arrays instead of a csv/folder/directory. As such, I could not use the factory methods that create an ImageDataBunch to train my model. How should I proceed from here?

The dimensions of my data are:
x_train -> 973 X 480 X 640 X 1
y_train -> 973 X 1
x_val -> 251 X 480 X 640 X 1
y_val -> 251 X 1


Hi Benjamin,

I don’t know the best way to proceed, but I can tell you how I would go ahead.

  • Study a tutorial on PyTorch DataSet and DataLoader.

  • Then implement your own two Datasets that extract single-sample (x, y) tensor pairs from your training and validation arrays. Ask me here if you need help.

  • Make PyTorch DataLoaders from them. An advantage here is that you can load a minibatch, inspect its contents, and make any corrections in code that you wrote yourself.

  • Move into fastai with

    data = DataBunch(training_generator, validation_generator)
    learn = Learner(data, model, loss_func=lossFlat)
    (Note that your model should expect one-channel images.)
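The Dataset/DataLoader steps above can be sketched in plain PyTorch. This is a minimal illustration under stated assumptions, not fastai's API: the `ArrayDataset` class name, the batch size, and the small random stand-in arrays are all mine, substituting for your real x_train/y_train (which have the shapes you listed).

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class ArrayDataset(Dataset):
    """Serves single (x, y) tensor pairs from a pair of numpy arrays."""
    def __init__(self, x, y):
        # x arrives as N x H x W x C; PyTorch conv layers want N x C x H x W
        self.x = torch.from_numpy(x).permute(0, 3, 1, 2).float()
        # y arrives as N x 1; flatten to N and cast to int64 for cross-entropy
        self.y = torch.from_numpy(y).squeeze(1).long()

    def __len__(self):
        return len(self.x)

    def __getitem__(self, i):
        return self.x[i], self.y[i]

# Dummy stand-ins shaped like the arrays described above (fewer samples)
x_train = np.random.rand(16, 480, 640, 1).astype(np.float32)
y_train = np.random.randint(0, 2, size=(16, 1))
x_val   = np.random.rand(4, 480, 640, 1).astype(np.float32)
y_val   = np.random.randint(0, 2, size=(4, 1))

training_generator = DataLoader(ArrayDataset(x_train, y_train),
                                batch_size=4, shuffle=True)
validation_generator = DataLoader(ArrayDataset(x_val, y_val), batch_size=4)

# Load one minibatch and check it before handing anything to fastai
xb, yb = next(iter(training_generator))
print(xb.shape, yb.shape)
```

From here the two loaders are what the DataBunch line above consumes; printing one minibatch's shapes first is a cheap way to catch channel-order or dtype mistakes early.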

That said, there certainly exists a simpler way to do all of the above inside the fastai library. Perhaps sgugger will comment. But I find (speaking strictly for myself) that when your case is off the default fastai path, it is sometimes easier to roll your own implementation than to figure out the right options, classes, and overrides in the fastai library. Besides, you will start to understand what fastai is doing behind the scenes, which is both important to know and the major topic of Part 2.