DataBunch from numpy arrays

You’re right – the custom class needs some work to support splitting. We’ll borrow from the tabular code and stash the array in inner_df.

import torch
from fastai.vision import ImageList, Image

class ArrayImageList(ImageList):
    @classmethod
    def from_numpy(cls, numpy_array):
        # Keep integer indices as items and stash the raw array in inner_df,
        # so the standard split machinery operates on the indices.
        return cls(items=range(len(numpy_array)), inner_df=numpy_array)

    def label_from_array(self, array, label_cls=None, **kwargs):
        # Index the label array with whichever items ended up in this subset.
        return self._label_from_list(array[self.items.astype(int)], label_cls=label_cls, **kwargs)

    def get(self, i):
        n = self.inner_df[i]
        n = torch.tensor(n).float()
        return Image(n)

You should then be able to call the usual split functions.
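
For example, something like this should run end to end (a quick sketch; x and y are hypothetical stand-ins for your image array and labels):

import numpy as np

x = np.random.rand(100, 3, 32, 32).astype('float32')  # hypothetical image array
y = np.random.randint(0, 2, 100)                       # hypothetical labels

data = (ArrayImageList.from_numpy(x)
        .split_by_rand_pct(0.2)
        .label_from_array(y)
        .databunch(bs=16))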

Thanks! You are a legend!

Very helpful thread. Another $0.02:

from typing import Any
from fastai.vision import ImageDataBunch

class ArrayImageDataBunch(ImageDataBunch):
    @classmethod
    def from_numpy(cls, numpy_array, labels, train_size=0.8, valid_size=0.2, **kwargs:Any)->'ImageDataBunch':
        "Create a DataBunch from a numpy array and matching labels."
        src = (ArrayImageList.from_numpy(numpy_array)
               .split_subsets(train_size=train_size, valid_size=valid_size)
               .label_from_array(labels))
        return cls.create_from_ll(src, **kwargs)
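
Usage is then a one-liner (with x and y as in the sketch above; bs and any other kwargs are passed through to create_from_ll):

data = ArrayImageDataBunch.from_numpy(x, y, bs=16)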

Hi noachr,

Thank you so much for posting this code. It has helped me a lot. I’m trying to adapt it for an image segmentation project. Do you have any advice on changing the label_from_array code to take a label array of the same size as the input array?

I’d appreciate any help that anyone can give, hopefully resurrecting an old thread isn’t a problem!

Yet another $0.02 solution.
I did something like this, and it appears to work.
The numpy arrays are 2D, with no channel information.

import numpy as np
import torch
from fastai.vision import ImageList, ItemList, Image

class ImageListNumpy(ImageList):
    def get(self, i):
        fn = ItemList.get(self, i)          # path to the .npy file from the dataframe
        d = np.load(fn).astype('float32')   # 2D array, no channel dimension
        d = d[None, :]                      # add a channel axis: (1, h, w)
        dd = np.repeat(d, 3, axis=0)        # replicate to 3 channels: (3, h, w)
        return Image(torch.from_numpy(dd))  # wrap as a fastai Image

# df is the dataframe whose 'numpydesign' column holds the .npy file paths
d1 = ImageListNumpy.from_df(df, path='.', cols='numpydesign')
d2 = d1.split_by_rand_pct()
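
From there the usual data-block steps apply (a sketch; 'target' is a hypothetical label column in df):

data = (d2.label_from_df(cols='target')
          .databunch(bs=16))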

Hi! I’m very interested in knowing whether fastai already has a builtin function that does this or something similar:

I’m loading 16-bit PNG chest x-ray images. When calling dls.show_batch(), all the images look white, which I think is because the values from the 16-bit range [0, 65535] are not being normalized correctly. I don’t care much about the display, as long as the model receives the images the way I want: 16-bit values in [0, 65535], normalized to the range [0, 1]. I’ve read that PyTorch normalizes all images (both 8-bit and 16-bit) to the range [0, 1].
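
In other words, what I’m after is roughly this (a minimal sketch, not an existing fastai builtin; load_16bit_png is just a hypothetical helper name):

import numpy as np
import torch
from PIL import Image as PILImage

def load_16bit_png(path):
    # Read the raw 16-bit values and scale them into [0, 1].
    arr = np.array(PILImage.open(path), dtype=np.float32)  # values in [0, 65535]
    return torch.from_numpy(arr / 65535.0)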
