Loading a ".npy" extension dataset

(Dipam) #1

Hello everyone, I am just finishing lesson 5 of the course. I have downloaded a medical dataset for image classification. In this dataset, all the files have the .npy extension. How can I load these images using fastai? I googled a bit, and it appears to be possible in PyTorch as follows. However, I am not familiar with PyTorch. Can someone give me a small example of how to use the above link to load my dataset?
Thanks in advance.


(Nick Switanek) #2

.npy files are typically NumPy arrays, so they’re not necessarily images, and need to be converted before you can load them as images.
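
Before deciding how to convert, it can help to look at what one of the files actually contains. The snippet below is only a sketch: `describe_npy` is a hypothetical helper name, and it assumes the file holds a plain numeric array (not a pickled object).

```python
import numpy as np

def describe_npy(path):
    """Load one .npy file and report the facts that matter for image conversion."""
    arr = np.load(path)  # assumes a plain numeric array, not a pickled object
    return {
        'shape': arr.shape,        # e.g. (H, W) or (H, W, C)
        'dtype': str(arr.dtype),   # uint8 is what PIL expects for ordinary images
        'min': float(arr.min()),
        'max': float(arr.max()),
    }
```

If the dtype is not uint8 or the values fall outside 0 to 255, the array needs rescaling before it can be saved as an ordinary image.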

One approach is to do a batch preprocessing step to convert all the .npy files to images and save them as images. Then you can point fastai vision functionality at the image files and you’ll be off and running.

You would walk through the directories containing the .npy files, and for each, do something like the following:

import numpy as np
from PIL import Image

# for npyfilename in filenames:
arr = np.load(npyfilename)

# assuming arr.shape is (H,W,C) for Height and Width in pixels, and C channels (such as 3 for RGB)
# also assuming that values are uint8 in range(0, 256) - if not, see PIL's image modes
# if these assumptions don't hold, you need to first reshape and normalize
im = Image.fromarray(arr)
pngfilename = npyfilename.replace('.npy', '.png')
im.save(pngfilename)  # write the converted image to disk
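
The directory walk and the "reshape and normalize" step mentioned above could look roughly like this. It is a minimal sketch under stated assumptions, not a tested recipe for any particular dataset: `to_uint8` and `convert_npy_dir` are hypothetical helper names, each array is assumed to be (H, W), (H, W, 1) or (H, W, 3), and per-image min-max scaling is just one possible normalization (it discards absolute intensities, which may matter for medical data).

```python
from pathlib import Path

import numpy as np
from PIL import Image

def to_uint8(arr):
    """Min-max scale an array to uint8 in 0..255 (one possible normalization)."""
    arr = arr.astype(np.float64)
    lo, hi = arr.min(), arr.max()
    if hi == lo:  # constant image: avoid division by zero
        return np.zeros(arr.shape, dtype=np.uint8)
    return np.round((arr - lo) / (hi - lo) * 255).astype(np.uint8)

def convert_npy_dir(src_dir, dst_dir):
    """Convert every .npy under src_dir to a .png under dst_dir, keeping subfolders."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    written = []
    for npy_path in sorted(src_dir.rglob('*.npy')):
        arr = np.load(npy_path)
        if arr.ndim == 3 and arr.shape[2] == 1:
            arr = arr[:, :, 0]  # PIL wants (H, W) for single-channel images
        out_path = (dst_dir / npy_path.relative_to(src_dir)).with_suffix('.png')
        out_path.parent.mkdir(parents=True, exist_ok=True)
        Image.fromarray(to_uint8(arr)).save(out_path)
        written.append(out_path)
    return written
```

Keeping the subfolder layout means that if the classes are encoded as folder names, the converted images can still be labelled from their folders.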

Alternatively, you can implement a custom ItemList class and related methods.


(Fred Monroe) #3

Not claiming this is the greatest code, but it worked for me:

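import numpy as np
from fastai.vision import *  # fastai v1: ImageList, Image, tensor, ifnone, PathOrStr, Collection

class NpyRawImageList(ImageList):
    # note: Image here is fastai's Image class, not PIL.Image
    def open(self, fn):
        img_data = np.load(fn)
        # [None] adds a leading channel axis: (H, W) -> (1, H, W)
        return Image(tensor(img_data[None]))

    def analyze_pred(self, pred):
        return pred[0:1]

    def reconstruct(self, t):
        return Image(t.float().clamp(min=0, max=1))

    @classmethod  # from_folder takes cls, so it has to be a classmethod
    def from_folder(cls, path:PathOrStr='.', extensions:Collection[str]=None, **kwargs)->ItemList:
        # default to picking up .npy files instead of the usual image extensions
        extensions = ifnone(extensions, ['.npy'])
        return super().from_folder(path=path, extensions=extensions, **kwargs)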
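
To make the two array operations in that class a bit more concrete, here is the same idea in plain NumPy, without fastai. It is only an illustration: `add_channel_axis` and `clamp_unit` are hypothetical names, and it assumes each file holds a 2D single-channel array, which is what the `[None]` indexing suggests.

```python
import numpy as np

def add_channel_axis(arr):
    """(H, W) -> (1, H, W), the same effect as img_data[None] in open()."""
    return arr[None]

def clamp_unit(arr):
    """Clip values to [0, 1], mirroring the clamp(min=0, max=1) in reconstruct()."""
    return np.clip(arr.astype(np.float32), 0.0, 1.0)

raw = np.linspace(-0.5, 1.5, 12, dtype=np.float32).reshape(3, 4)
chw = add_channel_axis(raw)   # channel-first layout
shown = clamp_unit(chw)       # values outside [0, 1] are clipped for display
```

With fastai v1's data block API, the class would presumably be used like any other ImageList, for example `NpyRawImageList.from_folder(path).split_by_rand_pct(0.2).label_from_folder().databunch()`, assuming the class labels are encoded as folder names.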

(Dipam) #4

Thank you for your replies guys @nswitanek and @313V. Unfortunately, the zip file seems corrupt and I can’t get it to work on either Kaggle or Colab. I’m trying. I’ll post updates as soon as possible.