Data Block API to read NumPy arrays

Hi all!

The data block API taught in the courses reads images from folders or from a given path. Is there any way to use the data block API to wrap a NumPy array of shape (m, dim1, dim2), where m is the number of training samples, as a torch tensor?


I'm not sure about the data block API or what you want to achieve with it, but to convert a NumPy array to a tensor, just use torch.from_numpy(my_arr). The resulting tensor shares memory with the NumPy array, so no data is copied.
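The memory sharing mentioned above can be checked directly. This is a minimal sketch; the array shape (4 samples of 2x3) and the variable names are made up for illustration.

```python
import numpy as np
import torch

# m = 4 training samples, each of shape (dim1, dim2) = (2, 3); sizes are arbitrary
arr = np.zeros((4, 2, 3), dtype=np.float32)

# from_numpy wraps the existing buffer instead of copying it
t = torch.from_numpy(arr)

# A write through the NumPy array is visible through the tensor
arr[0, 0, 0] = 7.0
shares_memory = t[0, 0, 0].item() == 7.0

# If an independent tensor is needed, copy explicitly first
t_copy = torch.from_numpy(arr.copy())
arr[0, 0, 0] = -1.0
copy_unchanged = t_copy[0, 0, 0].item() == 7.0
```

Because the buffer is shared, any in-place change to the tensor also changes the original array, so copying is the safer choice when the array is reused elsewhere.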

I’m not sure if this is what you are looking for, but I managed to load image data that was previously saved as tensors (it could equally be NumPy arrays or text files):

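# fastai v1; ImageItemList was renamed to ImageList in later 1.0.x releases
from fastai.vision import *

class CustomImageItemList(ImageItemList):
    def open(self, fn):
        # Each file holds one tensor written with torch.save
        return Image(torch.load(fn))

lst = CustomImageItemList.from_folder(Path('clean_pt'), '.pt')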
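For completeness, here is one way the .pt files read by such a loader could be produced from a NumPy array. This is only a sketch: the temporary directory, the file naming scheme and the array sizes are assumptions, not part of the post above.

```python
import tempfile
from pathlib import Path

import numpy as np
import torch

# Hypothetical data: 5 images, 3 channels, 32x32, stored as float32
arrays = np.random.rand(5, 3, 32, 32).astype(np.float32)

# Write into a throwaway directory; the name 'clean_pt' mirrors the snippet above
out_dir = Path(tempfile.mkdtemp()) / "clean_pt"
out_dir.mkdir()

for i, arr in enumerate(arrays):
    # clone() gives each saved tensor its own storage, independent of `arrays`
    tensor = torch.from_numpy(arr).clone()
    torch.save(tensor, str(out_dir / f"{i:04d}.pt"))

# Reading one file back returns the same tensor
loaded = torch.load(str(out_dir / "0000.pt"))
saved_files = sorted(p.name for p in out_dir.glob("*.pt"))
```

Each file then holds a single channels-first float tensor, which is the layout the custom open method passes to Image.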

This may help:

Subclassing lets the fast.ai library work with torch tensors directly.

Thanks, all. I am currently following this link, which does this explicitly.


Adding another example, in case it is helpful: subclassing the segmentation lists to read from NumPy arrays. The only unusual part I had to add was hardcoding the mask reading to assume np.uint8 as the data type (otherwise it was being converted to np.object, and I’m not sure why):

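# random arrays with 10 observations, 3 channel input data, 5 classes, 224x224 images
import numpy as np
from fastai.vision import *  # fastai v1
x = np.random.rand(10, 3, 224, 224).astype(np.float32)  # float32 is PyTorch's default float dtype
y = np.random.randint(1, 5, (10, 1, 224, 224))  # labels 1..4 (upper bound is exclusive)
class_num = np.max(y) + 1  # 5 classes: 0..4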

class ArraySegmentationLabelList(SegmentationLabelList):
    def get(self, i):
        # Force uint8 first; without it the mask came through as np.object
        return ImageSegment(torch.from_numpy(self.items[i].astype(np.uint8)).long())

class ArraySegmentationItemList(SegmentationItemList):
    _label_cls = ArraySegmentationLabelList
    def get(self, i):
        # Each item is an (index, array) pair; [1] selects the image array
        return Image(torch.from_numpy(self.items[i][1]))

    @classmethod
    def from_array(cls, array):
        # Store each sample together with its original row index
        return cls(list(enumerate(array)))

    def label_from_array(self, array, label_cls=None, **kwargs):
        # Use the stored row indices to pick the matching label rows
        indices, _ = zip(*self.items)
        labels = array[indices,:,:,:]
        return self._label_from_list(labels, label_cls=label_cls, **kwargs)

Example use:

data = (ArraySegmentationItemList.from_array(x)
    .split_by_rand_pct(0.2)
    .label_from_array(y, classes=list(range(class_num)))
    .databunch()
    .normalize()
)
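The (index, array) pairing used by from_array and label_from_array above can be illustrated with NumPy alone. This is a sketch of the idea, not fastai code: the shuffled subset stands in for whatever the random split does, and the 8x8 image size is chosen only to keep the example small.

```python
import numpy as np

# Small stand-ins for x and y: 10 samples, 8x8 instead of 224x224
x = np.random.rand(10, 3, 8, 8).astype(np.float32)
y = np.random.randint(1, 5, (10, 1, 8, 8))

# As in from_array: every item remembers its original row index
items = list(enumerate(x))

# Stand-in for a random split: keep 8 of the 10 items in shuffled order
rng = np.random.default_rng(0)
order = rng.permutation(len(items))
train_items = [items[i] for i in order[:8]]

# As in label_from_array: recover the indices, then select the same rows of y
indices, _ = zip(*train_items)
train_labels = y[list(indices), :, :, :]

# Every label row still lines up with the image it belongs to
aligned = all(
    np.array_equal(train_labels[k], y[idx]) for k, idx in enumerate(indices)
)
```

Carrying the index with each item appears to be what keeps images and masks aligned when labelling happens after the split, since by then each subset only knows its own items.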