I’ve come across a lot of datasets stored as .mat files, but I haven’t found a way to import them with the library in the docs. I managed to load one using:
import numpy as np
import h5py

# h5py only opens MATLAB v7.3 .mat files, which are HDF5 containers
with h5py.File('nyu_depth_v2_labeled.mat', 'r') as f:
    print(list(f.keys()))
    imageData = np.array(f['images'])
    print(imageData.shape)
    depthData = np.array(f['depths'])
    print(depthData.shape)
As far as I know, .mat is a format built for MATLAB. If I remember correctly, you can put basically anything you want into it (functions, arrays, variables, all with arbitrary names), so there is no single standard way to store, for example, images in it.
So I guess there is no best way to deal with these files directly. The only thing you can do is read them, figure out what kind of data is in there, and save it in a format that is more convenient to use with fastai.
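For the "figure out what kind of data is in there" step, here is a minimal sketch using scipy.io.whosmat, which lists the variables in a file without loading the arrays. The helper name summarize_mat is made up for illustration, and this only applies to .mat files saved in a format older than v7.3; v7.3 files are HDF5 and need h5py instead.

```python
import scipy.io as sio

def summarize_mat(path):
    """Map each variable name in a pre-v7.3 .mat file to (shape, MATLAB class).

    Hypothetical helper, not part of fastai or scipy.
    """
    # whosmat yields (name, shape, class) tuples without reading the array data
    return {name: (shape, mat_class) for name, shape, mat_class in sio.whosmat(path)}
```

Calling summarize_mat('some_file.mat') and printing the result should be enough to tell which variables hold the images and which hold the labels.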
I’m using data stored as .mat for image segmentation like this:
import numpy as np
import scipy.io as sio
import torch
from fastai.vision import Image, ImageList, SegmentationLabelList  # fastai v1

def open_mat(fn, *args, **kwargs):
    # Stack the three band variables into one (3, H, W) float tensor
    data = sio.loadmat(fn)
    data = np.array([data[r] for r in ['band1', 'band2', 'band3']])
    data = torch.from_numpy(data).float()
    return Image(data)

def open_mask(fn, *args, **kwargs):
    # Load the mask and add a leading channel dimension: (H, W) -> (1, H, W)
    data = sio.loadmat(fn)['mask']
    data = torch.from_numpy(data).float()
    return Image(data.view(-1, data.size()[0], data.size()[1]))

class SegLabelListCustom(SegmentationLabelList):
    def open(self, fn): return open_mask(fn, div=True)

class SegItemListCustom(ImageList):
    _label_cls = SegLabelListCustom
    def open(self, fn): return open_mat(fn)
I hope this helps.
Edit: In my dataset each .mat file holds one sample. If you have multiple images in the same file, the easiest way is probably to save them separately in another format, as suggested above.
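A rough sketch of that "save them separately" idea, assuming the images have already been loaded into one NumPy array (as in the h5py snippet earlier in the thread). The function name split_samples, the .npy output format, and the file naming are my own choices for illustration; which axis indexes the samples is something you have to check from the printed shape.

```python
from pathlib import Path
import numpy as np

def split_samples(images, out_dir, sample_axis=0, prefix="sample"):
    """Write each sample of a stacked array to its own .npy file.

    Hypothetical helper; returns the list of written paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    # Move the sample axis to the front so iteration yields one sample at a time
    for i, sample in enumerate(np.moveaxis(images, sample_axis, 0)):
        path = out_dir / f"{prefix}_{i:05d}.npy"
        np.save(path, sample)
        paths.append(path)
    return paths
```

Each file can then be read back with np.load inside a custom open method, in the same way open_mat above reads one .mat file per sample.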