Yes and no. I managed to create the ImageDataBunch after Jeremy's fix, but when I try to run data.show_batch(rows=3, figsize=(7,6)) it errors with 'KmnistDataset' object has no attribute 'x' on the line if self.train_ds.x._square_show: rows = rows ** 2. (KmnistDataset is the custom class that subclasses the Dataset class.)
Except you can't do anything with this databunch object: 'TensorDataset' object has no attribute 'c'.
And your code is missing:
from sklearn.model_selection import train_test_split
As @sgugger said, you have to implement your own Dataset class. It appears from this thread that a new from_array() method would be a useful addition to the fastai library:
data = (ItemList.from_array(train_ds = train_array, valid_ds=valid_array, test_ds=test_array), ...)
Here is a way to do it. This will create an in-memory image list.

import numpy as np
from fastai.vision import *

p = untar_data(URLs.MNIST_SAMPLE)
train = p/'train'
imagesl = ImageList.from_folder(train)

# Collect each image's underlying tensor as a numpy array
images = []
for i in imagesl:
    images.append(i.data.numpy())
images = np.array(images)
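As a quick stand-alone check of what that last line does: np.array on a list of equally-shaped arrays stacks them into one (N, C, H, W) array, the same as np.stack. A toy sketch with fake shapes (no fastai needed):

```python
import numpy as np

# Three fake 1x28x28 image arrays, like i.data.numpy() would give
imgs = [np.zeros((1, 28, 28), dtype=np.float32) for _ in range(3)]

stacked = np.array(imgs)  # equivalent to np.stack(imgs)
print(stacked.shape)  # (3, 1, 28, 28)
```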
class MyImageList(ImageList):
    def open(self, i):
        # Items are already Image objects, so there is nothing to open from disk
        return i

    @staticmethod
    def from_numpy(arr):
        # Wrap each numpy array in a fastai Image (which expects a torch tensor)
        items = [Image(torch.from_numpy(i)) for i in arr]
        return MyImageList(items)

MyImageList.from_numpy(images)
Anyone have any success with creating a databunch or dataset from numpy arrays? I've read through a lot of the forum posts, and either the code no longer works or it doesn't apply. Any code samples of how to create a databunch from numpy arrays?
If you want your arrays to be used as input for a CNN, they will need a channel dimension. I assume these are one-channel images, so just reshape with x.reshape(50,1,224,224).
However, you'll still have a problem. The fastai vision models expect 3 input channels. One potential answer is to just copy the one channel three times: x.reshape(50,1,224,224).repeat(3, axis=1). How well this works with transfer learning depends on the dataset.
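Here is that channel trick as a runnable sketch, with a random array standing in for your real data (the shapes match the 50-image example above):

```python
import numpy as np

# 50 one-channel 224x224 images (random stand-in data)
x = np.random.rand(50, 224, 224).astype(np.float32)

# Add the channel dimension a CNN expects: (batch, channels, height, width)
x = x.reshape(50, 1, 224, 224)
print(x.shape)  # (50, 1, 224, 224)

# Duplicate the single channel three times for models expecting RGB input
x3 = x.repeat(3, axis=1)
print(x3.shape)  # (50, 3, 224, 224)

# All three channels hold identical data
print(np.allclose(x3[:, 0], x3[:, 2]))  # True
```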
I’d also amend my previous custom class to subclass ImageList instead, and implement a custom get method to turn the arrays into fastai Images. This is so methods like show_batch work.
class ArrayImageList(ImageList):
    @classmethod
    def from_numpy(cls, numpy_array):
        return cls(items=numpy_array)

    def label_from_array(self, array, label_cls=None, **kwargs):
        # Reuse the list-labelling machinery with an array of labels
        return self._label_from_list(array, label_cls=label_cls, **kwargs)

    def get(self, i):
        # Turn the stored numpy array into a fastai Image so show_batch etc. work
        n = self.items[i]
        n = torch.tensor(n)
        return Image(n)
I have stumbled upon a problem, though (most likely due to me not knowing much). I am trying to hold out 20% of the training data for validation purposes and am struggling to have it correctly labeled.
data = (ArrayImageList.from_numpy(training_images)
        .split_subsets(train_size=0.8, valid_size=0.2)
        .label_from_array(training_labels)
        .databunch(bs=10))
Could you (or anyone else) give me a hint how can I correctly split (re-assign?) labels for training and validation data?