Using (non-image) custom data with image classifiers?

Are there any complete examples of using user-defined input (i.e. not a bitmap file) with image classifiers?

I have read the existing posts about loading custom data, but I cannot go from that to actually feeding the image classifier with data of correct type, size & dimension.


Example: I want to go from this data

# size,color,type
10x10,red,small-red
30x30,blue,big-blue
15x22,green,medium-green
...

to a 32x32x3 Python array, which I then somehow (and this is the part I have trouble with) translate to a numpy/PyTorch primitive of the correct size and dimension that a standard image classifier would accept.
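For reference, here is a minimal sketch of that conversion step. It assumes the Python array is a nested list in (height, width, channel) order with 0-255 integer values, and that the classifier wants channel-first float input scaled to [0, 1], which is the usual PyTorch convention. The helper name `to_chw_float` is made up for illustration.

```python
import numpy as np

def to_chw_float(pixels):
    """Convert a nested HxWx3 list of 0-255 ints into a float32 array
    of shape (3, H, W) scaled to [0, 1]."""
    arr = np.asarray(pixels, dtype=np.uint8)      # shape (H, W, C)
    if arr.ndim != 3 or arr.shape[2] != 3:
        raise ValueError(f"expected (H, W, 3), got {arr.shape}")
    chw = arr.transpose(2, 0, 1)                  # shape (C, H, W)
    # Copy into contiguous float32 memory, then rescale to [0, 1]
    return np.ascontiguousarray(chw, dtype=np.float32) / 255.0

# Example: a solid red 32x32 image built as a plain Python list
red = [[[255, 0, 0] for _ in range(32)] for _ in range(32)]
x = to_chw_float(red)
print(x.shape, x.dtype)
```

If PyTorch is installed, `torch.from_numpy(x)` should then give a tensor with the same shape and dtype without copying the data.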

So far I have only been able to get this to work by generating intermediate bitmap files and then using a standard image loader, but for my experiment this is very inefficient.

I’m not sure that I understand your problem. If your data translates to images, I would generate all the bitmaps once and store them in a folder. From there, I would train it like an image classification problem.

However, from the data sample you posted, it seems that you have a CSV file. So I would treat your problem as tabular, with 3 columns (size, color, type). See the tabular docs. You could also split size into two columns (height and width).

Finally, I don’t think that an image classifier will work on non-image data, as it uses convolutions to extract image features.

This was just an example, the actual data is more complex and results in a real bitmap.

Let me rephrase the problem:

  1. ImageDataBunch needs a filename and in this case there is no file since the image is generated on the fly from the data.
  2. It would be trivial to generate all bitmaps and save them to a file but there is a lot of data and the images are very small (around 32x32) so having them as a bitmap on disk or in memory would be very inefficient.
  3. The generated 32x32 matrix is evaluated with different algorithms. To be able to compare it to someone else's work, one of them needs to be an image classifier using a deep CNN.

There are also multiple parameters involved in generating the images, so generating them at runtime has the advantage that I can play with the parameters.
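To make the "generate at runtime" idea concrete, here is a rough sketch of a dataset object that stores only the raw rows and renders each image when it is indexed. Everything in it is an assumption for illustration: the `render` function, the `COLORS` table, the reading of `15x22` as width x height, and the class name `OnTheFlyImages` are all hypothetical, and the real generator is more complex than a filled rectangle.

```python
import numpy as np

# Hypothetical colour table, based only on the sample rows in the question
COLORS = {"red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255)}

def render(size, color, canvas=32):
    """Hypothetical generator: draw a filled rectangle of the given size
    (e.g. '15x22', read here as width x height) on a black square canvas."""
    w, h = (int(v) for v in size.split("x"))
    img = np.zeros((canvas, canvas, 3), dtype=np.uint8)   # (H, W, C)
    img[:h, :w] = COLORS[color]
    return img

class OnTheFlyImages:
    """Keeps only the raw rows in memory and renders an image on demand."""
    def __init__(self, rows, classes):
        self.rows = rows
        self.class_to_idx = {c: i for i, c in enumerate(classes)}

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, i):
        size, color, label = self.rows[i]
        # Channel-first float32 in [0, 1], built fresh on every access
        x = render(size, color).transpose(2, 0, 1).astype(np.float32) / 255.0
        return x, self.class_to_idx[label]

rows = [("10x10", "red", "small-red"),
        ("30x30", "blue", "big-blue"),
        ("15x22", "green", "medium-green")]
ds = OnTheFlyImages(rows, classes=sorted({r[2] for r in rows}))
x, y = ds[2]
print(len(ds), x.shape, y)
```

An object with `__len__` and `__getitem__` like this follows the map-style dataset protocol that PyTorch's `torch.utils.data.DataLoader` works with, so no file ever has to be written. Changing a generation parameter only means changing `render`.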

In this case, I would extend the ImageList class with a custom get method. In there, from the given i, I would generate the bitmap.

Thanks Victor,

ImageList assumes the data contains a filename, and a custom get() method will not change that:

items = MyCustomImageList.from_csv('data', 'dataset.csv').split_by_rand_pct(0.2).label_from_df(cols=["type"]) 
...
~/.local/lib/python3.6/site-packages/PIL/Image.py in open(fp, mode)
   2807 
   2808     if filename:
-> 2809         fp = builtins.open(filename, "rb")
   2810         exclusive_fp = True
   2811 

FileNotFoundError: [Errno 2] No such file or directory: 'data/10x10

I also have problems getting the data types and dimensions right. If someone could point me to a working example, I would be very grateful.

ImageList assumes the data contains a filename in the get method. See https://github.com/fastai/fastai/blob/master/fastai/vision/data.py#L269:

def get(self, i):
    fn = super().get(i) # In your case, returns data/10x10
    res = self.open(fn)  # <- Here, instead of opening an image, you need to create it.
    self.sizes[i] = res.size
    return res

If it doesn’t work, you may need a custom ItemType. See https://docs.fast.ai/tutorial.itemlist.html. In this case, you may want to try fastai v2 (it’s in development). A good starting point is muellerzr’s course (A walk with fastai2 - Vision - Study Group and Online Lectures Megathread).

Finally, regarding data types and dimensions, I don’t know. Keep in mind that all images have to be the same size in order to collate them into a batch.
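As a small illustration of the same-size requirement, the sketch below zero-pads channel-first arrays up to a common 32x32 canvas before stacking them into one batch. The `pad_to` helper and the choice of bottom/right zero padding are assumptions made for this example, not something fastai prescribes.

```python
import numpy as np

def pad_to(img, size=32):
    """Zero-pad a (C, H, W) array on the bottom and right to (C, size, size)."""
    c, h, w = img.shape
    if h > size or w > size:
        raise ValueError(f"image {h}x{w} is larger than {size}x{size}")
    return np.pad(img, ((0, 0), (0, size - h), (0, size - w)))

# Two images of different sizes; stacking them directly would fail
imgs = [np.ones((3, 10, 10), dtype=np.float32),
        np.ones((3, 22, 15), dtype=np.float32)]

batch = np.stack([pad_to(im) for im in imgs])   # shape (N, C, H, W)
print(batch.shape)
```

Calling `np.stack(imgs)` on the unpadded list raises a `ValueError` because the shapes differ, and a batch collation step runs into the same kind of problem.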