Probably a simple question. When MNIST is downloaded directly instead of through untar_data, it comes as a CSV file.
Each row in the CSV contains the pixel values for one image, except for the first value, which is the label for that row.
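To make the layout concrete, here is a minimal sketch that splits such rows into a label and a list of pixel values using only Python's standard csv module. The sample rows are made up and carry 4 pixel values instead of the 784 of a real MNIST row, the file is assumed to have no header row, and the helper names split_row and load_rows are hypothetical.

```python
import csv
import io

# Made-up sample in the layout described above: label first, then pixel values.
# A real MNIST row would have 784 pixel values (28 x 28); 4 are used for brevity.
# Assumption: the CSV has no header row.
SAMPLE_CSV = "5,0,128,255,0\n0,12,0,0,200\n"


def split_row(row):
    """Split one parsed CSV row into (label, pixels). Hypothetical helper."""
    label = int(row[0])
    pixels = [int(value) for value in row[1:]]
    return label, pixels


def load_rows(text):
    """Parse CSV text into a list of (label, pixels) pairs, skipping empty rows."""
    reader = csv.reader(io.StringIO(text))
    return [split_row(row) for row in reader if row]


samples = load_rows(SAMPLE_CSV)
for label, pixels in samples:
    print(label, pixels)
```

For a file on disk, the same reader could be fed from open(path, newline="") instead of the in-memory string.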
My question: How can I get data which is stored in that way into a DataBunch?
I imagine there has to be a way to access the rows and columns of the CSV individually, especially since storing data in this format seems to be fairly common.
Using ImageDataBunch.from_csv() doesn’t work. The dataloader for the labels seems to expect another CSV instead of a single value out of the same CSV.
So far I have been stumped. Any kind of hints would be greatly appreciated.
I’m not sure exactly what format you’re describing, but if you need more flexibility than from_csv provides, you could load the file into a pandas dataframe and then use ImageDataBunch.from_df to get it into your databunch.
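A rough sketch of the pandas step, assuming the CSV has no header row and the label sits in the first column. The sample data is made up (4 pixel values per row instead of 784), and the images are assumed to be square. It only gets the data as far as label and image arrays; as far as I know, from_df expects a column of image file names rather than raw pixel values, so the hand-off to ImageDataBunch is deliberately not shown.

```python
import io

import numpy as np
import pandas as pd

# Made-up sample: label first, then pixel values (4 here, 784 for real MNIST).
# Assumption: no header row, hence header=None.
SAMPLE_CSV = "5,0,128,255,0\n0,12,0,0,200\n"

df = pd.read_csv(io.StringIO(SAMPLE_CSV), header=None)

# The first column holds the labels, the remaining columns hold the pixels.
labels = df.iloc[:, 0].to_numpy()
pixels = df.iloc[:, 1:].to_numpy(dtype=np.uint8)

# Assumption: images are square, so the side length is the square root
# of the number of pixel columns (2 for this sample, 28 for MNIST).
side = int(np.sqrt(pixels.shape[1]))
images = pixels.reshape(-1, side, side)

print(labels, images.shape)
```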
I also found the function ImageDataBunch.from_lists() which might work.
It’s very flexible but also seems to require more manual work than other functions.
Thanks for your answer. I have also been experimenting with dataframes. Ultimately, though, I came to the conclusion that it doesn’t seem possible to use the CSV directly, at least not with the functions fastai provides.