Fashion MNIST from arrays

haleemer · July 31, 2019, 6:13pm

I am at lesson 3 and trying to get familiar with DataBunch. I wanted to import the Kaggle Fashion MNIST dataset, and I have been able to do so successfully in Colab (https://colab.research.google.com/drive/1A4BKE0Y4_FdGjsF07H6x79bU-V8UmwcS).

However, after splitting the CSV into train/valid/test dataframes, each with its own label dataframe, I don’t know how to proceed. Previously there was a way to get a DataBunch from image arrays, but now running ??ImageClassifierData.from_arrays gives me an error.

Does anyone have an updated tutorial on creating a DataBunch from NumPy arrays?
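As an aside on the splitting step, a reproducible random train/validation split over arrays takes only a few lines of NumPy. This is a minimal sketch: random_split, its 20% default and its seed are hypothetical choices for illustration, not a fastai function.

```python
import numpy as np


def random_split(n_items: int, valid_pct: float = 0.2, seed: int = 42):
    """Hypothetical helper: return (train_idx, valid_idx) for a reproducible random split."""
    if not 0.0 < valid_pct < 1.0:
        raise ValueError("valid_pct must be strictly between 0 and 1")
    rng = np.random.default_rng(seed)  # fixed seed -> same split on every run
    perm = rng.permutation(n_items)    # shuffled indices 0..n_items-1
    n_valid = int(round(n_items * valid_pct))
    return perm[n_valid:], perm[:n_valid]
```

The returned index arrays can be used to slice the image array and the label array together (images[train_idx], labels[train_idx]), which keeps images and labels aligned.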

eljas1 · July 31, 2019, 6:23pm

On Kaggle the images are formatted differently than in the fast.ai version of the dataset used in the lessons. To replicate the lesson, you first need to reshape each row's pixel columns into a 28x28 array: the Kaggle CSV has a dedicated column for each pixel, so a row can't be read directly as an image.
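A minimal NumPy sketch of that reshaping, assuming the Kaggle CSV layout of a header row followed by one row per image (label, then 784 pixel columns). split_kaggle_rows and load_kaggle_csv are hypothetical helper names; pandas would work just as well for the loading step.

```python
import numpy as np

# Assumed layout (Kaggle fashion-mnist_train.csv / fashion-mnist_test.csv):
# a header row, then one row per image: label, pixel1, ..., pixel784.
IMG_SIDE = 28


def split_kaggle_rows(table: np.ndarray):
    """Split an (N, 785) array into (N,) int labels and (N, 28, 28) uint8 images."""
    labels = table[:, 0].astype(np.int64)
    pixels = table[:, 1:]
    if pixels.shape[1] != IMG_SIDE * IMG_SIDE:
        raise ValueError(
            f"expected {IMG_SIDE * IMG_SIDE} pixel columns, got {pixels.shape[1]}"
        )
    # Row-major reshape: the first 28 pixel columns become the top row of the image.
    images = pixels.reshape(-1, IMG_SIDE, IMG_SIDE).astype(np.uint8)
    return labels, images


def load_kaggle_csv(csv_path: str):
    """Hypothetical helper: read the CSV with NumPy, skipping the header row."""
    # ndmin=2 keeps the result 2-D even if the file holds a single image row.
    table = np.loadtxt(csv_path, delimiter=",", skiprows=1, ndmin=2)
    return split_kaggle_rows(table)
```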

haleemer · July 31, 2019, 7:10pm

Hi, thanks for your response. Yes, I have reshaped the columns into a 28x28 shape. Which function do I use to make a DataBunch from that?

eljas1 · August 1, 2019, 7:38am

I think Jeremy prefers the data block API over creating a DataBunch directly, since it makes it clearer what is going on. To create the DataBunch with the data block API, you can follow the lesson 7 notebook. Here’s the combined code for creating it:

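from fastai.vision import *  # fastai v1

path = untar_data(URLs.MNIST)  # MNIST images in training/<digit>/ and testing/<digit>/ folders
tfms = ([*rand_pad(padding=3, size=28, mode='zeros')], [])  # augment the training set only
bs = 128

data = (ImageList.from_folder(path, convert_mode='L')  # 'L' = load as grayscale
        .split_by_folder(train='training', valid='testing')
        .label_from_folder()
        .transform(tfms)
        .databunch(bs=bs)
        .normalize())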

Here’s the notebook: github.com/fastai/course-v3/blob/master/nbs/dl1/lesson7-resnet-mnist.ipynb
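The lesson 7 pipeline reads images from class folders (training/<label>/ and testing/<label>/), so one way to reuse it with reshaped (N, 28, 28) arrays is to write them to disk first. This is a sketch under assumptions: arrays_to_folders and save_pgm are hypothetical helpers, and binary PGM is used only so the example needs nothing beyond NumPy and the standard library.

```python
from pathlib import Path

import numpy as np


def save_pgm(img: np.ndarray, fname: Path) -> None:
    """Write a 2-D uint8 array as a binary (P5) PGM file."""
    if img.ndim != 2:
        raise ValueError("expected a 2-D (height, width) array")
    height, width = img.shape
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    # PGM stores pixels row by row, matching NumPy's default C (row-major) order.
    fname.write_bytes(header + np.ascontiguousarray(img, dtype=np.uint8).tobytes())


def arrays_to_folders(images, labels, root, split: str) -> Path:
    """Hypothetical helper: write images as root/<split>/<label>/<index>.pgm."""
    split_dir = Path(root) / split
    for i, (img, lbl) in enumerate(zip(images, labels)):
        class_dir = split_dir / str(int(lbl))  # one sub-folder per class label
        class_dir.mkdir(parents=True, exist_ok=True)
        save_pgm(img, class_dir / f"{i}.pgm")
    return split_dir
```

For example, calling it once with split 'training' and once with 'testing' under the same root would give the folder layout that split_by_folder and label_from_folder expect. This assumes fastai's default image-extension list accepts .pgm files; if it does not, saving PNGs with Pillow (Image.fromarray(img).save(...)) is the more common route.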
