I’ve been messing around for a while with the Digit Recognizer dataset. I see that most approaches load the image data from the CSV, dump it out as individual training and validation images on disk, and go on from there via ImageClassifierData.from_paths.
I thought I might try using the data directly via ImageClassifierData.from_arrays, which would also give me the chance to dig a little into numpy itself.
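For anyone curious what loading the CSV straight into numpy arrays looks like, here is a minimal sketch. Since the actual file isn't shown in this thread, it fabricates a tiny two-row stand-in for Kaggle's train.csv (header `label,pixel0,...,pixel783`, one row per image):

```python
import io
import numpy as np

# Stand-in for Kaggle's train.csv: a header row followed by one row per
# image. Here we fabricate two tiny rows purely for illustration.
header = "label," + ",".join(f"pixel{i}" for i in range(784))
rows = [
    "5," + ",".join("0" for _ in range(784)),
    "3," + ",".join("255" for _ in range(784)),
]
csv_data = io.StringIO(header + "\n" + "\n".join(rows))

# skiprows=1 drops the header; first column is the label, the rest are pixels.
raw = np.loadtxt(csv_data, delimiter=",", skiprows=1)
labels = raw[:, 0].astype(np.int64)
pixels = raw[:, 1:].astype(np.float32).reshape(-1, 28, 28)

print(labels.shape, pixels.shape)  # (2,) (2, 28, 28)
```

With the real file you would pass the path to `np.loadtxt` (or use pandas) instead of the in-memory stand-in.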
So after some tweaking here and there, I finally got a first learn.fit call to work (or, to be more accurate, just not to throw an error). What I haven’t found is a way to apply data augmentation. So far I’ve only tried the “transforms_side_on” variant (although I’m aware that not all of the image manipulations in there - side flips, for instance - would apply to digit recognition), just to see whether it would work, but learn.fit just threw an error.
Could anybody suggest how I could add data augmentation on this?
Jose, in your notebook comments you say: “We repeat the only channel so we will have 3 channels.” The MNIST data is black and white (i.e. one channel). I don’t really understand what you are trying to achieve by converting this to three channels.
Also, I believe it’s standard practice to apply some kind of transformation to this kind of data. Typically, people would employ “normalisation”: since the pixel values originally range from 0 (white) to 255 (black), dividing the train (and test) values by 255 puts everything on a scale between 0 and 1.
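Concretely, that normalisation step is just a division (sketched here with a made-up pixel array; the same divisor should be applied to train, validation and test data):

```python
import numpy as np

# Hypothetical pixel values in the original 0-255 range.
pixels = np.array([[0, 128, 255]], dtype=np.float32)

# Scale to [0, 1].
normalized = pixels / 255.0

print(normalized.min(), normalized.max())  # 0.0 1.0
```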
I hope that helps get you on track, but if not you could always check out the many excellent Kaggle kernels on this very popular dataset.
I had to triple the channel just to get the correct shape for the data; otherwise I would get an error either at “ImageClassifierData.from_arrays” or at “learn.fit” (I don’t remember now exactly where).
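The channel-tripling itself is a one-liner in numpy. A sketch (note I'm not certain which axis order fastai's from_arrays expects, so check whether your pipeline wants channels last or first):

```python
import numpy as np

# A batch of 2 single-channel 28x28 images (values are placeholders).
imgs = np.zeros((2, 28, 28), dtype=np.float32)

# Add a trailing channel axis, then repeat it 3 times: (batch, h, w, 3).
imgs_3ch = np.repeat(imgs[..., None], 3, axis=-1)

# For a channels-first layout, repeat along axis 1 instead:
# np.repeat(imgs[:, None, :, :], 3, axis=1)  -> (batch, 3, h, w)

print(imgs_3ch.shape)  # (2, 28, 28, 3)
```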
I checked the Kaggle kernels, as you suggested, and found one that also uses the fastai library via the from_arrays approach; I’ll dig into that.
Stefan — you did a really nice job with your notebook. Thank you for sharing that. Going back to Jose’s original question about possibly using data augmentation, I took your notebook and experimented with simply normalizing the data and then adding a max_zoom like this:
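(The snippet itself didn't survive here; in fastai's old API the zoom is controlled by a max_zoom argument to the transforms. As a rough, library-free illustration of what such a zoom augmentation does, here is a nearest-neighbour sketch that samples a centred window of the image at a finer grid, keeping the output size fixed:)

```python
import numpy as np

def zoom_nn(img, scale):
    """Zoom in by `scale` using nearest-neighbour sampling, keeping
    the output the same size as the input. A crude stand-in for a
    max_zoom-style augmentation, not fastai's actual implementation."""
    h, w = img.shape
    # Sample a centred window of size (h/scale, w/scale) back up to (h, w).
    ys = (np.linspace(0, h / scale, h, endpoint=False) + (h - h / scale) / 2).astype(int)
    xs = (np.linspace(0, w / scale, w, endpoint=False) + (w - w / scale) / 2).astype(int)
    return img[np.ix_(ys, xs)]

img = np.arange(28 * 28, dtype=np.float32).reshape(28, 28)
zoomed = zoom_nn(img, 1.1)  # comparable in spirit to max_zoom=1.1
print(zoomed.shape)  # (28, 28)
```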
Without any other changes, my results improved from 0.99357 (validation)/0.99385 (test) to 0.994167 (validation)/0.99428 (test).
There’s obviously a random component to these results, but it does appear that data augmentation can be effective for the MNIST dataset, even given the constraint that it wouldn’t seem to make sense to rotate a handwritten digit. Naturally, there are other things that could be tried as well: changing the size of max_zoom, changing the probability that an image will be selected for zooming, or perhaps trying something else like RandomLighting or RandomBlur.
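For anyone who wants to experiment along those lines without the library, a RandomLighting-style jitter is easy to mock up in numpy (a sketch only; it assumes the images are already normalised to [0, 1], and the max_delta name is made up here):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_brightness(img, max_delta=0.1):
    """Crude RandomLighting-style jitter: shift all pixel values by a
    random amount in [-max_delta, max_delta], clipped back to [0, 1]."""
    delta = rng.uniform(-max_delta, max_delta)
    return np.clip(img + delta, 0.0, 1.0)

img = np.full((28, 28), 0.5, dtype=np.float32)
jittered = random_brightness(img)
print(jittered.shape)  # (28, 28)
```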