Should this be moved to another forum?
I’m mainly writing this to see if you agree with my approach. I focus on getting a custom image loaded, transformed and predicted, which is the part a lot of beginners (myself included) struggle with. I’m using the NN from the MNIST notebook (~/courses/ml1/lesson4-mnist_sgd.ipynb).
Step 1 Draw a pretty number.
Step 2 Load it.
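import numpy as np
import PIL.Image  # "import PIL" alone does not guarantee that the PIL.Image submodule is loaded
# Depending on the image format you may have to drop two color channels, e.g. with .convert('L')
I = np.asarray(PIL.Image.open(path+'4.bmp'))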
I.shape
>>>(28, 28)
You can check it visually with show(I).
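If the drawing was saved as RGB or RGBA, the array has shape (28, 28, 3) or (28, 28, 4) instead of (28, 28). A minimal numpy sketch of collapsing it to one channel, assuming the digit was drawn in shades of gray so the color channels are equal or nearly so. `to_single_channel` is a hypothetical helper, not part of fastai; PIL can do the same thing at load time with `Image.open(...).convert('L')`.

```python
import numpy as np

def to_single_channel(img):
    """Hypothetical helper: return a 2-D (H, W) array from a grayscale, RGB or RGBA image array."""
    img = np.asarray(img)
    if img.ndim == 2:
        # Already a single channel, e.g. an 8-bit grayscale BMP
        return img
    if img.ndim == 3 and img.shape[2] in (3, 4):
        # Drop the alpha channel if present, then average R, G and B.
        # For a gray drawing the channels are equal, so this keeps the same values.
        return img[:, :, :3].mean(axis=2)
    raise ValueError(f"unexpected image shape {img.shape}")

# Usage: a fake 28x28 RGBA image with a white block standing in for a stroke
rgba = np.zeros((28, 28, 4), dtype=np.uint8)
rgba[10:18, 12:16, :3] = 255   # stroke pixels, equal in R, G and B
rgba[:, :, 3] = 255            # fully opaque alpha channel
gray = to_single_channel(rgba)
print(gray.shape)              # (28, 28)
```

Averaging rather than picking one channel keeps the helper reasonable for slightly tinted drawings too.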
Step 3 Transform and normalize and normalize some more:
four = I.flatten()
type(four),four.shape
>>>(numpy.ndarray, (784,))
# min-max scale to 0.0-1.0 (same as dividing by 255 only when the image spans the full 0-255 range)
four = (four - four.min()) / (four.max() - four.min())
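Min-max scaling maps the darkest pixel to 0.0 and the brightest to 1.0, which matches dividing by 255 only if the image actually contains both 0 and 255. A small sketch comparing the two on a faint drawing, with a guard for a blank (constant) image that would otherwise divide by zero. `min_max_scale` is a hypothetical name; casting to float32 is a choice made here because it matches PyTorch's default FloatTensor.

```python
import numpy as np

def min_max_scale(x):
    """Hypothetical helper: rescale so x.min() -> 0.0 and x.max() -> 1.0."""
    x = np.asarray(x, dtype=np.float32)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # A blank image has no range to stretch; return zeros instead of dividing by 0
        return np.zeros_like(x)
    return (x - lo) / (hi - lo)

# A faint drawing: background 0, strokes only reach 200 instead of 255
faint = np.array([0, 0, 100, 200], dtype=np.uint8)
print(min_max_scale(faint))   # values 0.0, 0.0, 0.5, 1.0
print(faint / 255.0)          # brightest stroke ends up around 0.78, not 1.0
```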
I use the mean and std of the MNIST training set for consistency, so my image goes through the same transform as the training data. My ‘number 4’ array is not as close to mean=0, std=1 as I would like, but I don’t know if that is a problem.
four = (four-mean)/std
four.mean(), four.std()
>>>(0.13384368359694745, 1.1896541432197534)
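On whether mean 0.134 and std 1.19 is a problem: standardizing with the training-set statistics puts a new image on the same scale as the training data, but it does not give that single image exactly mean 0 and std 1, so some deviation is normal. Because the transform is linear, the reported numbers can be mapped back to the 0-1 scale. A sketch, assuming the notebook's mean and std are close to 0.1307 and 0.3081, the MNIST statistics used in the official PyTorch MNIST example; substitute the notebook's own values.

```python
import numpy as np

# Assumed MNIST training-set statistics on the 0-1 scale (stand-ins for the notebook's mean, std)
MNIST_MEAN, MNIST_STD = 0.1307, 0.3081

def standardize(x, mean=MNIST_MEAN, std=MNIST_STD):
    """Apply the training-set z-score transform to a 0-1 scaled image."""
    return (x - mean) / std

def to_raw_stats(z_mean, z_std, mean=MNIST_MEAN, std=MNIST_STD):
    """Undo z = (x - mean) / std on summary statistics: raw mean and raw std."""
    return mean + z_mean * std, z_std * std

# Mean and std reported for the hand-drawn 4 after standardizing
raw_mean, raw_std = to_raw_stats(0.13384368359694745, 1.1896541432197534)
print(round(raw_mean, 3), round(raw_std, 3))   # 0.172 0.367

# Round trip: standardizing and then undoing it recovers the 0-1 pixels
x = np.array([0.0, 0.5, 1.0])
assert np.allclose(standardize(x) * MNIST_STD + MNIST_MEAN, x)
```

With these assumed statistics the drawing averages about 0.172 on the 0-1 scale versus about 0.131 for the training data, and its spread is somewhat larger. Assuming white strokes on a black background, as in MNIST, that suggests a slightly heavier or sharper stroke than the typical training digit rather than a broken transform.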
Step 4 Convert the numpy array to a tensor, then a ‘Variable’.
Vfour = V(T(four))
type(Vfour), Vfour.shape  # 1-D: 784 values, no batch dimension yet
>>>(torch.autograd.variable.Variable, torch.Size([784]))
Vfour = Vfour.view(1, 784)  # add a batch dimension; Vfour.unsqueeze(0) also works
Vfour.shape
>>>torch.Size([1, 784])
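Why transpose/permute/resize? Because the model expects a batch of inputs with shape (batch_size, 784), one image per row, so a single image has to become one row with 784 columns rather than a flat vector.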
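The same reshaping in numpy terms, as a sketch with random data standing in for real images. A single flattened image becomes a batch of one row; several images can be stacked into one batch instead. In PyTorch the corresponding calls are `view(1, 784)` and `unsqueeze(0)`.

```python
import numpy as np

rng = np.random.default_rng(0)
four = rng.random(784, dtype=np.float32)   # stand-in for the flattened, normalized image

batch_of_one = four.reshape(1, -1)         # -1 lets numpy infer 784
same_thing = four[None, :]                 # add a leading batch axis
print(batch_of_one.shape, same_thing.shape)  # (1, 784) (1, 784)

# Several images at once: stack the flattened vectors as rows
images = [rng.random(784, dtype=np.float32) for _ in range(3)]
batch = np.stack(images)                   # one row per image
print(batch.shape)                         # (3, 784)

# Reshaping only changes the shape; the pixel order is unchanged
assert np.array_equal(batch_of_one[0], four)
```

A transpose would not help here: transposing a 1-D array leaves it unchanged, and transposing (784, 1) would give the same (1, 784) as a plain reshape.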
Step 5 Predict!
predictFour = net2(Vfour).exp()  # net2 outputs log-probabilities (LogSoftmax); exp() turns them into probabilities
predictFour
>>>Variable containing:
Columns 0 to 5
7.8360e-16 1.2982e-25 4.6790e-20 5.4591e-16 1.0000e+00 2.2921e-19
Columns 6 to 9
2.3876e-13 3.3228e-15 1.3799e-14 4.0984e-10
[torch.cuda.FloatTensor of size 1x10 (GPU 0)]
predictFour.max(1)[1]  # max along dim 1 returns (values, indices); the index is the predicted digit
>>>Variable containing:
4
[torch.cuda.LongTensor of size 1 (GPU 0)]
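The printed probabilities sum to 1, which fits net2 ending in a LogSoftmax layer. A numpy sketch of the same arithmetic on made-up scores for one image: `log_softmax` is written out here rather than taken from a library, `np.exp` plays the role of `.exp()`, and `argmax(axis=1)` plays the role of `.max(1)[1]`.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

# Made-up raw scores for a batch of one image, 10 classes
logits = np.array([[1.0, -3.0, 0.5, 2.0, 9.0, -1.0, 0.0, 1.5, 0.2, 3.0]])

log_probs = log_softmax(logits)   # what a LogSoftmax output layer returns
probs = np.exp(log_probs)         # the equivalent of .exp()
pred = probs.argmax(axis=1)       # the equivalent of .max(1)[1]

print(probs.sum(axis=1))          # each row sums to 1, up to rounding
print(pred)                       # [4]
```

Because exp is monotonic, taking the argmax of the log-probabilities gives the same class, so `.exp()` is only needed to read the output as probabilities.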
That's what I did. I would love feedback on my approach.