Hello!

I’m doing one of the challenges in chapter 4 of the fastbook in which a digit classifier has to be made using the MNIST dataset.

I’m a bit confused about how I should go about creating this classifier. There will obviously be 10 different labels (the digits 0 through 9), but I’m not quite sure how I should go about assigning them. Below are some ideas of mine.

The first idea I have is essentially k-nearest neighbors. I calculate the linear combination for every image in the training set, calculate the linear combination for the input image, and then use the majority label among the k training samples whose linear combinations are closest to the input image's. The problem I have with this is that I don’t quite see how the system could improve itself over time with this approach, or how I could add nonlinearity to it. However, I suppose this approach could be useful as a baseline.
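To make the first idea concrete, here is a minimal sketch of what I have in mind, in plain Python. It assumes each image has already been reduced to a single number (its linear combination); the names `knn_predict`, `train_scores`, and `train_labels`, and the toy values, are made up for illustration.

```python
from collections import Counter

def knn_predict(train_scores, train_labels, query_score, k=3):
    """Return the majority label among the k training scores closest to query_score."""
    # Sort training indices by absolute distance to the query's score.
    order = sorted(range(len(train_scores)),
                   key=lambda i: abs(train_scores[i] - query_score))
    nearest_labels = [train_labels[i] for i in order[:k]]
    # Majority vote over the k nearest labels.
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy data (hypothetical): one scalar score per training image.
train_scores = [0.10, 0.15, 0.20, 0.80, 0.85, 0.90]
train_labels = [0, 0, 0, 1, 1, 1]

prediction = knn_predict(train_scores, train_labels, query_score=0.82, k=3)
```

Nothing here is learned: the prediction depends only on the stored training scores, which is why I don't see how it would improve over time.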

The second idea I have is to use the softmax function. I’ve never used it, but I know roughly what it is: an extension of the sigmoid function to multiclass classification. Where the sigmoid function can separate two labels (is the input greater or less than zero), my understanding is that the softmax function can separate multiple labels (does a value fall between two set values).
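For comparison, the usual definition of softmax that I have seen works on a vector of scores, one per class, rather than on a single value split into intervals. A minimal sketch in plain Python, assuming the model produces one score per digit (the function names and the example scores are made up):

```python
import math

def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    # Subtract the max score for numerical stability; the result is unchanged.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(scores):
    """Return the index of the class with the highest softmax probability."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical scores for one image, one entry per digit 0-9.
scores = [0.2, -1.0, 0.5, 3.1, 0.0, 1.2, -0.3, 0.4, 0.9, 2.0]
probs = softmax(scores)
digit = predict(scores)
```

If that is right, the predicted digit would simply be the position of the largest probability, and no intervals would need to be set by hand.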

However, I’m not quite sure how to set the intervals used to classify a digit (e.g., using arbitrary values: a value between 0.0 and 0.1 is classified as a 0, a value between 0.1 and 0.2 is classified as a 1, etc.). One idea I have is to calculate the average linear combination for each digit in the training set, then pass each average through the sigmoid function:

```
{0: tensor([0.4247]),
1: tensor([0.4462]),
2: tensor([0.4960]),
3: tensor([0.5454]),
4: tensor([0.4937]),
5: tensor([0.5575]),
6: tensor([0.5382]),
7: tensor([0.4850]),
8: tensor([0.4669]),
9: tensor([0.5285])}
```

Then I could set the intervals with these values (e.g., a 0 label is given to any combination between 0.4247 and 0.4462; a 9 label is given to any combination between 0.5285 and 0.5382; etc.). However, I don’t think this would work, because these are only averages. There would definitely be, for example, an image of a 9 whose combination, after being passed through the sigmoid, falls below 0.5285.
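The interval idea could be sketched like this in plain Python, using the averages listed above. How to handle values below the smallest average or above the largest is my own assumption (they are clamped to the nearest end), and `classify_by_interval` is just an illustrative name.

```python
from bisect import bisect_right

# Per-digit averages after the sigmoid, copied from the output above.
class_means = {0: 0.4247, 1: 0.4462, 2: 0.4960, 3: 0.5454, 4: 0.4937,
               5: 0.5575, 6: 0.5382, 7: 0.4850, 8: 0.4669, 9: 0.5285}

# Sort digits by their average so each average is the lower edge of an interval.
ordered = sorted(class_means.items(), key=lambda item: item[1])
thresholds = [mean for _, mean in ordered]
labels = [digit for digit, _ in ordered]

def classify_by_interval(value):
    """Label a value with the digit whose average is the closest one at or below it."""
    idx = bisect_right(thresholds, value) - 1
    # Assumption: values below the smallest average get the lowest interval's label.
    idx = max(idx, 0)
    return labels[idx]

print(classify_by_interval(0.43))  # 0
print(classify_by_interval(0.53))  # 9
print(classify_by_interval(0.52))  # 2
```

The last call shows the problem: an image of a 9 whose value comes out as 0.52 would be labeled 2, because it lands in the interval that starts at the average for 2.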

I don’t have any other ideas at the moment for how to approach classifying the inputs.

I would really appreciate any pointers, tips, and other ideas to approach this! I am kind of confused/lost.