Hi!

I am writing a Learner for MNIST from scratch. The problem I am facing is that a single-layer model performs **much** better than a multi-layer one. With a single layer, I reached 0.63 accuracy after 500 epochs using the **L1 norm** as the loss function and learning rates between 0.1 and 0.01. However, the multi-layer model reaches at most 0.2 accuracy even after thousands of epochs. I tried varying the learning rate and the size of the second linear layer, but the results did not improve.
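For context, my training loop is roughly the following sketch (random data stands in for MNIST here, since the data-loading code is not part of the question; the single-layer model is shown):

```
import torch

torch.manual_seed(0)

# Stand-in data: random "images" and labels 0-9 (my real code uses MNIST).
xb = torch.rand(256, 28 * 28)
yb = torch.randint(0, 10, (256,))
targets = torch.nn.functional.one_hot(yb, num_classes=10).float()

model = torch.nn.Sequential(torch.nn.Linear(28 * 28, 10))
lr = 0.1

for epoch in range(100):
    preds = model(xb)
    loss = (preds - targets).abs().mean()  # L1 norm as the loss function
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            p -= lr * p.grad
            p.grad.zero_()

# Accuracy = fraction of samples where the largest output matches the label.
accuracy = (model(xb).argmax(dim=1) == yb).float().mean().item()
```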

```
class LinearModel:
    def __init__(self, w, b):
        self.w = torch.rand(w).requires_grad_()
        self.b = torch.rand(b).requires_grad_()

    def params(self):
        return (self.w, self.b)

    def predict(self, xb):
        return xb @ self.w + self.b


class RectifiedLinearUnit:
    def predict(self, xb):
        return xb.max(tensor(0.0))

    def params(self):
        return []


class NeuralNetwork:
    def __init__(self):
        self.layers = [
            LinearModel((28 * 28, 100), 10),
            RectifiedLinearUnit(),
            LinearModel((100, 10), 10),
        ]

    def params(self):
        l = [layer.params() for layer in self.layers]
        return [p for layer in l for p in layer]

    def predict(self, xb):
        res = xb
        for layer in self.layers:
            res = layer.predict(res)
        return res
```
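One thing I noticed while writing this post: as pasted above, the first `LinearModel` gets a bias of size 10 even though it produces 100 outputs. A quick shape check with the same sizes (a minimal sketch, not my actual code) fails to broadcast, so the code I actually ran may differ slightly from what is pasted here:

```
import torch

# Same shapes as the first LinearModel above: w is (28*28, 100), b has 10 elements.
w = torch.rand((28 * 28, 100))
b = torch.rand(10)
xb = torch.rand(64, 28 * 28)

try:
    out = xb @ w + b  # (64, 100) + (10,): broadcasting fails
    print("output shape:", out.shape)
except RuntimeError as err:
    print("shape mismatch:", err)
```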

I thought something was wrong with my model, so I tried the built-in versions:

```
torch.nn.Sequential(
    torch.nn.Linear(28 * 28, 10)
)
```

and

```
torch.nn.Sequential(
    torch.nn.Linear(28 * 28, 10),
    torch.nn.ReLU(),
    torch.nn.Linear(10, 10)
)
```

But I got the same result.

As I understand it, a multi-layer model should perform better and require less training, but for me it is the exact opposite. Could someone please help me understand why this happens?