This answer may be a year late, but I’ve just reached this step and spent 2 days trying to figure it out (blame the time it took on my lack of foundation).

So here goes

We need to do a matrix multiplication, which in Python is done with the @ operator. The example above doesn’t use @; it uses plain element-wise multiplication (*), so we additionally need to transpose. If you have reviewed the Khan Academy topic on matrix multiplication, you will know that each entry of a matrix product is a dot product, and a dot product is what we are after here.

So when you simply multiply (with *)
train_x[0], which has shape [784],
with
weights, which has shape [784, 1],

you get a different result than when you multiply it with
weights.T, which has shape [1, 784].

As I understand it:
With weights.T, each pixel of the input image train_x[0] is multiplied only with its own corresponding weight, which gives 784 products (shape [1, 784]).

With weights, broadcasting multiplies each pixel of train_x[0] with all of the weights, which gives a [784, 784] result. That is not the result we want.
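Here is a small sketch of the two shapes, in case it helps. I'm using NumPy with random numbers as stand-ins (the names x and weights below are my own, not the real train_x or the notebook's weights), on the assumption that broadcasting behaves the same way as it does for the tensors in the example.

```python
import numpy as np

# Stand-ins for the values discussed above: random data, not real MNIST pixels.
rng = np.random.default_rng(0)
x = rng.random(784)              # plays the role of train_x[0], shape (784,)
weights = rng.random((784, 1))   # one weight per pixel, stored as a column

# Without the transpose, broadcasting pairs every pixel with every weight.
outer = x * weights
print(outer.shape)    # (784, 784)

# With the transpose, pixel i is paired only with weight i.
paired = x * weights.T
print(paired.shape)   # (1, 784)
```

The first result has 784 * 784 entries, one for every pixel/weight combination, while the second has exactly one product per pixel.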

The two results are not the same. Hence, when we are not using matrix multiplication, we transpose and then sum the products, so that we get the same result @ would give, which is a dot product.
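To convince myself, I checked it with a quick sketch. Again this is NumPy with made-up random values (x and weights are hypothetical stand-ins for train_x[0] and the weights), so treat it as an illustration rather than the notebook's actual code.

```python
import numpy as np

# Random stand-ins with the same shapes as in the discussion above.
rng = np.random.default_rng(0)
x = rng.random(784)              # shape (784,)
weights = rng.random((784, 1))   # shape (784, 1)

# Element-wise multiply with the transpose, then add up the 784 products.
elementwise = (x * weights.T).sum()

# The matrix multiplication operator computes the same dot product directly.
matmul = x @ weights             # shape (1,)

# Forgetting the transpose sums all 784 * 784 pixel/weight products instead.
wrong = (x * weights).sum()

print(np.isclose(elementwise, matmul[0]))   # True
print(np.isclose(wrong, matmul[0]))         # False
```

Without the transpose, the sum works out to (sum of all pixels) * (sum of all weights), which is a different number from the dot product.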