After working through the simple SGD demo, I wanted to get a better grasp of backprop, so I decided to implement a simple network that recognizes an ‘X’ pattern in a 3x3 array. That is, if it is fed the ‘X’ (below) as input it should output 1, and 0 for everything else.
tensor([[1., 0., 1.], [0., 1., 0.], [1., 0., 1.]])
I am finding that the network does not converge to the expected outcome. I was expecting the learned weights to end up looking something like this:

tensor([[ 1., -1.,  1.], [-1.,  1., -1.], [ 1., -1.,  1.]])

but the code never converges to that. I'd appreciate any help or pointers on what needs tweaking. Thanks!
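For reference, here is a quick sanity check (this snippet and the hand-picked bias of -4 are my own additions, not part of the training code) that weights of this shape should separate the ‘X’ from random inputs:

```python
import torch

x = torch.FloatTensor([[1, 0, 1], [0, 1, 0], [1, 0, 1]])
w = torch.FloatTensor([[1, -1, 1], [-1, 1, -1], [1, -1, 1]])
b = -4.0  # hand-picked bias, just for this check

# Dot product of w with the 'X' itself is 5; with a uniform-random
# 3x3 input it averages around 0.5, so the sigmoid separates them.
print(torch.sigmoid((w * x).sum() + b))                 # ~0.73 -> looks like 'X'
print(torch.sigmoid((w * torch.rand(3, 3)).sum() + b))  # usually near 0 -> not 'X'
```

Here is my full code: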
```python
import torch
import torch.nn as nn

# 'X' represented as 3x3
x = torch.FloatTensor([[1, 0, 1], [0, 1, 0], [1, 0, 1]])

# Count of training samples
sample_count = 10
ratio_of_crosses_in_input = 0.33

# Create sample_count samples of 3x3 random noise
input_ = torch.rand(sample_count, 3, 3)

# Create a bias element to append to the end of each image
bias = torch.ones(sample_count, 1)

# Reshape each image sample from 3x3 => 9 to make it a linear 1D array
input_1x = input_.reshape(sample_count, 9)

# Add bias as last element, i.e. the 10th element
input_with_bias = torch.cat((input_1x, bias), 1)

# Make a percentage of the samples of type 'X'
input_with_bias[int(sample_count * (1 - ratio_of_crosses_in_input)):, :] = \
    torch.cat((x.reshape(9), torch.ones(1)), 0)

# Create expected result tensor with all outputs corresponding to 'X' set to 1
expected_y = torch.zeros(sample_count, 1)
expected_y[int(sample_count * (1 - ratio_of_crosses_in_input)):] = 1

# Create single-layer nn, initialized to random weights in [-0.5, 0.5)
layer1 = torch.rand(10) - 0.5

# MSE
def mse(y_hat, y):
    return ((y_hat - y) ** 2).mean()

lr = 0.05
sig = torch.nn.Sigmoid()
layer1 = nn.Parameter(layer1)

def update(i):
    y_hat = input_with_bias @ layer1
    y_hat = sig(y_hat)
    loss = mse(y_hat, expected_y)
    loss.backward()
    if i % 5000 == 0:
        print("Loss:")
        print(loss)
    with torch.no_grad():
        layer1.sub_(lr * layer1.grad)
        layer1.grad.zero_()

# Update the weights
for i in range(50000):
    update(i)
```
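After training, this is the diagnostic snippet I use to inspect what was actually learned (the indexing assumes the layout above, where the last rows of `input_with_bias` hold the ‘X’ samples):

```python
# Learned weights as a 3x3 grid, plus the learned bias term
with torch.no_grad():
    print(layer1[:9].reshape(3, 3))
    print("bias:", layer1[9].item())

    # Network output for an 'X' sample vs. a noise sample
    print("X ->", sig(input_with_bias[-1] @ layer1).item())
    print("noise ->", sig(input_with_bias[0] @ layer1).item())
```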