Hi everyone. I am trying to train a linear classifier model to classify Cats and Dogs. This is the dataset that I am using. Here is my notebook on Kaggle.
I have scaled each image in the dataset to a tensor of size [3, 250, 250]. When I train the model for 20 epochs with a learning rate of 10, I start with an accuracy of 0.4141 and end with similar or worse accuracy. I tried learning rates of 0.1, 0.5, 10, and 100. Smaller learning rates perform better, and the accuracy improves as I increase the number of epochs, but convergence is very slow.
Here are the accuracy values for lr = 0.1:
0.4682 0.4765 0.4838 0.4917 0.4919 0.4918 0.4801 0.4723 0.471 0.4708 0.4642 0.4591 0.4565 0.4574 0.456 0.453 0.4525 0.4524 0.4514 0.4494
My guess is that the raw prediction values are too large, so when I apply .sigmoid() to them they saturate at 1.0 or 0.0. I printed the raw predictions and they were in the range of approximately [-1000, 1000]. So I tried scaling them down by dividing the predictions by 100 before applying the sigmoid function. I am not sure if that is the right way to do it.
predictions = (predictions/100).sigmoid()
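A minimal sketch of the saturation effect described above, in plain Python so it runs without the notebook. It shows that the sigmoid gradient is effectively zero for logits near 1000, and gives a rough estimate of how large the raw logits of a linear model over 3 * 250 * 250 inputs can get. The helper names (`sigmoid_grad`, `expected_logit_std`) are hypothetical, and the estimate rests on assumptions not confirmed by the post: pixels roughly uniform in [0, 1], weights drawn independently from a zero-mean normal distribution, and no bias term.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)

# For large logits the output pins to 1.0 and the gradient vanishes,
# so gradient descent has almost nothing to work with.
for logit in (1.0, 10.0, 1000.0):
    print(f"logit={logit:7.1f}  sigmoid={sigmoid(logit):.6f}  "
          f"grad={sigmoid_grad(logit):.2e}")

N_FEATURES = 3 * 250 * 250  # one weight per input value

def expected_logit_std(n_features: int, weight_std: float,
                       pixel_second_moment: float = 1.0 / 3.0) -> float:
    """Std of sum(w_i * x_i) for independent zero-mean weights.

    ASSUMPTION: pixels are uniform in [0, 1], so E[x^2] = 1/3.
    Var(logit) = n * weight_std^2 * E[x^2].
    """
    return weight_std * math.sqrt(n_features * pixel_second_moment)

# Weights with std 1.0 (an assumed initialization) vs. std 1/sqrt(n).
print("std with weight_std=1.0:      ",
      expected_logit_std(N_FEATURES, 1.0))
print("std with weight_std=1/sqrt(n):",
      expected_logit_std(N_FEATURES, 1.0 / math.sqrt(N_FEATURES)))
```

Under these assumptions the first estimate comes out around 250, which would be consistent with raw predictions spanning roughly [-1000, 1000]. If that is what is happening, shrinking the initial weights (for example by a factor of 1/sqrt(n)) may be a cleaner alternative to dividing the predictions by a hand-picked constant, though this is only a guess without seeing the notebook.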
My code is adapted from Chapter 4, MNIST Basics. I would appreciate it if anyone could go through my notebook and help me understand why the model training is not converging. Thank you!