What to do when prediction values are too high?

Hi everyone. I am trying to train a linear classifier to classify cats and dogs. This is the dataset that I am using, and here is my notebook on Kaggle.

I have scaled each image in the dataset to a tensor of size [3, 250, 250]. When I train the model for 20 epochs with a learning rate of 10, I start with an accuracy of 0.4141 and end with a similar accuracy or worse. I tried learning rates of [0.1, 0.5, 10, 100]. Smaller learning rates perform better, and accuracy improves as I increase the number of epochs, but convergence is very slow. A stripped-down sketch of my setup is below.
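For context, here is roughly what my training loop looks like, reduced to a minimal sketch in plain PyTorch (the random `train_x`/`train_y` tensors are placeholders standing in for the real images, not my actual notebook code):

```python
import torch

# Shapes match my real data: each image is a [3, 250, 250] tensor,
# flattened into a vector of 3*250*250 = 187500 pixels.
n_samples, n_pixels = 64, 3 * 250 * 250
train_x = torch.rand(n_samples, n_pixels)              # stand-in images in [0, 1]
train_y = torch.randint(0, 2, (n_samples, 1)).float()  # 0 = cat, 1 = dog

# One weight per pixel plus a bias, initialised as in Chapter 4.
weights = torch.randn(n_pixels, 1, requires_grad=True)
bias = torch.zeros(1, requires_grad=True)
lr = 0.1

for epoch in range(20):
    preds = (train_x @ weights + bias).sigmoid()
    # Chapter 4 style loss: distance of each prediction from its target.
    loss = torch.where(train_y == 1, 1 - preds, preds).mean()
    loss.backward()
    with torch.no_grad():
        weights -= lr * weights.grad
        bias -= lr * bias.grad
        weights.grad.zero_()
        bias.grad.zero_()
    acc = ((preds > 0.5).float() == train_y).float().mean().item()
```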

Here are the accuracy values over the 20 epochs for lr = 0.1:
0.4682 0.4765 0.4838 0.4917 0.4919 0.4918 0.4801 0.4723 0.471 0.4708 0.4642 0.4591 0.4565 0.4574 0.456 0.453 0.4525 0.4524 0.4514 0.4494

My guess is that the raw prediction values (the logits) are too large in magnitude, so when I apply .sigmoid() to them they saturate to exactly 1.0 or 0.0. I printed the raw prediction values and they were roughly in the range [-1000, 1000]. So I tried scaling them down by dividing the predictions by 100 before applying the sigmoid, though I am not sure that is the right way to do it:

predictions = (predictions/100).sigmoid()
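To show what I mean by saturation, here is a quick standalone check (illustrative only, not from my notebook):

```python
import torch

logits = torch.tensor([-1000.0, -30.0, 0.0, 30.0, 1000.0])
probs = logits.sigmoid()
print(probs)  # ~0, ~0, 0.5, ~1, ~1
# In float32, sigmoid is already indistinguishable from 0 or 1 for
# inputs beyond roughly +/-17, so with logits in [-1000, 1000] almost
# every prediction lands on exactly 0.0 or 1.0.
```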

My code is adapted from the Chapter 4 MNIST Basics notebook. I would appreciate it if anyone could go through my notebook and help me understand why the model training is not converging. Thank you! :smiley: