I was experimenting with a Kaggle dataset (whale-categorization-playground).
The dataset has a large number of classes (4,251) of whale tail-fin images.
Using fastai v1.0.21 with resnet101 at image size 128, I got a model that trained well and reached an accuracy of approximately 0.58 after 40 epochs:
epoch | train_loss | valid_loss | accuracy |
---|---|---|---|
37 | 1.718895 | 3.353173 | 0.581849 |
38 | 1.629597 | 3.378948 | 0.580153 |
39 | 1.598755 | 3.379822 | 0.585454 |
40 | 1.562429 | 3.398566 | 0.585030 |
41 | 1.555568 | 3.449480 | 0.575064 |
I then increased the image size to 288 (and reduced the batch size for the health of my 4 GB GPU), loaded the model from the previous run, and trained for another 40 epochs overnight. The model refused to learn, and accuracy languished at about 0.038:
Total time: 5:10:14
epoch | train_loss | valid_loss | accuracy |
---|---|---|---|
37 | 7.481616 | 20.684521 | 0.036472 |
38 | 7.454990 | 15.015072 | 0.039016 |
39 | 7.587584 | 20.919699 | 0.038380 |
40 | 7.223733 | 18.816559 | 0.038592 |
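For context, here is a simplified sketch of the resize-and-resume workflow I mean (paths, weight names, batch size, and learning rate below are placeholders, not my exact values):

```python
# Sketch of resuming training at a larger image size in fastai v1.0.x.
# All names (data_path, "stage-1-128", lr, epochs) are illustrative placeholders.

def scaled_batch_size(bs: int, old_size: int, new_size: int) -> int:
    """Shrink the batch size roughly in proportion to per-image activation
    memory, which grows with the square of the image side length."""
    return max(1, int(bs * (old_size / new_size) ** 2))

def resume_at_larger_size(data_path, weights_name="stage-1-128", size=288, bs=12):
    # fastai import kept inside the function so the sketch is importable
    # even without fastai installed.
    from fastai.vision import (ImageDataBunch, create_cnn, models,
                               accuracy, imagenet_stats)

    # Rebuild the DataBunch at the new size, normalised with the same
    # stats as the first run.
    data = (ImageDataBunch.from_folder(data_path, size=size, bs=bs)
                          .normalize(imagenet_stats))

    learn = create_cnn(data, models.resnet101, metrics=accuracy)
    learn.load(weights_name)   # weights saved from the 128px run
    learn.freeze()             # retrain only the head at the new size first
    learn.fit_one_cycle(5, max_lr=1e-3)
    return learn
```

Going from 128 to 288 multiplies per-image memory by roughly (288/128)² ≈ 5×, which is why the batch size had to drop so much on a 4 GB card.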
Any ideas why this could be so?