No convergence, no overfitting (Resnet152)

I want to train a Resnet152-based binary classifier using 20k images (10k in each category). My original images are 2048x2048, but at full size they do not fit into GPU memory (RTX 2080Ti), so I’m rescaling them.

As suggested by Jeremy, I first fine-tuned Resnet152 at a smaller resolution (512) and achieved a very high accuracy of 0.9803. However, when I moved to the largest size that fits into GPU memory (1536 pixels), I could achieve neither convergence nor overfitting, and I’m very puzzled by this behaviour. Any recommendations will be highly appreciated.

Here is the snippet of my code:

from fastai.vision import *  # fastai v1 API
import torch

data = ImageDataBunch.from_df(
    path_to_images, df_images,
    ds_tfms=get_transforms(do_flip=True, flip_vert=True, max_rotate=None, max_zoom=1.0, max_warp=None),
    valid_pct=VALID_PCT, bs=16, size=512,
).normalize(imagenet_stats)

learn_512 = cnn_learner(data, models.resnet152, metrics=accuracy, bn_final=True, ps=0.5)
learn_512.model = torch.nn.DataParallel(learn_512.model)

learn_512.fit_one_cycle(5, max_lr=1e-2)

epoch train_loss valid_loss accuracy time
0 0.388392 0.292082 0.895884 07:37
1 0.354566 0.285786 0.944007 07:14
2 0.306108 0.237093 0.971852 07:26
3 0.255985 0.582608 0.971247 06:53
4 0.219943 0.210191 0.979722 07:06

learn_512.unfreeze()
learn_512.fit_one_cycle(1, slice(1e-6, 5e-3))

epoch train_loss valid_loss accuracy time
0 0.206563 0.474366 0.980327 08:53

learn_512.save('clf_512')

The same, but now with 1536 resolution:

data = ImageDataBunch.from_df(
    path_to_images, df_images,
    ds_tfms=get_transforms(do_flip=True, flip_vert=True, max_rotate=None, max_zoom=1.0, max_warp=None),
    valid_pct=VALID_PCT, bs=16, size=1536,
).normalize(imagenet_stats)

learn = cnn_learner(data, models.resnet152, metrics=accuracy, bn_final=True, ps=0.5).load('clf_512')
learn.to_fp16()
learn.model = torch.nn.DataParallel(learn.model)

learn.fit_one_cycle(5, max_lr=3e-4)

epoch train_loss valid_loss accuracy time
0 0.596490 0.486517 0.862591 43:44
1 0.604683 0.507021 0.858656 43:34
2 0.601637 0.570700 0.836562 43:33
3 0.613324 0.517586 0.862288 44:10
4 0.587535 0.480734 0.878632 44:07

learn.unfreeze()
learn.fit_one_cycle(10, slice(1e-8, 3e-5))

epoch train_loss valid_loss accuracy time
0 0.616416 0.470562 0.897397 53:42
1 0.582632 0.473497 0.886501 53:33
2 0.590366 0.495948 0.885593 53:26
3 0.581906 0.475689 0.868341 53:56
4 0.563544 0.494985 0.858353 53:35
5 0.585111 0.491389 0.887107 53:32
6 0.589545 0.460681 0.895278 53:36
7 0.591109 0.535952 0.859262 53:52
8 0.593550 0.485988 0.883475 53:23
9 0.570294 0.468092 0.877421 53:22

Can anyone explain these results? It seems like the model is underfitting, but removing dropout and augmentation does not help either. Thanks!


Why do you want to use the maximum size? A 2048px pumpkin is no more a pumpkin than a 224px pumpkin. What happens if you reduce the image size to around 224px?


Ideally I do want to use the max resolution. These are medical images, and fine pixel-level detail (which pixels drive the class decision) is quite important.

To solve the high-resolution problem, I thought about slicing each 2048x2048 image into four 1024x1024 tiles, each of which would carry the same label as the original image.
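The tiling idea can be sketched in a few lines. This is only an illustration: `tile_boxes`, `tile_records` and the file name are hypothetical, and the sketch assumes the image size is an exact multiple of the tile size.

```python
from typing import List, Tuple

# (left, upper, right, lower): the box order expected by PIL's Image.crop
Box = Tuple[int, int, int, int]


def tile_boxes(width: int, height: int, tile: int) -> List[Box]:
    """Return non-overlapping crop boxes that cover a width x height image."""
    if width % tile or height % tile:
        raise ValueError("image size must be a multiple of the tile size")
    return [
        (left, upper, left + tile, upper + tile)
        for upper in range(0, height, tile)
        for left in range(0, width, tile)
    ]


def tile_records(filename: str, label: int, width: int = 2048,
                 height: int = 2048, tile: int = 1024):
    """Pair every tile of one image with the image-level label."""
    return [(filename, box, label) for box in tile_boxes(width, height, tile)]


# Hypothetical file name and label, for illustration only.
records = tile_records("sample_001.png", label=1)
for record in records:
    print(record)
```

Each box can be passed to `PIL.Image.crop` to cut out the tile. One caveat: a tile may not contain the structure that determines the image-level label, so the inherited labels can be noisy.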

Apart from this, any ideas on the behaviour of the training procedure outlined above? Surprisingly, the training loss does not drop below 0.4.

If I reduce the image size to 224px, I get 0.99 accuracy.

Any thoughts on this behaviour or suggestions how to treat high res images? Thanks!

Are you using progressive resizing here or not? Like training at first on lower res, then on higher res and so on.
Have you tried how the model trained on 224px images does on higher res ones?

Hi Maria,

I started with 256, 512 and 1024 resolution (fractions of the original 2048). 256 and 512 demonstrated really good convergence and very high accuracy (0.99), but at 1024 I can’t get above 0.93. Jeremy mentioned that using high-res images should give better accuracy; however, I’m seeing the opposite.

Is there a big difference between the native ImageNet 224 res and 256? I’m currently training on 224 and will post my results here shortly. Thanks.


That’s really interesting :smiley:
I’d try progressive resizing with varying dropout: start with a dropout rate of 0.0 at low res and increase it as the images get bigger. I got pretty interesting results with that on a totally different problem, but it’s still something you may try, along with mixup.
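One way to write down such a schedule is sketched below. The linear ramp and the 0.5 endpoint are assumptions made for illustration (the suggestion only says to start at 0.0 and go up), and `progressive_schedule` is a hypothetical helper, not a fastai function.

```python
def progressive_schedule(sizes, max_dropout=0.5):
    """Pair each image size with a dropout rate rising linearly from 0.0."""
    if len(sizes) == 1:
        return [(sizes[0], max_dropout)]
    step = max_dropout / (len(sizes) - 1)
    return [(size, round(i * step, 3)) for i, size in enumerate(sizes)]


# Sizes taken from the thread; the dropout values are illustrative.
for size, ps in progressive_schedule([256, 512, 1024]):
    print(f"size={size}  ps={ps}")
```

Each stage would then rebuild the data with `size=size` and the learner with `ps=ps`, loading the weights saved by the previous stage, in the same way the original snippet moves from 512 to 1536.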

What kind of images are you classifying?


Many thanks for your tips! Will implement them and let you know.

Also, a few technical questions:

  • My images have only one channel (not RGB), but the standard input is three-channel. ImageDataBunch turns a single-channel image into a three-channel one (duplicating the original image three times), which obviously consumes more memory. Is there a simple way to feed single-channel images into Resnet?
  • Any ideas how to work with high-resolution images? Tiling, for example, or other methods built into the ItemList data loader.

Thanks!

P.S. These are biological samples, proteins, neurons and cells.
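On the single-channel question, one commonly used approach is to replace the network's first convolution with a one-channel version whose kernel is the sum of the original kernel over its three input channels. The sketch below uses plain PyTorch; `to_single_channel` is a hypothetical helper, and the stand-in layer has random weights rather than pretrained ones. The equivalence is exact only when the three duplicated channels are identical at the input of the convolution, which is not the case after per-channel normalisation with `imagenet_stats`.

```python
import torch
import torch.nn as nn


def to_single_channel(conv: nn.Conv2d) -> nn.Conv2d:
    """Build a 1-channel conv whose kernel is the channel-sum of `conv`'s kernel."""
    if conv.groups != 1:
        raise ValueError("only plain (groups=1) convolutions are handled")
    new_conv = nn.Conv2d(
        in_channels=1,
        out_channels=conv.out_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        dilation=conv.dilation,
        bias=conv.bias is not None,
    ).to(conv.weight.device, conv.weight.dtype)
    with torch.no_grad():
        # Sum over the input-channel axis: (out, 3, kh, kw) -> (out, 1, kh, kw)
        new_conv.weight.copy_(conv.weight.sum(dim=1, keepdim=True))
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv


# Stand-in with the shape of ResNet's first layer (random weights, not pretrained).
conv_rgb = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
conv_gray = to_single_channel(conv_rgb)

x = torch.randn(2, 1, 64, 64)                # batch of single-channel images
out_rgb = conv_rgb(x.expand(-1, 3, -1, -1))  # same image copied to 3 channels
out_gray = conv_gray(x)
print(torch.allclose(out_rgb, out_gray, atol=1e-4))
```

In torchvision's ResNet the layer to swap is `conv1`. The data pipeline would also have to yield one-channel tensors, which is not shown here. Note that the saving is limited to the input tensor and the first-layer weights; every activation after the first convolution keeps its original shape, so the overall memory reduction is likely to be small.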

To me it seems that you are getting worse results because of floating-point precision. Assuming you shared the complete code here, you trained on the 512px images using fp32 but dropped to fp16 for 1536px. That might be why the accuracy is dropping. However, if you want to use fp32, you might need to lower the batch size.
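To make the precision concern concrete, the standard-library sketch below rounds values to IEEE 754 half precision and shows how coarse fp16 is near 1.0. It only illustrates raw fp16 rounding; `to_fp16` here is a hypothetical helper unrelated to fastai's `Learner.to_fp16`, which, as far as I understand, uses mixed precision with an fp32 master copy of the weights precisely to limit this effect.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


# Near 1.0, adjacent fp16 values are 2**-10 (about 0.00098) apart.
spacing = 2.0 ** -10

small_step = to_fp16(1.0 + 3e-4)  # less than half the spacing away from 1.0
large_step = to_fp16(1.0 + 6e-4)  # more than half the spacing away from 1.0

print(spacing, small_step, large_step)
```

An increment of 3e-4 on a value near 1.0 is lost entirely in pure fp16, while 6e-4 is rounded up to the next representable value.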


Then perhaps you should be treating the task as pixel-level class classification. Ie segmentation not binary classification. Though 99% is a stonking accuracy, and if it goes up when you reduce the size, it suggests not.

Are your classes balanced?

How are you building your validation set?

What happens to accuracy when you run k-folds cross validation?

What does the confusion matrix say? False negatives? Frequently in med tasks, this matters as much as accuracy.

Why do you want to use r152? What do other classifiers like r34, densenet 201 etc achieve?

If accuracy of 99% isn’t success, what is?


Thanks, I’ll check this. However, switching to fp16 is something new, which I wanted to test. I had the same behaviour with fp32().

Then perhaps you should be treating the task as pixel-level class classification. Ie segmentation not binary classification. Though 99% is a stonking accuracy, and if it goes up when you reduce the size, it suggests not.

This is what we would ideally like to do; however, we don’t have pixel-wise labels for the images. I have a collection of images with a single label each. Can one do pixel-level segmentation with only global labels?

Are your classes balanced?

Yes, by experimental design the classes are always balanced.

How are you building your validation set?

I’m using fastai’s built-in ImageDataBunch. I have a directory with all images and their corresponding labels (as a .csv file). Normally, the validation set is a 20% random sample of the entire dataset.

What happens to accuracy when you run k-folds cross validation?

Interesting, haven’t checked that. Will do.
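For reference, a k-fold split can be sketched with the standard library alone. `k_fold_indices` is a hypothetical helper; the sketch assumes plain random folds over row indices and does not stratify by class, which one would probably want to add.

```python
import random


def k_fold_indices(n_items, k=5, seed=42):
    """Yield (train_indices, valid_indices) for each of k random folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        valid = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, valid


# 20k images as in the thread; each fold holds out 20% for validation.
splits = list(k_fold_indices(20000, k=5))
for fold, (train, valid) in enumerate(splits):
    print(fold, len(train), len(valid))
```

Each pair of index lists selects the rows of the label table used for training and validation in that fold; the spread of the k validation accuracies indicates how much a single 20% split can be trusted.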

What does the confusion matrix say? False negatives? Frequently in med tasks, this matters as much as accuracy.

[Attached image: confusion matrix]

Please see attached. Since the classes are balanced, it’s fine to optimise accuracy.
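The point about false negatives can be illustrated with a few lines of plain Python. The labels and predictions below are made up for the example and are not taken from the thread; the helper names are hypothetical.

```python
def binary_confusion(y_true, y_pred):
    """Count (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Made-up balanced toy data: 4 positives, 4 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0]

tp, fp, fn, tn = binary_confusion(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
sensitivity = tp / (tp + fn)  # fraction of true positives that were found
specificity = tn / (tn + fp)  # fraction of true negatives that were found
print(accuracy, sensitivity, specificity)
```

Even with balanced classes, the same accuracy can hide very different error profiles: in this toy case all the errors are false negatives.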

Why do you want to use r152? What do other classifiers like r34, densenet 201 etc achieve?

I started with Resnet18, 34, etc. and saw accuracy increase with deeper architectures. Densenet201 gives the same result, however. I am planning to run more examples.

If accuracy of 99% isn’t success, what is?

99% will be great! I need a very accurate tool to estimate the morphological structure of cells (binary classes). It’s like a cat/dog detector: the higher the accuracy, the better.

So when you went from 512 to 1024, did you train the model from scratch or did you use the same model that was trained on 512?
