No convergence, no overfitting (Resnet152)

klein · July 7, 2019, 8:00pm

I want to train a Resnet152-based binary classifier using 20k images (10k in each category). My original images are 2048x2048, but at full size they do not fit into GPU memory (RTX 2080Ti), so I’m rescaling them.

As suggested by Jeremy, I first fine-tuned Resnet152 at a smaller resolution (512) and achieved a very high accuracy of 0.9803. However, when I moved to the largest size that fits in GPU memory (1536 pixels), I could achieve neither convergence nor overfitting, and I’m very puzzled by this behaviour. Any recommendations will be highly appreciated.

Here is the snippet of my code:

data = ImageDataBunch.from_df(path_to_images, df_images, ds_tfms=get_transforms(do_flip=True, flip_vert=True, max_rotate=None, max_zoom=1.0, max_warp=None), valid_pct=VALID_PCT, bs=16, size=512).normalize(imagenet_stats)

learn_512 = cnn_learner(data, models.resnet152, metrics=accuracy, bn_final=True, ps=0.5)
learn_512.model = torch.nn.DataParallel(learn_512.model)

learn_512.fit_one_cycle(5, max_lr=1e-2)

epoch	train_loss	valid_loss	accuracy	time
0	0.388392	0.292082	0.895884	07:37
1	0.354566	0.285786	0.944007	07:14
2	0.306108	0.237093	0.971852	07:26
3	0.255985	0.582608	0.971247	06:53
4	0.219943	0.210191	0.979722	07:06

learn_512.unfreeze()
learn_512.fit_one_cycle(1, slice(1e-6, 5e-3))

epoch	train_loss	valid_loss	accuracy	time
0	0.206563	0.474366	0.980327	08:53

learn_512.save('clf_512')

The same, but now with 1536 resolution:

data = ImageDataBunch.from_df(path_to_images, df_images, ds_tfms=get_transforms(do_flip=True, flip_vert=True, max_rotate=None, max_zoom=1.0, max_warp=None), valid_pct=VALID_PCT, bs=16, size=1536).normalize(imagenet_stats)

learn = cnn_learner(data, models.resnet152, metrics=accuracy, bn_final=True, ps=0.5).load('clf_512')
learn.to_fp16()
learn.model = torch.nn.DataParallel(learn.model)

learn.fit_one_cycle(5, max_lr=3e-4)

epoch	train_loss	valid_loss	accuracy	time
0	0.596490	0.486517	0.862591	43:44
1	0.604683	0.507021	0.858656	43:34
2	0.601637	0.570700	0.836562	43:33
3	0.613324	0.517586	0.862288	44:10
4	0.587535	0.480734	0.878632	44:07

learn.unfreeze()
learn.fit_one_cycle(10, slice(1e-8, 3e-5))

epoch	train_loss	valid_loss	accuracy	time
0	0.616416	0.470562	0.897397	53:42
1	0.582632	0.473497	0.886501	53:33
2	0.590366	0.495948	0.885593	53:26
3	0.581906	0.475689	0.868341	53:56
4	0.563544	0.494985	0.858353	53:35
5	0.585111	0.491389	0.887107	53:32
6	0.589545	0.460681	0.895278	53:36
7	0.591109	0.535952	0.859262	53:52
8	0.593550	0.485988	0.883475	53:23
9	0.570294	0.468092	0.877421	53:22

Can anyone explain these results? It seems like the model is underfitting, but removing dropout and augmentation does not help either. Thanks!

digitalspecialists · July 7, 2019, 8:07pm

Why do you want to use the maximum size? A 2048px pumpkin is no more a pumpkin than a 224px pumpkin. What happens if you reduce the image size to around 224px?

klein · July 7, 2019, 8:18pm

Ideally I do want to use the max resolution. These are medical images, and fine pixel-level detail (which pixels drive the class decision) is quite important.

To work around the high-resolution problem, I thought about slicing each 2048x2048 image into four 1024x1024 tiles, all four of which would inherit the label of the original image.
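A minimal sketch of that tiling idea, assuming single-channel images held as NumPy arrays whose sides are exact multiples of the tile size. The helper names `split_into_tiles` and `tile_dataset` are made up for illustration and are not fastai functions.

```python
import numpy as np

def split_into_tiles(image, tile_size):
    """Split a 2-D (H, W) image into non-overlapping square tiles, row by row."""
    height, width = image.shape
    if height % tile_size or width % tile_size:
        raise ValueError("image dimensions must be multiples of tile_size")
    tiles = []
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            tiles.append(image[top:top + tile_size, left:left + tile_size])
    return tiles

def tile_dataset(images, labels, tile_size):
    """Yield (tile, label) pairs; every tile inherits its parent image's label."""
    for image, label in zip(images, labels):
        for tile in split_into_tiles(image, tile_size):
            yield tile, label

# Stand-in for one 2048x2048 single-channel image with label 1.
image = np.zeros((2048, 2048), dtype=np.uint8)
pairs = list(tile_dataset([image], [1], tile_size=1024))
print(len(pairs), pairs[0][0].shape, pairs[0][1])
```

Two caveats with this approach: tiles cut from the same source image should stay on the same side of the train/validation split, otherwise the validation score can be inflated, and a tile that happens not to contain the distinguishing structure still receives the image-level label, so some label noise is likely.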

Apart from this, any ideas about the behaviour of the training procedure outlined above? Surprisingly, the training loss does not drop below 0.4.

If I reduce the image size to 224px, I get 0.99 accuracy.

Any thoughts on this behaviour, or suggestions on how to handle high-res images? Thanks!

Blanche · July 7, 2019, 8:56pm

Are you using progressive resizing here, i.e. training first on lower res, then on higher res, and so on?
Have you checked how the model trained on 224px images performs on higher-res ones?

klein · July 7, 2019, 9:01pm

Hi Maria,

I started with 256, 512 and 1024 resolution (fractions of the original 2048). 256 and 512 showed really good convergence and very high accuracy (0.99), but at 1024 I can’t get above 0.93. Jeremy mentioned that using high-res images should give better accuracy; however, I’m seeing the opposite.

Is there a big difference between the native ImageNet 224 res and 256? I’m currently training on 224 and will post my results here shortly. Thanks.

Blanche · July 7, 2019, 9:19pm

That’s really interesting.
I’d try progressive resizing with varying dropout: start with a dropout rate of 0.0 at low res and increase it as the images get bigger. I got pretty interesting results with that on a totally different problem, but it’s still something you could try, along with mixup.
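One way to write down such a schedule, as a sketch only. The linear dropout ramp, the 0.5 ceiling, and the rule that batch size shrinks in proportion to pixel count are all assumptions chosen for illustration; `progressive_schedule` is a hypothetical helper, not part of fastai.

```python
def progressive_schedule(sizes, max_dropout, base_size, base_batch_size):
    """Build one training stage per image size.

    Dropout rises linearly from 0.0 at the first size to max_dropout at the last.
    Batch size shrinks with pixel count so activation memory stays roughly flat.
    """
    stages = []
    last = len(sizes) - 1
    for i, size in enumerate(sizes):
        dropout = max_dropout * i / last if last else max_dropout
        pixel_ratio = (size / base_size) ** 2
        batch_size = max(1, int(base_batch_size / pixel_ratio))
        stages.append({"size": size, "dropout": round(dropout, 3), "batch_size": batch_size})
    return stages

# Example: three stages, anchored on batch size 16 at 512px.
for stage in progressive_schedule([256, 512, 1024], max_dropout=0.5,
                                  base_size=512, base_batch_size=16):
    print(stage)
```

Each stage would then be trained in turn, loading the weights saved by the previous one before rebuilding the data at the next size.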

What kind of images are you classifying?

klein · July 7, 2019, 9:45pm

Many thanks for your tips! Will implement them and let you know.

Also, a few technical questions:

My images have only one channel (not RGB), but the standard input is three channels. ImageDataBunch turns a single-channel image into a three-channel one (by repeating the original image three times), which obviously consumes more memory. Is there a simple way to feed single-channel images into Resnet?
Any ideas on how to work with high-resolution images, such as tiling or other methods built into the ItemList data loader?
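On the single-channel question, one commonly used trick is to sum the pretrained first-layer kernel over its three input channels, which gives a one-channel kernel that responds identically to a grey image repeated three times. The NumPy sketch below only demonstrates that equivalence on random stand-in weights; it assumes the PyTorch weight layout (out, in, kh, kw) and assumes all three copies of the image are normalized identically, which is not the case with per-channel `imagenet_stats`. `collapse_rgb_kernel` and `conv_response` are illustrative names.

```python
import numpy as np

def collapse_rgb_kernel(weight):
    """Sum a (out, 3, kh, kw) conv kernel over its input channels -> (out, 1, kh, kw)."""
    return weight.sum(axis=1, keepdims=True)

def conv_response(patch, weight):
    """Response of every output filter at one location; patch is (in, kh, kw)."""
    return np.tensordot(weight, patch, axes=([1, 2, 3], [0, 1, 2]))

rng = np.random.default_rng(0)
weight_rgb = rng.standard_normal((64, 3, 7, 7))  # random stand-in for pretrained first-layer weights
gray_patch = rng.standard_normal((1, 7, 7))      # one 7x7 single-channel patch

# Path A: replicate the grey patch to three channels and use the original kernel.
out_replicated = conv_response(np.repeat(gray_patch, 3, axis=0), weight_rgb)
# Path B: keep one channel and use the collapsed kernel.
out_single = conv_response(gray_patch, collapse_rgb_kernel(weight_rgb))

print(np.allclose(out_replicated, out_single))
```

In a real model the collapsed kernel would be copied into a replacement first convolution with one input channel. The memory saved is probably modest, since the input tensor is small compared with the activations of the later layers.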

Thanks!

P.S. These are biological samples, proteins, neurons and cells.

pooya_drv · July 8, 2019, 2:46am

To me it seems that your results are getting worse because of floating-point precision. Assuming you shared the complete code here, you trained on the 512px images in fp32 but dropped to fp16 for 1536px. That might be why the accuracy is dropping. However, if you want to use fp32, you might need to lower the batch size.
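To illustrate the kind of precision loss in question, here is a small NumPy sketch of two float16 effects: a small update being rounded away next to a weight of 1.0, and a very small value underflowing to zero. The numbers are chosen for illustration only. Mixed-precision training setups typically keep an fp32 master copy of the weights and scale the loss precisely to avoid these effects, so this shows what pure fp16 arithmetic does, not necessarily what happens inside `to_fp16()`.

```python
import numpy as np

# Spacing between adjacent float16 values just above 1.0 (about 0.001).
fp16_eps = np.finfo(np.float16).eps

# A 1e-4 update is smaller than half that spacing, so float16 rounds it away.
update_lost_fp16 = bool(np.float16(1.0) + np.float16(1e-4) == np.float16(1.0))

# The same update is kept in float32.
update_lost_fp32 = bool(np.float32(1.0) + np.float32(1e-4) == np.float32(1.0))

# Values far below float16's smallest subnormal (about 6e-8) become exactly zero.
underflows_to_zero = bool(np.float16(1e-8) == 0)

print(fp16_eps, update_lost_fp16, update_lost_fp32, underflows_to_zero)
```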

digitalspecialists · July 8, 2019, 5:04am

Then perhaps you should be treating the task as pixel-level classification, i.e. segmentation rather than binary classification. Though 99% is a stonking accuracy, and if it goes up when you reduce the size, it suggests not.

Are your classes balanced?

How are you building your validation set?

What happens to accuracy when you run k-folds cross validation?

What does the confusion matrix say? False negatives? Frequently in med tasks, this matters as much as accuracy.

Why do you want to use r152? What do other classifiers like r34, densenet 201 etc achieve?

If accuracy of 99% isn’t success, what is?
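The k-fold check asked about above could be sketched as follows, using only the standard library. `stratified_kfold` is a hypothetical helper that deals each class's indices round-robin into k folds so that class balance is preserved; the toy labels stand in for the real dataset.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k, seed=0):
    """Return k (train_idx, valid_idx) pairs with class balance preserved per fold."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin so every fold gets an equal share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    splits = []
    for i in range(k):
        valid_idx = sorted(folds[i])
        train_idx = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        splits.append((train_idx, valid_idx))
    return splits

labels = [0] * 10 + [1] * 10  # toy stand-in for a balanced two-class dataset
for train_idx, valid_idx in stratified_kfold(labels, k=5):
    print(len(train_idx), len(valid_idx), sum(labels[i] for i in valid_idx))
```

Training one model per split and comparing the k validation accuracies gives a sense of how much a single 0.98 or 0.99 figure depends on which images landed in the validation set.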

klein · July 9, 2019, 2:24pm

Thanks, I’ll check this. However, switching to fp16 is something new that I wanted to test; I saw the same behaviour with fp32.

klein · July 9, 2019, 2:34pm

Then perhaps you should be treating the task as pixel-level classification, i.e. segmentation rather than binary classification. Though 99% is a stonking accuracy, and if it goes up when you reduce the size, it suggests not.

This is what we would ideally like to do; however, we don’t have pixel-wise information (pixel-level labels for each image). I have a collection of images with a single label for each. Can one do pixel-level segmentation with global labels?

Are your classes balanced?

Yes, by experimental design the classes are always balanced.

How are you building your validation set?

I’m using fastai’s built-in ImageDataBunch. I have a directory with all the images and a .csv file with their corresponding labels. Normally, the validation set is a 20% random sample of the entire dataset.
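One thing that may be worth checking with a random 20% split: both `ImageDataBunch.from_df` calls in the first post pass `valid_pct` without a fixed seed. If the split is redrawn on each call (an assumption to verify against the installed fastai version), the 1536 run may be validated on images the 512 model was trained on. A library-independent sketch of a reproducible split, with `split_indices` as a made-up helper:

```python
import random

def split_indices(n_items, valid_pct, seed):
    """Shuffle indices with a fixed seed and hold out the first valid_pct as validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_valid = int(n_items * valid_pct)
    return sorted(indices[n_valid:]), sorted(indices[:n_valid])

# Same seed -> identical split, so runs at different resolutions share one validation set.
train_a, valid_a = split_indices(20000, valid_pct=0.2, seed=42)
train_b, valid_b = split_indices(20000, valid_pct=0.2, seed=42)
print(valid_a == valid_b, len(train_a), len(valid_a))
```

The resulting validation indices could be written back to the labels dataframe as a boolean column so that every run, at every resolution, reads the same split.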

What happens to accuracy when you run k-folds cross validation?

Interesting, haven’t checked that. Will do.

What does the confusion matrix say? False negatives? Frequently in med tasks, this matters as much as accuracy.

[Image: confusion matrix, attachment “Capture”]

Please see attached. Since the classes are balanced, it’s fine to optimise for accuracy.
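For reference, the quantities usually read off a binary confusion matrix can be computed as in this sketch. The predictions are made up to show how a reasonable accuracy can coexist with many false negatives; they are not results from this thread, and `confusion_counts` and `summarize` are illustrative helpers.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            counts["tp" if pred == positive else "fn"] += 1
        else:
            counts["fp" if pred == positive else "tn"] += 1
    return counts

def summarize(counts):
    """Accuracy plus the per-class rates that accuracy alone can hide."""
    total = sum(counts.values())
    return {
        "accuracy": (counts["tp"] + counts["tn"]) / total,
        "sensitivity": counts["tp"] / (counts["tp"] + counts["fn"]),  # 1 - false-negative rate
        "specificity": counts["tn"] / (counts["tn"] + counts["fp"]),  # 1 - false-positive rate
    }

# Made-up predictions on a balanced toy set.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
print(counts, summarize(counts))
```

Here accuracy is 0.75 even though half of the positives are missed, which is why sensitivity and specificity are worth reporting alongside accuracy even when the classes are balanced.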

Why do you want to use r152? What do other classifiers like r34, densenet 201 etc achieve?

I started with Resnet18, 34, etc. and saw accuracy increase with deeper architectures. Densenet201 gives the same result, however. I am planning to run more examples.

If accuracy of 99% isn’t success, what is?

99% will be great! I need a very accurate tool to estimate the morphological structure of cells (binary classes). It’s like a cat/dog detector: the higher the accuracy, the better.

pooya_drv · July 10, 2019, 8:42am

So when you went from 512 to 1024, did you train the model from scratch or did you use the same model that was trained on 512?