In Lesson 1 we are using resnet34, but I noticed that there were a few more options available.
I tried the complete set of steps with resnet18 and found that the results from resnet34 and resnet18 were almost identical. I'm just curious to know why we are using resnet34. I haven't tried out the other options yet, but before doing that I want to understand the rationale behind showing the demo with resnet34.
I wouldn’t read too much into the arch choice. In general it is a good idea to try out a couple and see how they perform.
For me, resnet34 would be one of the first architectures, if not the first, that I would try out - it seems to offer very good performance versus its size (which impacts training time) and allows for a bigger batch size.
As a rule of thumb, the more complex the problem, the bigger an arch you might need. All of the archs from your screenshot have been pretrained on ImageNet, and in general telling a cat from a dog is probably not the hardest of tasks for a CNN, hence going for something relatively small makes a lot of sense.
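If it helps, here is a quick way to see that size difference concretely - a minimal sketch assuming torchvision is installed (the model names are torchvision's; nothing here comes from the lesson notebook):

```python
import torchvision.models as models

# Compare how many trainable parameters each ResNet variant carries.
# More parameters generally means more GPU memory per batch and longer training time.
variants = {
    "resnet18": models.resnet18,
    "resnet34": models.resnet34,
    "resnet50": models.resnet50,
}

for name, ctor in variants.items():
    model = ctor()  # no pretrained weights needed just to count parameters
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.1f}M parameters")
```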
Thanks for your response. I'm still not completely convinced. I don't think batch size or model size is really a constraint here - I was able to push the batch size beyond 64K with both resnet18 and resnet34. Here is the sample data for resnet34, which I gathered to benchmark my system's performance (a rough sketch of the timing loop I mean follows the table).
| Batch Size | trn_loss | val_loss | Accuracy | Wall Time (seconds) |
|---|---|---|---|---|
| 64 | 0.031134 | 0.028481 | 0.989 | 15.7 |
| 128 | 0.028619 | 0.029348 | 0.989 | 14.1 |
| 256 | 0.032689 | 0.022995 | 0.991 | 13.2 |
| 512 | 0.038162 | 0.025427 | 0.9895 | 12.7 |
| 1024 | 0.055639 | 0.02597 | 0.988 | 12.2 |
| 2048 | 0.08693 | 0.034631 | 0.987 | 11.4 |
| 4096 | 0.165338 | 0.048062 | 0.983 | 11.5 |
| 8192 | 0.303578 | 0.060767 | 0.9795 | 10.1 |
| 16384 | 0.346356 | 0.091748 | 0.98 | 6.15 |
| 32768 | 0.651255 | 0.262653 | 0.927 | 4.66 |
| 65536 | 0.676977 | 0.250999 | 0.9475 | 4.74 |
| 131072 | 0.56841 | 0.24005 | 0.9415 | 4.73 |
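For context, this is roughly the kind of timing loop I mean - a minimal sketch assuming a plain PyTorch forward/backward pass on random tensors (not the actual script or data used for the numbers above):

```python
import time
import torch
import torchvision.models as models

device = "cuda" if torch.cuda.is_available() else "cpu"

def time_one_pass(batch_size, n_batches=10, image_size=224):
    """Time a few forward/backward passes of resnet34 at a given batch size."""
    model = models.resnet34().to(device)
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    start = time.time()
    for _ in range(n_batches):
        # Random tensors stand in for real images/labels; a real benchmark
        # would iterate over the actual DataLoader instead.
        x = torch.randn(batch_size, 3, image_size, image_size, device=device)
        y = torch.randint(0, 2, (batch_size,), device=device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    if device == "cuda":
        torch.cuda.synchronize()  # make sure all GPU work is finished before stopping the clock
    return time.time() - start

# Extend this list as far as your GPU memory allows.
for bs in (64, 128, 256):
    print(bs, f"{time_one_pass(bs):.1f}s")
```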
Any inputs would certainly help. What do the 34/18/101/xyz numbers signify?
The 34 is how many layers are in the network; resnet34 is shown here: https://i.imgur.com/nyYh5xH.jpg. Larger networks can model more complex problems, but at the risk of overfitting - you would need more regularization for larger networks. The reason resnet34 is used is that its performance-to-accuracy trade-off is fine for this problem. Larger networks would take longer to train and use more memory than smaller networks.
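If you want to see where those numbers come from, here is a small sketch using torchvision's models (the `count_weight_layers` helper is just for illustration). Counting the convolutional and fully connected layers, and skipping the 1x1 shortcut ("downsample") convolutions which aren't included in the name, gives exactly 18, 34 and 50:

```python
import torch.nn as nn
import torchvision.models as models

def count_weight_layers(model):
    """Count conv and fully connected layers, skipping the 1x1 convs in the
    residual shortcut ('downsample') paths, which the name doesn't count."""
    n = 0
    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)) and "downsample" not in name:
            n += 1
    return n

for ctor in (models.resnet18, models.resnet34, models.resnet50):
    print(ctor.__name__, count_weight_layers(ctor()))
```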
There’s no point using those enormous batch sizes. Stick with 128 to keep it simple. You should be running through all the steps - only do at most 2 epochs with precompute, then a few epochs unfreezing the different layer groups, like in the lesson 1 notebook. And try TTA too. See what the best accuracy you can get is with each architecture.
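Roughly, that sequence from the lesson 1 notebook looks like this - a sketch from memory against the fastai 0.7 API, so treat the exact arguments as approximate; `PATH`, `sz` and `bs` are the notebook's values:

```python
from fastai.conv_learner import *

PATH = "data/dogscats/"  # lesson 1 data layout
sz, bs = 224, 128        # image size and a sensible batch size

arch = resnet34  # swap in resnet18 / resnet50 etc. to compare architectures
tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
data = ImageClassifierData.from_paths(PATH, tfms=tfms, bs=bs)

learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(1e-2, 2)                  # at most 2 epochs with precomputed activations

learn.precompute = False
learn.fit(1e-2, 3, cycle_len=1)     # train the head with data augmentation

learn.unfreeze()
lrs = np.array([1e-4, 1e-3, 1e-2])  # discriminative learning rates per layer group
learn.fit(lrs, 3, cycle_len=1, cycle_mult=2)

log_preds, y = learn.TTA()          # test-time augmentation on the validation set
probs = np.mean(np.exp(log_preds), 0)
print(accuracy_np(probs, y))
```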
Thanks @jeremy, I will take your advice and continue. I will post my results and observations with TTA on the different resnets shortly.
I built my own DL setup and hence was testing how well my system copes with different batch sizes. In some threads on this forum people were discussing using batch size to gauge the capacity or performance of their systems, so I was experimenting with batch sizes to find the maximum load limit of my system.