Paper on ResNet34, VGG and various other architectures

I was wondering why Jeremy picked resnet34… this paper shows resnet34 has better accuracy and uses less computation… it covers ~10 other architectures too including inception


Isn’t DenseNet the current state of the art?

This paper is ~2 years old

The Electronic Frontier Foundation maintains results on various benchmarks

resnet occurs on leaderboards 11 times
I didn’t see densenet in there but I only did a cursory search

In some areas - particularly segmentation - densenet can work great. Not often seen in SoTA work in classification however.


@jeremy I would like to know your views on this paper. Looks like they managed to get SoTA for CIFAR. (A sparse version of densenet)

It’s a good result, but their SoTA claim is wrong


That’s interesting! One of my colleagues also mentioned the results.

How can I tell whether a paper is legit, given that I don’t have the resources to replicate the results? It is already hard to keep up with recent papers, and it feels a bit disappointing when their claims turn out to be wrong.

Thanks for mentioning the ShakeDrop paper. I wasn’t familiar with these kinds of approaches. It seems to work really well for wider networks, especially Wide ResNet. Also, this paper introduced me to a lot of new concepts like EraseReLU. I will get back to this after going through the earlier regularisation techniques (Shake-Shake & Stochastic Depth).

Last time I checked, the Squeeze-and-Excitation network (SE-ResNet) was the “State of the Art”, or maybe NASNet. Bottom line: it doesn’t matter at this point because ResNet works perfectly fine for most applications and it’s reasonably fast.


What about in terms of training time and number of parameters? Do you know of architectures that give reasonable benchmark results while keeping both of these low?

SENet and ResNet are very similar. SENet has a few more params since every block adds a small squeeze-and-excitation branch that rescales the channels, but the two should be comparable in terms of training time. NASNet is off the charts in number of params. DenseNet has fewer params than ResNet but is deeper and therefore takes a bit longer to train. If you have resource constraints and want something small and fast, there are MobileNet and SqueezeNet: both have fewer params and train fast, although they have (in general) worse accuracy. There are so many architectures now, it’s crazy. Then again, on average I would say ResNet is the best choice considering accuracy, simplicity, number of params and speed of training.
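To put a rough number on the "a bit more params" point for SENet, here is a back-of-envelope sketch in plain Python. It assumes a ResNet-style basic block (two 3x3 convolutions, batch norm ignored) and an SE branch made of two fully connected layers with biases and a reduction ratio of 16. The function names are made up for illustration, and real implementations may differ in these details, so treat the output as an estimate rather than exact model sizes.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a k x k convolution without bias."""
    return c_in * c_out * k * k


def basic_block_params(channels):
    """Two 3x3 convolutions, as in a ResNet-style basic block (batch norm ignored)."""
    return 2 * conv_params(channels, channels, 3)


def se_branch_params(channels, reduction=16):
    """Assumed SE branch: FC channels -> channels // reduction -> channels, with biases."""
    hidden = channels // reduction
    squeeze = channels * hidden + hidden      # first FC layer (weights + biases)
    excite = hidden * channels + channels     # second FC layer (weights + biases)
    return squeeze + excite


# Typical channel widths of the four stages in a ResNet34-like network.
for c in (64, 128, 256, 512):
    base = basic_block_params(c)
    extra = se_branch_params(c)
    print(f"{c:4d} channels: block={base:>9,d}  SE extra={extra:>7,d}  "
          f"(+{100 * extra / base:.2f}%)")
```

Under these assumptions the SE branch adds only a small fraction on top of each basic block. The overhead is larger for bottleneck blocks, where the branch acts on the wider block output.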