Hi, I’m Constantin. I recently studied the fastai book, which I enjoyed very much and which made me want to go deeper into DL and CV.
I’ve been searching for a (chronologically ordered) shortlist of cornerstone papers on the topic of “semantic segmentation” (and similarly for “image classification”, etc.). For example, I’d like to see the 10, then 30, then 100 most important papers, in chronological order, that shaped the field in some meaningful way. Using the number of citations in other papers (or something similar) as the measure of “most important” should be fine for this purpose.
Even more specifically, I’d like to read about the evolution of segmentation architectures (UNet, UNet++, DeepLabV1/2/3/3+) and the associated encoders (VGG, then ResNet, then ResNeXt, and so on) in their context at the time of publication.
I wasn’t able to find anything like this. arXiv and Google Scholar offer some search capability, but it seems quite limited, and there’s still significant filtering and classification left to do manually. My hope is that I’ve been missing a tool or feature somewhere that will just answer what I’m looking for.
I’d appreciate any suggestions, thank you.
Since no one responded to you in a month I’ll give you a ChatGPT response to your exact post:
Hello Constantin,
It’s great to hear that you enjoyed the fastai book and are interested in delving deeper into deep learning and computer vision, specifically in the area of semantic segmentation. As of my last update in September 2021, I can provide you with a shortlist of cornerstone papers for semantic segmentation, image classification, and some notable architectures that you can explore chronologically. Please keep in mind that the number of citations may have changed since then, so it’s best to verify the current citation count when researching these papers.
Semantic Segmentation:
Fully Convolutional Networks for Semantic Segmentation (FCN) - Jonathan Long, Evan Shelhamer, Trevor Darrell (CVPR 2015) - This paper introduced the concept of using fully convolutional networks for dense pixel-wise predictions, making significant strides in semantic segmentation.
U-Net: Convolutional Networks for Biomedical Image Segmentation - Olaf Ronneberger, Philipp Fischer, Thomas Brox (MICCAI 2015) - The U-Net architecture became popular for its success in biomedical image segmentation tasks.
Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs (DeepLabV1) - Liang-Chieh Chen et al. (ICLR 2015) - This work applied atrous (dilated) convolutions to dense prediction and added a fully connected Conditional Random Field (CRF) as post-processing to sharpen object boundaries.
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs (DeepLabV2) - Liang-Chieh Chen et al. (TPAMI 2018) - An extension of DeepLabV1 that added atrous spatial pyramid pooling (ASPP) to capture objects at multiple scales and moved to a ResNet backbone.
DeepLabV3: Rethinking Atrous Convolution for Semantic Image Segmentation - Liang-Chieh Chen et al. (arXiv 2017) - A revision of DeepLabV2 that improved the ASPP module (adding image-level features and batch normalization) and dropped the CRF post-processing step.
DeepLabV3+: Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation - Liang-Chieh Chen et al. (ECCV 2018) - This work enhanced DeepLabV3 by adopting an encoder-decoder architecture with atrous separable convolutions.
UNet++: A Nested U-Net Architecture for Medical Image Segmentation - Zongwei Zhou et al. (DLMIA 2018) - UNet++ extends the original U-Net with nested, dense skip pathways between encoder and decoder, plus deep supervision, and reports better results on medical image segmentation.
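Since atrous convolution runs through all of the DeepLab entries above, here is a minimal 1-D sketch of the idea in plain Python. It is an illustration only, not code from any of the papers; the function names `atrous_conv1d` and `receptive_field` and the toy signal are assumptions made for this example. The point it shows is that a larger rate widens what each output "sees" without adding weights.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """'Valid' 1-D cross-correlation where kernel taps are spaced `rate` apart.

    rate=1 is an ordinary convolution; rate>1 skips (rate - 1) inputs
    between taps, which is the atrous/dilated variant.
    """
    span = (len(kernel) - 1) * rate + 1  # inputs covered by one output value
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + i * rate] for i, w in enumerate(kernel)))
    return out


def receptive_field(kernel_size, rate):
    """Number of input positions spanned by a single dilated kernel."""
    return (kernel_size - 1) * rate + 1


signal = [1, 2, 3, 4, 5, 6, 7]   # toy input (hypothetical)
kernel = [1, 0, -1]              # same 3 weights in both cases

dense = atrous_conv1d(signal, kernel, rate=1)    # each output spans 3 inputs
dilated = atrous_conv1d(signal, kernel, rate=2)  # each output spans 5 inputs
print(dense, dilated, receptive_field(3, 2))
```

Both calls use the same three weights; only the spacing between taps changes. ASPP, roughly speaking, runs several such convolutions with different rates in parallel and combines their outputs.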
Image Classification:
ImageNet Classification with Deep Convolutional Neural Networks (AlexNet) - Alex Krizhevsky et al. (NIPS 2012) - This paper is often considered the beginning of the deep learning revolution for image classification, winning the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012.
Very Deep Convolutional Networks for Large-Scale Image Recognition (VGG) - Karen Simonyan, Andrew Zisserman (ICLR 2015) - VGG introduced the idea of using very deep networks and is known for its simple and uniform architecture.
Deep Residual Learning for Image Recognition (ResNet) - Kaiming He et al. (CVPR 2016) - ResNet introduced residual blocks with identity shortcut connections, addressing the degradation problem that makes very deep networks hard to optimize.
Aggregated Residual Transformations for Deep Neural Networks (ResNeXt) - Saining Xie et al. (CVPR 2017) - ResNeXt introduced cardinality (the number of parallel transformation paths inside a block) as a design dimension in addition to depth and width.
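To make "cardinality" a little more concrete, the sketch below counts weights in a ResNet-style bottleneck block and in a ResNeXt-style block that replaces the middle 3x3 convolution with a grouped one. The block widths (256-d input, width 64 for the plain block, 32 groups of width 4 for the grouped block) follow the comparison in the ResNeXt paper; the helper names `conv_params` and `bottleneck_params` are hypothetical, and biases and batch-norm parameters are ignored.

```python
def conv_params(c_in, c_out, kernel_size, groups=1):
    """Weight count of a 2-D convolution (no bias).

    With `groups` > 1, each group connects c_in/groups inputs to
    c_out/groups outputs, so the count shrinks by a factor of `groups`.
    """
    assert c_in % groups == 0 and c_out % groups == 0
    return kernel_size * kernel_size * (c_in // groups) * (c_out // groups) * groups


def bottleneck_params(c_io, width, groups=1):
    """1x1 reduce -> 3x3 (optionally grouped) -> 1x1 expand."""
    return (
        conv_params(c_io, width, 1)
        + conv_params(width, width, 3, groups=groups)
        + conv_params(width, c_io, 1)
    )


resnet_block = bottleneck_params(256, 64)               # cardinality 1
resnext_block = bottleneck_params(256, 128, groups=32)  # 32 paths of width 4
print(resnet_block, resnext_block)
```

The two blocks come out at roughly the same size (about 70k weights each), which is the sense in which cardinality is a separate knob: the ResNeXt block spends a similar budget on many narrow parallel paths instead of one wider path.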
Note: For image classification, the field has evolved rapidly, and numerous other architectures have been proposed since 2017. It’s worth exploring EfficientNet, DenseNet, and Transformer-based models as well.
To find these papers and their citation counts, you can use Google Scholar or academic databases like IEEE Xplore, ACM Digital Library, or arXiv. You can also use tools like Semantic Scholar, which provide a more research-focused search capability for academic papers.
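Once citation counts have been collected from one of those sources, the "top N by citations, then read chronologically" ordering asked about in the original question is a small two-step sort. The sketch below assumes the records have already been gathered by hand or exported from a search tool; the `Paper` class, the `cornerstone_shortlist` function, and all titles and citation counts in it are made-up placeholders, not real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    title: str
    year: int
    citations: int


def cornerstone_shortlist(papers, n):
    """Keep the n most-cited papers, then order them by year."""
    top = sorted(papers, key=lambda p: p.citations, reverse=True)[:n]
    return sorted(top, key=lambda p: (p.year, p.title))


# Placeholder records: titles and counts are invented for illustration.
papers = [
    Paper("Paper A", 2012, 50),
    Paper("Paper B", 2015, 900),
    Paper("Paper C", 2017, 300),
    Paper("Paper D", 2016, 10),
    Paper("Paper E", 2018, 400),
]

shortlist = cornerstone_shortlist(papers, 3)
titles = [p.title for p in shortlist]
print(titles)
```

Calling the function with n = 10, 30, and 100 on the same records gives the nested reading lists described in the question. Raw citation counts favor older papers, so a per-year normalization may be worth trying as a variation.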
Remember that this list is not exhaustive, and there might be other relevant papers and architectures to explore. As you delve deeper, you may find more recent developments and advances that are not covered in this list. Enjoy your journey into the fascinating world of deep learning and computer vision!