Have we learnt everything we need to get into the top 1% on Kaggle for computer vision?

In Deep Learning Part 1 we have learnt CNNs, which have proved to be extremely reliable and flexible for most computer vision problems. We have also learnt quite a few techniques and tricks for building and training better and faster models (different learning rate optimization techniques, data augmentation, ensembling, bolz, pseudo-labels, etc.). Are these mostly sufficient to get into the top 1% in most Kaggle competitions on computer vision? If not, what else is missing? It would be great if we could list some of the other neural network architectures and techniques that will not be covered in this course, along with links to the corresponding resources. Once we have a good list, I will go ahead and create a wiki page for these techniques. Hopefully we will keep the wiki page updated with emerging techniques :slight_smile:

Did you see this one?

Linked from the Lesson 4 wiki page.

That link from @chris is the best resource I’m aware of for getting great computer vision results. In short, the answer is ‘yes’: you can get top 1% on at least some CV competitions. The main missing thing that we’ll be covering in this course is the ResNet architecture. We haven’t covered it yet since it’s not clear whether it is very effective for transfer learning.
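
For anyone who hasn't met ResNet yet, the core idea is a skip connection: a block outputs its input plus a learned correction, so with zero weights it simply passes the input through. Below is a toy sketch of that idea only. It uses small dense layers in plain Python rather than the convolutions and batch normalization a real ResNet uses, and the names `linear`, `relu` and `residual_block` are made up for illustration.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def linear(v, weights, bias):
    """Dense layer: one output per row of `weights`, plus the matching bias."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def residual_block(v, w1, b1, w2, b2):
    """Toy residual block: relu(v + F(v)), where F is two dense layers.

    The output size of the second layer must equal len(v) so the
    skip connection can be added element-wise.
    """
    hidden = relu(linear(v, w1, b1))
    correction = linear(hidden, w2, b2)
    return relu([x + c for x, c in zip(v, correction)])


# With all-zero weights the learned correction is zero,
# so the block passes its (non-negative) input straight through.
zeros_w = [[0.0, 0.0], [0.0, 0.0]]
zeros_b = [0.0, 0.0]
identity_out = residual_block([1.0, 2.0], zeros_w, zeros_b, zeros_w, zeros_b)
```

The pass-through behaviour at zero weights is the usual explanation for why stacking many such blocks is easier to train than stacking plain layers.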

In next year’s course we’ll cover what is needed to go beyond plain classification, for example:

  • Captioning (Combined CNN/RNN encoder/decoder models)
  • Localization (e.g. U-net architecture)
  • Clustering (e.g. siamese and triplet architectures)
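
To give a rough feel for the triplet idea in the last bullet: a triplet loss pulls an anchor embedding towards a "positive" example of the same class and pushes it away from a "negative" example, up to a margin. This is a minimal sketch in plain Python, assuming Euclidean distance and a margin of 1.0. Those choices and the function names are illustrative, not something fixed by the course.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` further
    from the anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


anchor = [0.0, 0.0]
near = [0.0, 1.0]   # distance 1 from the anchor
far = [3.0, 4.0]    # distance 5 from the anchor

# Positive is near, negative is far: the constraint is already satisfied.
loss_easy = triplet_loss(anchor, near, far)

# Roles swapped: the "positive" is far and the "negative" is near.
loss_hard = triplet_loss(anchor, far, near)
```

In a real model the three vectors would come from the same network applied to three images, and the loss would be averaged over many triplets.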

Next year we’ll also look at attentional models - plus of course all the stuff that is going to be invented between now and then!