Part 2 lesson 11 wiki


(Arvind Nagaraj) #135

This is fast approximate nearest neighbors… The person who built nearest-neighbor search at Spotify (and invented a method called Annoy) benchmarked all of these libraries - nmslib came out as by far the fastest way to find nearest neighbors in a high-dimensional vector space.
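
In case it helps anyone, here's a minimal sketch of building and querying an nmslib index - the HNSW parameters and the random toy vectors are just illustrative, not what the lesson used:

```python
import numpy as np
import nmslib

# Toy stand-ins for the embedding vectors from the lesson:
# 10,000 vectors in 300-d space.
vecs = np.random.randn(10000, 300).astype(np.float32)

# Build an HNSW index over cosine similarity.
index = nmslib.init(method='hnsw', space='cosinesimil')
index.addDataPointBatch(vecs)
index.createIndex({'post': 2})

# Approximate 10 nearest neighbours of a query vector.
ids, dists = index.knnQuery(vecs[0], k=10)
```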

The other way to do this is k-means clustering (or, even better, vector quantization) - but for this example, fast approximate KNN is plenty fast.
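
And a minimal sketch of the clustering alternative, using scikit-learn's KMeans (the cluster count and toy data are arbitrary, just for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

vecs = np.random.randn(10000, 300).astype(np.float32)  # toy vectors

# Partition the vectors into 100 clusters; at query time you'd only
# search within the cluster(s) whose centroid is nearest the query.
km = KMeans(n_clusters=100, n_init=10).fit(vecs)
labels = km.labels_               # cluster assignment per vector
centroids = km.cluster_centers_   # one 300-d centroid per cluster
```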


(unknown) #136

Is this the source paper? : http://papers.nips.cc/paper/5204-devise-a-deep-visual-semantic-embedding-model.pdf


(Divyansh Jha) #137

This DeViSE thing was magical!


(Phani Srikanth) #138

Personally, I thoroughly enjoyed the last 20 minutes. The ease and elegance with which the concept was explained, and the simplicity of implementing it with what we already know from the fastai library, were quite phenomenal.


(Even Oldridge) #139

On the approximate nearest neighbors front, a friend has a great blog post outlining the tradeoffs of precision vs performance that’s definitely worth checking out.

http://www.benfrederickson.com/approximate-nearest-neighbours-for-recommender-systems/

One thing that’s worth noting is that if you’re building a production system, Faiss on GPU is an order of magnitude faster than nmslib. nmslib runs at 200,000 QPS (queries per second) in batch mode on the CPU (Core i7-7820X), and the GPU version of Faiss runs at 1,500,000 QPS on a 1080 Ti.

Faiss used to be a pain to set up, but I believe they recently added pip support. I’m not sure if that includes the GPU build, but it’s worth considering if you need to do this at scale.
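
For reference, here's a minimal Faiss sketch with toy data (the GPU lines assume a GPU build of faiss is installed; drop them to stay on CPU):

```python
import numpy as np
import faiss

d = 300
xb = np.random.randn(100000, d).astype(np.float32)  # database vectors
xq = np.random.randn(5, d).astype(np.float32)       # query vectors

index = faiss.IndexFlatL2(d)   # exact (brute-force) L2 index
index.add(xb)

# Requires a GPU build of faiss:
res = faiss.StandardGpuResources()
gpu_index = faiss.index_cpu_to_gpu(res, 0, index)

D, I = gpu_index.search(xq, 10)  # distances and neighbour ids
```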


(Jeremy Howard) #140

The validation set doesn’t need backprop, so it uses half the memory, so we can use a bigger batch size to run it faster.
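
In PyTorch terms, a minimal sketch of what this looks like (the model and sizes are just stand-ins):

```python
import torch
import torch.nn as nn

model = nn.Linear(100, 10)     # stand-in for any trained model
val_x = torch.randn(512, 100)  # validation batch, e.g. 2x the
                               # training batch size

model.eval()
with torch.no_grad():          # no graph is kept for backprop,
    preds = model(val_x)       # so roughly half the memory is used
```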


(Jeremy Howard) #141

Nope - seems reasonable :slight_smile:


(Jeremy Howard) #142

Without a 2nd linear layer it’s just a linear model, not a neural net!
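
A minimal PyTorch illustration of the difference (sizes are arbitrary):

```python
import torch.nn as nn

# Without a nonlinearity and second layer, this is just a linear map:
linear_model = nn.Linear(300, 10)

# The nonlinearity between the two linear layers is what makes
# it a neural net:
neural_net = nn.Sequential(
    nn.Linear(300, 100),
    nn.ReLU(),
    nn.Linear(100, 10),
)
```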


(Jeremy Howard) #143

Yes I expect so - I don’t know if it would help; it’s basically the same idea as CycleGAN. I’m not really fully up to date on the translation literature, so I don’t know if this has been done before…


(Vineet) #144

Does anyone know the original attention paper that Jeremy discussed? I think I missed that… he mentioned two papers that were useful to read?


(Jeremy Howard) #145

There is an “attentive language model” paper that claims good results. I haven’t tried it.


(Jeremy Howard) #146

https://arxiv.org/abs/1409.0473


(Jeremy Howard) #147

An hour or so


(Jeremy Howard) #148

Yup. Rather foolishly I think they’re both required params, which isn’t ideal!


(Jeremy Howard) #149

Can I ask a favor - the top wiki post hasn’t been edited yet, but there’s lots of good links and resources in the replies here. If anyone has a moment would you be so kind as to add them to the top post?

I need to go to bed! :slight_smile:


(Jeremy Howard) #150

Oops turns out I forgot to make the top post into an actual wiki! Fixed now…


(Arvind Nagaraj) #151

Thanks for a great class, Jeremy!
Fish-in-nets: I’ve been waiting forever to hear you speak about DeViSE again!


(Mike Kunz ) #152

Why not cluster the data set? That’s the point.


(chunduri) #153

I think the limit set for the loop is the longest possible sentence length, so it will never truncate before EOS.
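
Something like this minimal sketch - the token ids and step function are made up purely for illustration, not taken from the lesson code:

```python
MAX_LEN = 50   # assumed cap: the longest sentence length possible
EOS = 2        # hypothetical end-of-sentence token id

def decode(step_fn, state, tok):
    """Decode until EOS, with MAX_LEN as a safety bound."""
    out = []
    for _ in range(MAX_LEN):        # bound = longest possible sentence
        tok, state = step_fn(tok, state)
        if tok == EOS:              # normal exit: EOS arrives first
            break
        out.append(tok)
    return out

# Dummy step function that counts up to EOS, just to run the loop.
print(decode(lambda t, s: (t + 1, s), None, 0))
```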


#154

Jeremy mentioned in the video that we can now download it from Kaggle. Does anyone have the link? I can find the object detection challenge but not the usual classification challenge.