In case anyone missed it, James Dellinger wrote about the highlights of this interview; check out what Leslie has to say:
If you are on Twitter, I recommend you tag Leslie so he is aware of your blog post.
Finally, I had a chance to revisit this interview; the following are my short notes:
I watched the interview through the live stream and got a chance to ask questions and have them answered!
Hiromi: Thank you so much for broadcasting this. Opportunities like this are hard to come by for some of us.
Abhishek Sharma​: If there is one researcher everyone knows in FastAI community, it’s Leslie Smith.
Comments and questions that I asked:
Regarding Leslie’s few-shot learning research direction:
Some of us did a literature search around this “incremental learning” problem. The goal is to be able to learn as you go: online learning of new classes. Regarding Leslie’s idea, I think we have yet to see anything where the same classes are used in each epoch but with a stage-wise increase in batch size (a rough sketch follows below). I believe Jeremy’s idea is related to curriculum learning. We couldn’t find work related to class-incremental learning (adding new classes as we run more batches).
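To make the stage-wise idea concrete, here is a minimal sketch of a training loop where the classes stay the same but the batch size grows stage by stage. This is only my interpretation, not Leslie’s actual setup; the dataset, stage boundaries, and batch sizes are made-up values for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset standing in for a real one (hypothetical data, 10 classes).
dataset = TensorDataset(torch.randn(1024, 3, 32, 32),
                        torch.randint(0, 10, (1024,)))

# Same classes throughout training, but the batch size grows per stage.
stage_batch_sizes = [32, 64, 128, 256]  # assumed schedule for illustration
epochs_per_stage = 2

for stage, bs in enumerate(stage_batch_sizes):
    loader = DataLoader(dataset, batch_size=bs, shuffle=True)
    for epoch in range(epochs_per_stage):
        for xb, yb in loader:
            pass  # forward / backward / optimizer step would go here
    print(f"stage {stage}: trained {epochs_per_stage} epochs at batch size {bs}")
```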
Change data augmentation and hyperparameters in a “u-shaped” way: low, then high, then low again. The middle “high” should help the model generalize better. The last “low” should help to pinpoint the sweet spot. Or high, low, high, depending on the specific parameter. (This is similar to the 1-cycle policy; a sketch of such a schedule is below.)
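A minimal sketch of what such a low-high-low schedule could look like, e.g. for augmentation strength. The function name, the low/high values, and the split point are my own assumptions, analogous to how the 1-cycle policy ramps a value up and then back down:

```python
import numpy as np

def low_high_low_schedule(n_epochs, low=0.1, high=0.8, frac_up=0.5):
    """Piecewise-linear low -> high -> low schedule for a hyperparameter
    such as augmentation strength (values are illustrative assumptions)."""
    n_up = int(n_epochs * frac_up)
    up = np.linspace(low, high, n_up)
    down = np.linspace(high, low, n_epochs - n_up)
    return np.concatenate([up, down])

# Per-epoch values: ramp up for the first half, then anneal back down.
print(low_high_low_schedule(10).round(2))
```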
“Batchnorm zero” trick for ResNet blocks from Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. Take the last batchnorm layer in the conv/residual path of the ResNet block and initialize its learnable multiplication parameter (gamma) to zero. Then, at the start of training, every ResNet block represents an identity function. This should improve the model (and is also included in the PyTorch ResNet models).
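For reference, torchvision’s ResNets expose this trick via the zero_init_residual flag; the loop below shows the same thing done by hand (a sketch that assumes the standard torchvision BasicBlock/Bottleneck layout, where bn2/bn3 are the last batchnorm layers in the residual branch):

```python
import torch.nn as nn
import torchvision

# torchvision implements the trick directly:
model = torchvision.models.resnet18(zero_init_residual=True)

# Doing it by hand: zero the learnable scale (gamma) of the last BN layer
# in each residual branch so every block starts out as an identity mapping.
for m in model.modules():
    if isinstance(m, torchvision.models.resnet.BasicBlock):
        nn.init.zeros_(m.bn2.weight)   # bn2 is the last BN in a BasicBlock
    elif isinstance(m, torchvision.models.resnet.Bottleneck):
        nn.init.zeros_(m.bn3.weight)   # bn3 is the last BN in a Bottleneck
```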