MaxPool2d is actually used in the XResNet
It’s not the same thing AFAIK; it’s our version of the bag-of-tricks ResNet.
I thought it was average pooling only
No, there is one in the very first layers ;)
OH, that’s right! hahah thanks @sgugger.
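For anyone else who was confused by this: here’s roughly where that max pool sits. A minimal sketch of an XResNet-style stem, assuming the bag-of-tricks layout (channel sizes approximate, not the exact fastai source):

```python
# Sketch of an XResNet-style stem: three 3x3 convs replace the usual 7x7,
# followed by the MaxPool2d in question. Not the actual fastai code.
import torch.nn as nn

def conv_bn_relu(c_in, c_out, stride=1):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

stem = nn.Sequential(
    conv_bn_relu(3, 32, stride=2),
    conv_bn_relu(32, 32),
    conv_bn_relu(32, 64),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),  # the max pool in the first layers
)
# Average pooling only shows up at the end of the network,
# as nn.AdaptiveAvgPool2d(1) before the classifier head.
```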
Here are the benchmarks Jeremy is sharing: https://github.com/cgnorthcutt/benchmarking-keras-pytorch
But I am still not sure when I should use one or the other. Are there any best-practice rules? Sorry if this is a dumb question hahah
When running out of memory, does gc.collect() work?
On the leaderboard, the accuracy from a single run is recorded. What is the variance across different train/test splits? Do we care?
Depends on the type of memory. I guess if you hit a CUDA exception, you need to use something else or restart the kernel.
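A rough sketch of the usual cleanup dance in a notebook, assuming PyTorch: gc.collect() only reclaims Python objects, so CUDA-side caches need torch.cuda.empty_cache() on top of that, and a truly wedged kernel still needs a restart.

```python
import gc
import torch

# Drop references to large objects first, e.g. `del learn` (hypothetical name).
gc.collect()  # reclaim unreferenced Python objects

if torch.cuda.is_available():
    torch.cuda.empty_cache()  # hand PyTorch's cached CUDA blocks back to the driver
    # Sanity check: how much GPU memory is still allocated vs. reserved
    print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())
```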
I agree that deeply understanding the model and optimizing it like the Bag of Tricks paper did is great. Don’t you think automatic search for good architectures still has a place, though? For example, I’ve seen you recently liked a new paper that found a SOTA architecture for object detection through automatic architecture search.
There is an official train/test split to use.
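On the variance question: one cheap way to get an error bar without touching the official split is to repeat training with different seeds and report mean ± std. A sketch, where `train_and_eval` is a hypothetical stand-in for your own training loop:

```python
import statistics

def train_and_eval(seed: int) -> float:
    # Hypothetical stand-in: set the seed, train on the official train split,
    # and return accuracy on the official test split.
    raise NotImplementedError

accs = [train_and_eval(seed) for seed in range(5)]  # 5 seeds, same split
print(f"accuracy: {statistics.mean(accs):.4f} +/- {statistics.stdev(accs):.4f}")
```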
It’s still very immature and hard to reproduce, in our experience.
Any idea why?
Is it safe to overfit when you are doing transfer learning? It looks like Jeremy overfitted before doing transfer learning.
It’s still very new, that’s why.
I think Jeremy has an argument against cross-validation for deep learning; I forgot what it was.
Would love to hear this argument against X-val in DL…
Confusing naming, indeed