Lesson 12 (2019) discussion and wiki

MaxPool2d is actually used in XResNet :wink:

It’s not the same thing AFAIK; it’s our version of the Bag of Tricks ResNet.

I thought it was average pooling only :slight_smile:

No, there is one in the very first layers ;).
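
For concreteness, here is a minimal PyTorch sketch of an XResNet-style stem. The channel sizes are assumptions for illustration; the point is the `nn.MaxPool2d` sitting right after the first few convs:

```python
import torch.nn as nn

def conv_bn_relu(c_in, c_out, stride=1):
    # 3x3 conv -> batchnorm -> ReLU
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

# XResNet-style stem: a few 3x3 convs, then a max pool in the very first layers
stem = nn.Sequential(
    conv_bn_relu(3, 32, stride=2),
    conv_bn_relu(32, 64),
    conv_bn_relu(64, 64),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),  # the MaxPool2d in question
)
```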

OH, that’s right! hahah thanks @sgugger.

Here are the benchmarks Jeremy is sharing: https://github.com/cgnorthcutt/benchmarking-keras-pytorch

But I am still not sure when I should use one or the other. Are there any best-practice rules? Sorry if this is a dumb question hahah

When running out of memory, does gc.collect() work?

On the leaderboard, the accuracy from a single run is recorded. What is the variance across different train/test splits? Do we care?

Depends on the type of memory. I guess if you have a CUDA exception, you need to use something else or restart the kernel.
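
A minimal sketch of the usual notebook cleanup, assuming PyTorch (the `learn`/`model` names are hypothetical; drop your own references first):

```python
import gc
import torch

# del learn, model        # hypothetical names: drop references to the big objects first
gc.collect()              # free Python-side garbage
torch.cuda.empty_cache()  # release cached, unused CUDA memory back to the driver
```

Note that `torch.cuda.empty_cache()` only returns memory PyTorch has cached but is no longer using; it won’t help while live tensors still hold the memory.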

I agree that deeply understanding the model and optimizing it like the Bag of Tricks paper did is great. Don’t you think automatic search for good architectures still has a place, though? For example, I’ve seen you recently liked a new paper that found a SOTA architecture for object detection through automatic architecture search.

There is an official train/test split to use.
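
If you do want to estimate run-to-run variance on the official split, a minimal sketch, where `train_and_eval` stands in for your own training loop (hypothetical, not a real API):

```python
import random
import numpy as np
import torch

def set_seed(seed):
    # fix the RNGs so each run is repeatable
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

# hypothetical: rerun training with a few seeds and report the spread
# accs = [train_and_eval(seed) for seed in range(5)]
# print(f"accuracy: {np.mean(accs):.4f} +/- {np.std(accs):.4f}")
```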

It’s still very immature and hard to reproduce, in our experience.

Any idea why?

Is it safe to overfit when you are doing transfer learning? It looks like Jeremy overfitted before doing transfer learning.

It’s still very new, that’s why.

ResNeXt is different from XResNet. See this figure, from a Medium post:

[Figure: ResNeXt block architecture]
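
Roughly, the distinguishing feature of a ResNeXt block is the grouped 3x3 convolution (the “cardinality” is the number of groups). A minimal sketch, with channel sizes as assumptions:

```python
import torch.nn as nn

class ResNeXtBlock(nn.Module):
    # Bottleneck residual block with a grouped 3x3 conv; `groups` is the cardinality
    def __init__(self, c_in, c_mid, groups=32):
        super().__init__()
        assert c_mid % groups == 0, "mid channels must be divisible by the cardinality"
        self.net = nn.Sequential(
            nn.Conv2d(c_in, c_mid, kernel_size=1, bias=False),
            nn.BatchNorm2d(c_mid), nn.ReLU(inplace=True),
            nn.Conv2d(c_mid, c_mid, kernel_size=3, padding=1, groups=groups, bias=False),
            nn.BatchNorm2d(c_mid), nn.ReLU(inplace=True),
            nn.Conv2d(c_mid, c_in, kernel_size=1, bias=False),
            nn.BatchNorm2d(c_in),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(x + self.net(x))

# e.g. block = ResNeXtBlock(256, 128, groups=32)
```

XResNet, by contrast, keeps plain convolutions and instead applies the Bag of Tricks tweaks mentioned above.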

I think Jeremy has an argument against cross-validation for deep learning, but I forgot what it was.

Would love to hear this argument against X-val in DL…

Confusing naming, indeed :smile: