Lesson 9 Discussion & Wiki (2019)

I would also love to see some of this stuff done on sequence models…

If I understand the initialization schemes correctly, the point is that we don’t need regularization when training the network. Does this mean we also don’t need dropout if we do proper initialization?

1 Like

The goal of part 2 is to teach you how to build your own thing that fits the task you are tackling. We have the integrated tools for mainstream tasks and part one was about those. We’ll never get all the tools for specific tasks so we have chosen to focus on making the library highly flexible, then explain to all of you how you can customize it to your needs.

3 Likes

I am not sure you can skip it due to initialization alone, but I do know that dropout has fallen out of favor and batchnorm (BN) is used instead.

Similar question. This looks like a general approach for most models. Is there any reason why it is only applied to ConvNets?

Got it, thanks, makes sense!

Not an expert, but I have been working with generative models a lot lately; I’ll give it a shot at doing fast.ai versions.

1 Like

Initialization isn’t intended to remove the need for regularization. There are papers that show you can train effectively without regularization using good initialization, but removing regularization isn’t the point.

No, it’s very different. If you don’t initialize properly, there is a chance your model just won’t train, no matter how much regularization you have.
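
To make “initialize properly” concrete, here is a minimal sketch using Kaiming init (the layer sizes are made up for illustration):

import torch
import torch.nn as nn

# A tiny made-up model: two linear layers with a ReLU in between.
model = nn.Sequential(nn.Linear(784, 50), nn.ReLU(), nn.Linear(50, 10))

# Kaiming (He) init accounts for the ReLU nonlinearity and keeps the
# activation variance roughly constant from layer to layer.
for layer in model:
    if isinstance(layer, nn.Linear):
        nn.init.kaiming_normal_(layer.weight)
        layer.bias.data.zero_()

# With a poor init (e.g. very large or very small weights) the activations
# explode or vanish as they pass through the layers and training may never
# get going, regardless of regularization.
x = torch.randn(64, 784)
print(model(x).mean(), model(x).std())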

1 Like

We will cover sequence models later in the course

1 Like

I think what we’re looking at now is pretty much applicable to any kind of model (at least everything we’re looking at in the 03 notebook).

I recall Jeremy mentioning it when he started discussing good initialization.

Will we get to see BERT in action?

2 Likes

Re: zero_grad, we could just use a zero_grad flag in step(). I suspect they went with the wrong default behavior, but for backwards compatibility we can’t really get away from it.
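
To illustrate what I mean, here’s a rough sketch (SGDWithZeroGrad is a made-up name, not the real PyTorch optimizer):

import torch

class SGDWithZeroGrad:
    "Hypothetical optimizer sketch: step() can zero the grads for you."
    def __init__(self, params, lr):
        self.params, self.lr = list(params), lr

    def step(self, zero_grad=True):
        with torch.no_grad():
            for p in self.params:
                if p.grad is None: continue
                p -= self.lr * p.grad
                # Optionally clear the gradient right after using it,
                # instead of requiring a separate zero_grad() call.
                if zero_grad: p.grad.zero_()

Calling step() would then both update the parameters and clear the gradients, which is what most training loops want anyway.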

Thanks @rachel!

1 Like

Putting everything in arguments is kind of what fastai does, right?

ok… is it something to do with autograd?

If you’re interested, here’s a sample notebook I put together demonstrating how one might build and train a seq2seq model using the DataBlock API and fastai.

You could replicate the training loop code Jeremy is showing to train the model yourself.
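
If you want to try that, the basic shape of the loop is only a few lines; this is a sketch, with model, train_dl, loss_func and opt as placeholders you would supply:

# Minimal training loop in the style shown in the lesson.
def fit(epochs, model, loss_func, opt, train_dl):
    for epoch in range(epochs):
        model.train()
        for xb, yb in train_dl:
            loss = loss_func(model(xb), yb)
            loss.backward()
            opt.step()
            opt.zero_grad()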

5 Likes

Probably the PyTorch team decided to follow the “explicit is better than implicit” motto :smile:

I’ve added a link to Sylvain’s talk in the wiki

5 Likes

So if a layer or function appears only in the Module’s forward or __call__ methods, then PyTorch will not automagically know about it or its parameters (if any)?

Edit: Right, confirmed by experiment. PyTorch registers the parameters of Modules only when they are saved as instance variables on the Module. If you don’t name and save the sub-Module, its parameters will not be registered with the containing Module. Therefore there are no gradient updates, and it won’t appear when the model is printed.

So - please correct me if I’m wrong - an operation like relu (no parameters) could be either a layer nn.ReLU saved in __init__ or a function F.relu() used in forward().
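
Here’s the quick test I used, a sketch with made-up class names, showing both points: parameters are registered only when the sub-Module is saved as an attribute, and a parameter-free op like relu works either way (F.relu in forward, or an nn.ReLU attribute):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Good(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(10, 10)        # saved as an attribute -> registered

    def forward(self, x):
        return F.relu(self.lin(x))          # F.relu has no parameters, so this is fine

class Bad(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = [nn.Linear(10, 10)]   # a plain Python list is not a registered
                                            # sub-Module (you'd need nn.ModuleList)

    def forward(self, x):
        return self.layers[0](x)

print(len(list(Good().parameters())))  # 2 (weight and bias of self.lin)
print(len(list(Bad().parameters())))   # 0 - an optimizer would never see them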

P.S. This also automagically works:
learn.model[-1][0] = MyModule()

where learn.model[-1] is a Sequential container. learn.model correctly picks up the new parameters from MyModule.