Lesson 9 Discussion & Wiki (2019)

I would also love to see some of this stuff done on sequence models…

If I understand the initialization schemes correctly, the point is that we don’t need regularization when training the network. Does this mean we also don’t need dropout if we do proper initialization?

1 Like

The goal of part 2 is to teach you how to build your own thing that fits the task you are tackling. We have the integrated tools for mainstream tasks and part one was about those. We’ll never get all the tools for specific tasks so we have chosen to focus on making the library highly flexible, then explain to all of you how you can customize it to your needs.

3 Likes

I am not sure you can skip it due to initialization alone, but I do know that dropout has fallen out of favor and batchnorm (BN) is used instead.

Similar question. This looks like a general approach for most models. Is there any reason why it is only applied to ConvNets?

Got it, thanks, makes sense!

Not an expert, but I have been working with generative models a lot lately; I’ll give it a shot at doing fast.ai versions.

1 Like

Initialization isn’t intended to remove the need for regularization. There are papers that show you can train effectively without regularization using good initialization, but removing regularization isn’t the point.

No, it’s very different. If you don’t initialize properly, there is a chance your model just won’t train, no matter how much regularization you have.
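
To make “initialize properly” concrete, here is a minimal sketch using Kaiming init (the layer sizes are made up for illustration):

import torch
import torch.nn as nn

# A tiny made-up model: two linear layers with a ReLU in between.
model = nn.Sequential(nn.Linear(784, 50), nn.ReLU(), nn.Linear(50, 10))

# Kaiming (He) init accounts for the ReLU nonlinearity and keeps the
# activation variance roughly constant from layer to layer.
for layer in model:
    if isinstance(layer, nn.Linear):
        nn.init.kaiming_normal_(layer.weight)
        layer.bias.data.zero_()

# With a poor init (e.g. very large or very small weights) the activations
# explode or vanish as they pass through the layers and training may never
# get going, regardless of regularization.
x = torch.randn(64, 784)
print(model(x).mean(), model(x).std())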

1 Like

We will cover sequence models later in the course

1 Like

I think what we’re looking at now is pretty much applicable to any kind of model (at least everything we’re looking at in the 03 notebook).

I recall Jeremy mentioning it when he started discussing good initialization.

Will we get to see BERT in action?

2 Likes

Re: zero_grad, we could just use a zero_grad flag in step(). I suspect they went with the wrong default behavior, but for backwards compatibility we can’t really get away from it.
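
To illustrate what I mean, here’s a rough sketch (SGDWithZeroGrad is a made-up name, not the real PyTorch optimizer):

import torch

class SGDWithZeroGrad:
    "Hypothetical optimizer sketch: step() can zero the grads for you."
    def __init__(self, params, lr):
        self.params, self.lr = list(params), lr

    def step(self, zero_grad=True):
        with torch.no_grad():
            for p in self.params:
                if p.grad is None: continue
                p -= self.lr * p.grad
                # Optionally clear the gradient right after using it,
                # instead of requiring a separate zero_grad() call.
                if zero_grad: p.grad.zero_()

Calling step() would then both update the parameters and clear the gradients, which is what most training loops want anyway.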

Thanks @rachel!

1 Like

Putting everything in arguments is kind of what fastai does, right?

ok… is it something to do with autograd?

If you’re interested, here’s a sample notebook I put together demonstrating how one might build and train a seq2seq model using the DataBlock API and fastai.

You could replicate the training loop code Jeremy is showing to train the model yourself.
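
If you want to try that, the basic shape of the loop is only a few lines; this is a sketch, with model, train_dl, loss_func and opt as placeholders you would supply:

# Minimal training loop in the style shown in the lesson.
def fit(epochs, model, loss_func, opt, train_dl):
    for epoch in range(epochs):
        model.train()
        for xb, yb in train_dl:
            loss = loss_func(model(xb), yb)
            loss.backward()
            opt.step()
            opt.zero_grad()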

5 Likes

Probably the PyTorch team decided to follow the “explicit is better than implicit” motto :smile:

I’ve added a link to Sylvain’s talk in the wiki

5 Likes

So if a layer or function appears only in the Module’s forward or __call__ methods, then PyTorch will not automagically know about it or its parameters (if any)?

Edit: Right, confirmed by experiment. PyTorch registers the parameters of Modules only when they are saved as instance variables on the Module. If you don’t name and save the sub-Module, its parameters will not be registered with the containing Module. Therefore there are no gradient updates, and it won’t appear when the model is printed.

So - please correct me if I’m wrong - an operation like relu (no parameters) could be either a layer nn.ReLU saved in __init__ or a function F.relu() used in forward().
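
Here’s the quick test I used, a sketch with made-up class names, showing both points: parameters are registered only when the sub-Module is saved as an attribute, and a parameter-free op like relu works either way (F.relu in forward, or an nn.ReLU attribute):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Good(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(10, 10)        # saved as an attribute -> registered

    def forward(self, x):
        return F.relu(self.lin(x))          # F.relu has no parameters, so this is fine

class Bad(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = [nn.Linear(10, 10)]   # a plain Python list is not a registered
                                            # sub-Module (you'd need nn.ModuleList)

    def forward(self, x):
        return self.layers[0](x)

print(len(list(Good().parameters())))  # 2 (weight and bias of self.lin)
print(len(list(Bad().parameters())))   # 0 - an optimizer would never see them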

P.S. This also automagically works:
learn.model[-1][0] = MyModule()

where learn.model[-1] is a Sequential container. learn.model correctly picks up the new parameters from MyModule.