Lesson 9 Discussion & Wiki (2019)

Would some sort of recursion help?

That works. It does almost feel like:
“Let’s stop.”
“Yes, I am sure.”
“I really meant it!”
But it works :slight_smile:

The challenge is that callbacks are called at different “depths” of the code - some are several for loops deep, others less so. I will keep my eye out to see if there is something I can use to refactor.

I am not sure. What do you have in mind?

But if the idea was to make the distribution bounded, then a truncated normal could have been used. PyTorch uses a uniform distribution, and a uniform distribution carries the premise that all outcomes are equally likely. But can we expect the outcomes to be equally likely here?
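For comparison, a minimal sketch of the two alternatives being discussed (hypothetical layer sizes, and trunc_normal_ only exists in more recent PyTorch releases):

import math
import torch
from torch import nn
fan_in = 784
w_uniform = torch.empty(50, fan_in)
# PyTorch's default Linear/Conv init: a bounded Kaiming uniform with a=sqrt(5)
nn.init.kaiming_uniform_(w_uniform, a=math.sqrt(5))
std = 1 / math.sqrt(fan_in)
w_trunc = torch.empty(50, fan_in)
# a truncated normal is also bounded, but keeps most of its mass near zero
# instead of spreading it evenly across the interval
nn.init.trunc_normal_(w_trunc, std=std, a=-2*std, b=2*std)
print(w_uniform.min(), w_uniform.max(), w_trunc.min(), w_trunc.max())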

Even if it’s difficult to understand now, I would like the lectures to be heavily loaded, so that we get the maximum out of Jeremy’s time during these classes.

Hello,

Does anyone know why we are exporting and overriding the nb_02.py file in 02a_why_sqrt5.ipynb? We have the same line in 02_fully_connected.ipynb too.

Thank you in advance.

Thank you for letting us know, @Mirodil.
It’s a copy-n-paste bug, fixed in master now.

Please re-run 02_fully_connected.ipynb to fix nb_02.py.

It was just a very high-level thought at the end of my day. Sorry, I can’t expand on it at present - other commitments this week.

@zachcaceres
I am available pretty much any time. Please let me know when you would have time for a quick chat.

I am wondering how unsupervised pre-training of DNN models is done. Could someone suggest a good source of information about that?

Just a quick thought:

Since Learner is just a container, why not combine it with that new Runner class you are planning to create?

Check the callback discussion thread and notebook 9b: it will be combined :slight_smile:

Link to a writeup on weight initialization, drawing from the 02b-initializing notebook: https://madaan.github.io/init/

Please let me know if you have any suggestions/see any errors. Thanks.

@stas Should we add this to the wiki?

Yes, of course, that’s why it’s a wiki. And thank you for creating it, @amanmadaan

Here is the schedule. It is on April 3rd: https://www.usfca.edu/data-institute/certificates/deep-learning-part-two

I was looking at the combine_scheds function and I found something odd. In the line

idx = (pos >= pcts).nonzero().max()

I realised that if pos = 1 (in the context of the example given in the NB where we do 30% of schedule 1 and 70% of schedule 2), then I would expect idx to be equal to 2, since pos = 1 >= [0, 0.3, 1]. However, upon further inspection, I realised that what I thought wasn’t actually the case, and that pos turns out to be less than pcts[-1], as illustrated below.

Is this another one of those good bugs, or was it intentional (knowing Jeremy, it probably was, right)? Is this one of those data-type caveats that caused the comparison to give a not-so-intuitive result?
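To check, here is a minimal sketch of that indexing, using just the 30%/70% example and some hand-picked pos values (not the values produced by the actual training loop):

import torch
pcts = torch.tensor([0., 0.3, 1.0])   # cumulative pcts for the 30%/70% split
for p in (0.0, 0.15, 0.3, 0.95, 1.0):
    idx = (torch.tensor(p) >= pcts).nonzero().max()
    print(f"pos={p:.2f} -> idx={idx.item()}")
# only an exact pos of 1.0 gives idx=2; anything below it stays at idx=1,
# which is presumably why pos ending up just under pcts[-1] matters here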

If you take a closer look at the notebooks, this is what was done:
opt.param_groups[-1]['lr']
You have to go into a param_group; that’s where you’ll find the ‘lr’ key.
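For example, a minimal sketch (hypothetical model and optimizer, not the ones from the notebook):

from torch import nn, optim
model = nn.Linear(10, 2)
opt = optim.SGD(model.parameters(), lr=0.1)
# hyper-parameters live in per-group dicts, not on the optimizer itself
print(opt.param_groups[-1]['lr'])   # 0.1
opt.param_groups[-1]['lr'] = 0.01   # which is also how a scheduler can update it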

I believe this is where _order comes into play: the things that need to be called before the fit process terminates (and whatnot) should be given higher priority.

Yup that’s why I said “For now maybe something like…” in my earlier response - because I’ll be showing some alternative approaches tomorrow night! :slight_smile:

We use

def __call__(self, cb_name):
    for cb in sorted(self.cbs, key=lambda x: x._order):

As I understand it, the sorting is executed on every call.
IMHO, it would be better to change it to:

for cb in self.cbs:

and sort the cbs list once in __init__. So instead of:

self.stop,self.cbs = False,[TrainEvalCallback()]+cbs

do:

self.stop,self.cbs = False,sorted([TrainEvalCallback()]+cbs, key=lambda x: x._order)
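Putting it together, a minimal sketch of the suggested change (the body of __call__ is paraphrased from memory, and the callback stubs are only placeholders):

class Callback(): _order = 0
class TrainEvalCallback(Callback): _order = 0
class Runner():
    def __init__(self, cbs=None):
        cbs = cbs or []
        # sort once here, instead of on every __call__
        self.stop, self.cbs = False, sorted([TrainEvalCallback()] + cbs, key=lambda x: x._order)
    def __call__(self, cb_name):
        for cb in self.cbs:               # already sorted by _order
            f = getattr(cb, cb_name, None)
            if f and f(): return True
        return False
run = Runner()
print(run('begin_fit'))   # False - the stub callback defines no begin_fit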

I’ve heard of, but never experimented with, the Swish activation (paper, good blog writeup), which is a sigmoid-gated, lower-bounded non-linear function:

It allegedly slightly outperforms ReLU, particularly for deep networks, without being sensitive to architecture changes. It would be interesting to experiment and see how it compares.
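If anyone wants to try it, a minimal sketch of a drop-in module (assuming the x * sigmoid(beta * x) form from the paper, with beta fixed at 1 by default):

import torch
from torch import nn
class Swish(nn.Module):
    def __init__(self, beta=1.0):
        super().__init__()
        self.beta = beta
    def forward(self, x):
        # smooth, non-monotonic, bounded below (unlike ReLU it dips slightly negative)
        return x * torch.sigmoid(self.beta * x)
act = Swish()
print(act(torch.linspace(-3, 3, 5)))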

Lucky we all know how, now :slight_smile:
