Lesson 9 Discussion & Wiki (2019)

We did briefly mention it in lesson 1, when we did this:

We’ll discuss it more in the next lesson.

5 Likes

I did not understand what the order of callbacks actually does and how it works. What do we actually order: the classes that inherit from the Callback class, or the methods?
And why did we not specify an order in “AvgStatsCallback”? I understand that it automatically gets 0 (from the parent). What am I misunderstanding?
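
Here is a minimal sketch of how I understand the ordering in the lesson’s Runner (my reconstruction, not the exact notebook code): it is the callback objects that get sorted, via the class attribute _order, before each event fires, and AvgStatsCallback simply inherits _order=0 from Callback.

class Callback():
    _order = 0                          # default order; AvgStatsCallback inherits this

class TestCallback(Callback):
    _order = 1                          # runs after any _order=0 callbacks

def call_cbs(cbs, cb_name):
    # It is the callback objects (not the methods) that get ordered:
    # lower _order runs first at every event.
    for cb in sorted(cbs, key=lambda c: c._order):
        f = getattr(cb, cb_name, None)
        if f and f(): return True       # any callback returning True cancels the rest of that step
    return False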

I briefly discussed the problems with stuff like SELU and Fixup in lesson 2 - they are extremely sensitive to tiny changes in architecture. I’ll be discussing this a bit more in the next lesson.

3 Likes

Stas, I have been following the guide, but the guide seems to apply only to the fastai, course-v3, and fastprogress repos. I am trying to create a pull request against fastai_docs. Does it matter?

Since this week is talking about building the training loop from scratch and last week was talking about speed, I wanted to mention an observation I had about the DataLoader and some ways to speed it up.

I was motivated by a very large data set some time back to dig deeply into the details of the PyTorch DataLoader and Dataset. For very big data sets (millions of samples), small per-sample timing differences add up, so fixing them can really speed up your training loop. I believe the difference comes from how you index into the array. Since both RandomSampler and SequentialSampler return a list of indices, you end up with something that looks like this when you generate a batch:

batch = self.collate_fn([self.dataset[i] for i in indices])

But this method of indexing is very slow compared to using slice (:) notation from numpy.

If you couple this with a single shuffle of the data at the start of each epoch, you can get a huge speed improvement in just iterating through your dataset.
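
A rough sketch of the idea, assuming your samples already sit in one big numpy array (all names here are made up): shuffle once per epoch, then read each batch with a contiguous slice instead of a Python list of indices.

import numpy as np

def iterate_epoch(x, y, batch_size=64, shuffle=True):
    # Hypothetical fast iterator: one shuffle per epoch, then contiguous slices per batch.
    n = len(x)
    if shuffle:
        perm = np.random.permutation(n)           # the single per-epoch randomization
        x, y = x[perm], y[perm]                   # fancy-index once, up front
    for start in range(0, n, batch_size):
        # x[start:stop] slicing is much cheaper than [x[i] for i in indices]
        yield x[start:start + batch_size], y[start:start + batch_size]

# usage
x = np.random.randn(10_000, 784).astype(np.float32)
y = np.random.randint(0, 10, 10_000)
for xb, yb in iterate_epoch(x, y):
    pass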

8 Likes

It’s exactly the same as all other fastai repos. I just didn’t put it in the guide since it wasn’t meant to be used by anybody but maintainers, until a week ago. I will tweak it now to include that repo as well. But the easiest approach is to use the helper tool - it does all the work for you.

Update: docs updated. If you encounter any specific obstacles, or something is unclear, confusing, or hard to follow, please ask in the Documentation improvements thread. Thank you.

1 Like

@champs.jaideep For the first two, CSVLogger can be used, which writes the metrics to a csv file under learner.path. And using the excellent fastai callbacks, read the csv file and change the behaviour of the model/lrs accordingly (based on those metrics) at the required step (batch/epoch begin/end, etc.).
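
A rough sketch of what I think is being suggested, assuming the fastai v1 callback API (MetricWatcher, its file handling and the halve-the-lr rule are all my own invention, so treat the details as assumptions):

import pandas as pd
from fastai.basic_train import LearnerCallback

class MetricWatcher(LearnerCallback):
    # Hypothetical callback: re-read the csv written by CSVLogger and react to the metrics.
    def on_epoch_end(self, **kwargs):
        csv_path = self.learn.path/'history.csv'   # CSVLogger's default file, as I recall
        if not csv_path.exists(): return
        df = pd.read_csv(csv_path)
        # e.g. halve the learning rate when the latest valid_loss is no longer the best
        if len(df) > 1 and df['valid_loss'].iloc[-1] > df['valid_loss'].min():
            self.learn.opt.lr = self.learn.opt.lr / 2

Both would then be passed together, e.g. learn.fit(5, callbacks=[CSVLogger(learn), MetricWatcher(learn)]) in fastai v1 style; note that, depending on callback order, the latest epoch’s row may not be in the file yet.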

Thanks for this advice. Right now, I am feeling overwhelmed by this course, but then I remind myself that these courses are ‘ML for adults’. We are not constrained by the course imposing deadlines, tests or awards for completion. Everybody has other commitments, but here we have the material, further reading and an active forum to resolve doubts. How much we progress each week is up to us. The important thing is to do just that: progress.

8 Likes

I have created a pull request, but the automatic merge failed on one of the notebooks. I am not sure what I can do about that.

Reference, please?

For the Lesson 9 assignment I summarized the “All you need is a good init” paper. I have written this Medium post going through it section by section.

Must say that the paper is pretty clear and concise. The algorithm they provide is really straightforward too.
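
For reference, here is my rough sketch of the LSUV procedure the paper describes (orthonormal init, then iteratively rescale each layer until its output std is about 1 on a real batch); the tolerance, iteration cap and hook plumbing are my own choices, so treat it as a sketch rather than the paper’s exact algorithm.

import torch
import torch.nn as nn

@torch.no_grad()
def lsuv_init(model, xb, tol=1e-3, max_iters=10):
    # For each conv/linear layer: orthonormal init, then rescale weights until output std ~= 1.
    for module in model.modules():
        if not isinstance(module, (nn.Conv2d, nn.Linear)): continue
        nn.init.orthogonal_(module.weight)                      # step 1: (semi-)orthonormal init
        captured = {}
        hook = module.register_forward_hook(lambda m, i, o: captured.update(out=o))
        for _ in range(max_iters):                              # step 2: iterate towards unit variance
            model(xb)                                           # forward a real batch
            std = captured['out'].std()
            if abs(std - 1) < tol: break
            module.weight.div_(std)
        hook.remove()
    return model

# toy usage
model = nn.Sequential(nn.Linear(784, 50), nn.ReLU(), nn.Linear(50, 10))
lsuv_init(model, torch.randn(64, 784))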

As always, if you see something that is not clear or wrong, just reach out.

1 Like

TDD? Please explain acronyms. Not everyone knows them.

That notebook has changed in master since you made the fork, so you need to rebase it; see:
https://docs.fast.ai/dev/git.html#how-to-keep-your-feature-branch-up-to-date

TDD = Test Driven Development

1 Like

Also please don’t submit PRs w/ video links until MOOC part 2 is public. Thank you.

I understand why you would want to normalize your input, for example making it have a mean of 0 and a std of 1. My question is: why specifically mean 0 and std 1? What would be the difference if I normalized to mean=1, std=1? Or mean=1, std=2? Intuitively I can see an argument for std=1: if you think of it like a “signal”, with std less than 1 the signal would shrink until it disappears, and with std > 1 it would magnify and get out of control. But what about the mean? What difference would it make if it was 1, or 3, or 7? The shape of the distribution would be the same in this case, only centered around a different number.
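
Just to illustrate the std part of that intuition with a quick experiment of my own (it does not answer the question about the mean): pushing a unit-variance signal through 50 random linear layers makes it vanish or explode unless each layer keeps the std around 1.

import torch

torch.manual_seed(0)
x = torch.randn(1000, 512)                  # activations with mean 0, std 1
for gain in (0.5, 1.0, 2.0):                # roughly how much each layer scales the std
    h = x.clone()
    for _ in range(50):                     # 50 linear "layers", no non-linearity
        w = torch.randn(512, 512) * (gain / 512**0.5)
        h = h @ w
    print(f"gain {gain}: final std {h.std().item():.3g}")
# gain 0.5 -> the std collapses towards 0, gain 2.0 -> it explodes, gain 1.0 -> it stays near 1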

2 Likes

I am planning to continue the annotations as the course progresses, and will be creating pull requests a day or two after each lecture. Since the pull request process now works, I do not need to maintain a separate repository for the notebooks; I only created my private repository because I could not create a pull request. But it would be nice if these annotations were available through the official course repository, so people taking the course now could benefit from them.

As I mentioned above please do not submit PRs with links to unlisted videos, because the fastai_docs repo is public. Until MOOC part 2 is released, the links can only go into the special section of the forum “Part 2 (2019)” or the notes - see the other threads with the notes. You can also share the link to your repo here. Thank you for understanding.

Thank you for the clarification.
I have another question. I spoke with Jeremy Howard about the annotated notebooks, and Jeremy asked if these annotations could be integrated with the video viewer. I suppose the integration might be possible after the course ends and the public release is in the works, but it would be nice to discuss it with someone working on the viewer ahead of time.

I was reviewing the lecture and came across something I was not sure about.

In 04_callbacks.ipynb, there is a TestCallback class that looks like the following:

class TestCallback(Callback):
    _order=1
    def after_step(self):
        if self.n_iter>=10: return True

Returning True here means “stop”. But Runner’s functions look like:

    def one_batch(self, xb, yb):
        self.xb,self.yb = xb,yb
        if self('begin_batch'): return
        self.pred = self.model(self.xb)
        if self('after_pred'): return
        self.loss = self.loss_func(self.pred, self.yb)
        if self('after_loss') or not self.in_train: return
        self.loss.backward()
        if self('after_backward'): return
        self.opt.step()
        if self('after_step'): return # <<<<<<<<<<<<<< HERE
        self.opt.zero_grad()

    def all_batches(self, dl):
        self.iters = len(dl)
        for xb,yb in dl:
            if self.stop: break
            self.one_batch(xb, yb)
            self('after_batch')
        self.stop=False

So after n_iter reaches 10, TestCallback continues to return True, but the loop in all_batches keeps going, and the only thing this achieves is that the gradients never get set back to zero.

I decided to add TestCallback and experiment.

stats = [TestCallback(), AvgStatsCallback([accuracy])]
run = Runner(cbs=stats)

Trial 1

The first thing I changed was to put back self.run.stop=True in TestCallback. This causes the for loop inside of all_batches to break after 10 iterations.

Then I noticed that n_iter gets set to zero at the beginning of the fit function. So during the second epoch, TestCallback sees that n_iter >= 10 right away, and the all_batches loop terminates (gist).
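
In code, the Trial 1 callback looked roughly like this (reconstructed from the description above rather than copied from the gist, reusing the notebook’s Callback/Runner definitions):

class TestCallback(Callback):
    _order = 1
    def after_step(self):
        if self.n_iter >= 10:
            self.run.stop = True    # makes the for loop in all_batches break
            return True             # still skips the rest of one_batch (the zero_grad)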

Trial 2

I thought “maybe self.run.n_iter=0 should happen at the beginning of each epoch”. I tried that, and now the training loop exits after 10 iterations every epoch while running the complete validation loop (gist).
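
Trial 2 then amounted to something roughly like this (again a reconstruction; putting the reset in a begin_epoch callback method is just one possible way to do it):

class TestCallback(Callback):
    _order = 1
    def begin_epoch(self):
        self.run.n_iter = 0         # reset per epoch instead of only at the start of fit
    def after_step(self):
        if self.n_iter >= 10:
            self.run.stop = True
            return True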

Trial 3

I tried resetting n_iter and also incrementing it during the validation phase, but because of this line in the one_batch function:

if self('after_loss') or not self.in_train: return

it never gets to TestCallback's after_step during the validation phase (gist).

Question

  • Is it okay to reset n_iter at the beginning of epoch?
  • Do we want TestCallback to terminate the validation loop as it does for training?
2 Likes