Lesson 8 (2019) discussion & wiki

I think I was using a similar setup connecting to GCP instances. Just forgot about it lol :slight_smile: Thanks, I was hoping there was official Windows support lol

2 Likes

You’re right, but MNIST images are grayscale, so there’s only a single feature/channel. For the ImageNet dataset we would normalize each channel (RGB) separately.
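For example, with a batch of RGB images as a rank-4 tensor (batch, channel, height, width), the per-channel statistics can be computed like this (just a sketch with made-up data, not the actual notebook code):

```python
import torch

# made-up batch: 8 RGB images of 32x32
x = torch.randn(8, 3, 32, 32) * 2 + 5

# reduce over batch, height and width -> one statistic per channel
mean = x.mean(dim=(0, 2, 3), keepdim=True)  # shape (1, 3, 1, 1)
std  = x.std(dim=(0, 2, 3), keepdim=True)

x_norm = (x - mean) / std  # each channel now has mean ~0, std ~1
```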

3 Likes

I guess it is called “port forwarding”. With this you can run a local Jupyter notebook on port 8888 and forward port 8888 from your server to your local port 8889, so you can have the local and the server notebooks running at the same time.

1 Like

Okay, yeah makes sense.

If this would have been a tabular data, we would have then calculated mean and stddev for each of the 784 dimensions separately, right?
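Something like this is what I have in mind (toy data, not the notebook’s):

```python
import torch

# pretend MNIST flattened into a table: 1000 rows x 784 continuous columns
x = torch.rand(1000, 784)

col_mean = x.mean(dim=0)  # shape (784,): one mean per column
col_std  = x.std(dim=0)   # one stddev per column

x_norm = (x - col_mean) / col_std  # every column now has mean ~0, std ~1
```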

Can anyone please explain the following code in the 02_fullyconnected notebook?

test_near_zero(w1.std()-1/math.sqrt(m))

I was thinking that we wanted w1.std() to be as close to 1 as possible, but I don’t understand why we want it near 1/math.sqrt(m).
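For context, here is roughly the setup (my own toy sizes, not the notebook’s exact code). If I understand correctly, the thing we want near 1 is the std of the layer’s *output*, and with m inputs that requires the weights themselves to have std 1/sqrt(m):

```python
import math
import torch

m, nh = 784, 50                          # input features, hidden units (made up)
x  = torch.randn(1000, m)                # inputs with std ~1
w1 = torch.randn(m, nh) / math.sqrt(m)   # scale weights so w1.std() ~ 1/sqrt(m)

a = x @ w1  # the linear layer's output then has std ~1
```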

1 Like

Just so I understand this: to detect overfitting, do we monitor accuracy or validation error?

In the response above, Jeremy states that we want to track accuracy, but in the lesson 8 video, Jeremy says it’s the validation error we want to track:

So overfit means what? It means that your training loss is lower than your validation loss? No. No, it doesn’t mean that. Remember, it doesn’t mean that. A well-fit model will almost always have training loss lower than the validation loss. Remember that overfit means you have actually, personally seen your validation error getting worse, okay? Until you see that happening, you’re not overfitting.

Or maybe this is just a terminology thing. We have:
validation loss
training loss
validation error
accuracy
error_rate (introduced in pets notebook from part 1)

So I understand that validation loss is not the same as validation error,
and that validation error (or error_rate) = 1 - accuracy,

and that to detect overfitting we track accuracy or validation error (error rate).

correct me if i am wrong here. thanks :slight_smile:

Going through the recording of lecture 8, I annotated the lesson notebooks with YouTube video links. Each annotation links a position in the notebook to the corresponding location in the video where Jeremy discusses the next few cells. I found it useful for myself, and I think it might be useful for others. I would like to share it, but I am not sure how. What would be the right way to contribute the annotated .ipynb files?

1 Like

That’s great. Would you mind sharing these via your GitHub repo?

I wanted to force myself to not “cheat” by looking at Jeremy’s code while re-creating what we did in lesson 8, so I made a notebook with more or less only the instructions. Not sure if it will be helpful to anyone else, but I posted it here

7 Likes

Yes, with a table consisting of 784 columns of continuous variables. You get it by default using fastai.tabular.transform.Normalize.

Maybe inject it here: https://forums.fast.ai/t/collaborative-lesson-notes/40387/30

@jeremy, I think there is another buglet to fix in the video.

In the broadcasting matmul (video 1:04:25) you suggest that these two are the same:

    for i in range(ar):
        c[i] = (a[i].unsqueeze(-1)*b).sum(dim=0)
        c[i] = (a[i,None]*b).sum(dim=0)

but the second one gives an error.

RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 1

The only way I can make it work is via a temp assignment:

    for i in range(ar):
        t = a[i]
        c[i] = (t[:,None]*b).sum(dim=0)

All your unsqueeze-via-None examples were using `:`, but as soon as you replace `:` with an actual index like 0, they no longer work. Here is an example:

m1 = torch.arange(6.0).view(2,3)
m1[0]
m1[0].unsqueeze(-1)
m1[0, None]
tensor([0., 1., 2.])
tensor([[0.],
        [1.],
        [2.]])
tensor([[0., 1., 2.]])

Thanks.

3 Likes

Someone didn’t do their homework :wink: Please see this thread https://forums.fast.ai/t/lesson-8-readings-xavier-and-kaiming-initialization/41715 by @PierreO and his great post https://pouannes.github.io/blog/initialization/ for the details.

3 Likes

Here is my take on the Kaiming He initialization paper. I have written a Medium post summarizing section 2.2 and explaining it.

Glad I did it, since I definitely needed to refresh some concepts. It is pretty straightforward, and combining it with the videos you should have a clear idea of what this method does vs. the Xavier/Glorot initialization.

If you see something that is unclear or wrong, just reach out.

PS: Shoutout to @PierreO, saw your post this morning, really nice job man.

5 Likes

Thanks! I’ve created an Errata section in the top post and added this there. Please add any other bugs you notice (except those already mentioned in the video).

2 Likes

Just added a repo link on the notes thread.

2 Likes

Thank you @mkardas for sharing the wonderful blog by @PierreO. I was trying to read the papers but not making much progress. :slight_smile:

I’m glad I searched before posting the question :slight_smile:
I tried to see if there are ways to achieve this using None but didn’t find one until I saw Jeremy’s post above (and no matter how long I play with it, it still feels strange).

This also worked:

a[i,...,None]
tensor([[4],
        [5],
        [6]])
3 Likes
a[i,...,None]

Indeed. That would suggest that you may have more than 2 dimensions, so in this case I feel it’d be confusing, since there are only 2.

After experimenting with this, my feeling is that using an explicit unsqueeze is the most intuitive way.

Often you want to write code that can handle varying rank - in which case ... can be used to ensure you don’t have to change anything when rank changes.
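For instance (a toy sketch, names are mine):

```python
import torch

def append_unit_axis(t):
    # `...` stands for "all existing dimensions", whatever the rank
    return t[..., None]

v = torch.arange(3.)             # rank 1, shape (3,)
m = torch.arange(6.).view(2, 3)  # rank 2, shape (2, 3)

append_unit_axis(v).shape  # torch.Size([3, 1])
append_unit_axis(m).shape  # torch.Size([2, 3, 1])
```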

2 Likes