Lesson 8 (2019) discussion & wiki

What is a “ranked” tensor?

Is that the number of dimensions of a tensor?

Is the lowest ranked tensor a “rank 2” tensor?

For example: https://youtu.be/4u8FxNEDUeg?t=3631 here the first two tensors are described as “rank 2” and the last one is a “rank 3”.

Also, does asking questions here instead of googling it myself help contribute to the forums?


May I suggest a global glossary of terms page for the forums or docs.fast.ai?

The rank signifies how many dimensions a tensor has. A rank 2 tensor, for example, is a traditional matrix (rows by columns). A rank 1 tensor would be a vector, a rank 0 a scalar, etc. Does that help?
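
For example, in PyTorch (a minimal sketch; `dim()` reports the rank):

```python
import torch

x0 = torch.tensor(3.14)          # rank 0: a scalar, no dimensions
x1 = torch.tensor([1., 2., 3.])  # rank 1: a vector
x2 = torch.ones(2, 3)            # rank 2: a matrix (2 rows, 3 columns)
x3 = torch.ones(2, 3, 4)         # rank 3: e.g. a stack of matrices

for t in (x0, x1, x2, x3):
    print(t.dim(), t.shape)      # dim() is the rank, shape is the length of each dimension
```

So in the lecture clip above, the first two tensors have two dimensions and the last one has three.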

I have Git Blame, Git History and GitLens; I'm not sure which one is doing what :slight_smile:

1 Like

This is NOT an official guide, but I posted on Medium how I installed fastai v1 on my Windows 10 machine.

Thanks for posting, pierreguillou. This is very useful. However, for the course2v3 fastai notebooks to run, we need to update using pytorch-nightly. Could you please explain how your instructions should be modified to do that?

I guess Jeremy gave the answer (https://forums.fast.ai/t/lesson-10-discussion-wiki-2019/42781).
I still have to try it on my Windows 10 machine. Has anyone already tried?

1 Like

I’ve started doing TDD from this book :smile:

Also, when I started working on the fastai lectures, I realized that Jupyter notebooks are much more convenient and powerful tools than I'd originally thought. I had never used the debugger from a notebook before, but it makes a very big difference. (It's especially convenient with the new breakpoint() builtin.)
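
A minimal sketch of what I mean - this works right from a cell:

```python
import torch

def normalize(x, mean, std):
    breakpoint()             # Python 3.7+ builtin: pauses here and opens an interactive pdb prompt
    return (x - mean) / std

# running the cell drops you into the debugger, where you can inspect x, mean and std,
# step with `n`, and continue with `c`
normalize(torch.randn(5), 0.5, 0.2)
```

(The `%debug` magic right after an exception is great too.)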

And, at the end of the day, a notebook is just a JSON file, so one can export it (as shown in the lectures) or execute it automatically. The final proof for me was this notebook I implemented during Part 1: a training loop, a model, helper snippets, etc. – everything was built within this single notebook, without any IDE or code editor. So now I'm starting to understand how the fastai team managed to build the library in Jupyter.

3 Likes

Yes, it works for me on Win10. I have seen some strange slowing down though - maybe I needed to restart my kernel. The cool thing is that most of this course also runs just fine on my old laptop - even the GPU CUDA stuff - with an NVIDIA Quadro M2000M. Obviously we aren't doing anything heavy with just MNIST, but I'm sure that GPU wasn't supported for 'full' fastai before.

1 Like

Except it doesn't have the power of your editor and all the handy features the editor provides. So I often end up moving the code into my editor to do something that would be much faster to do there, and then back to Jupyter - yikes. E.g. Jupyter's search-and-replace feature is a joke.

I haven’t tried it, but I think I remember seeing that both emacs and vim have jupyter-like extensions (that allow you to do a very similar thing from inside your editor).

2 Likes

Agreed, makes sense. I also tend to "dump" fragments of my notebooks into plain Python modules and classes when I feel that some bit of functionality is ready enough to be "stable". The autoreload extension is very handy here: you have a notebook with prototyping code and a bunch of .py files with supplementary utilities and classes.
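
The first cell of such a notebook usually looks something like this (the `utils` module and `plot_batch` import are just made-up placeholders for one of those .py files):

```python
# re-import edited .py files automatically before each cell runs
%load_ext autoreload
%autoreload 2

# hypothetical supplementary module sitting next to the notebook
from utils import plot_batch
```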

By the way, the PyCharm IDE "natively" supports Jupyter notebooks, but the quality is not that good - still very buggy and slow. At least, that was the case the last time I tried to use this feature.

1 Like

Except it’s quite broken :frowning: https://github.com/ipython/ipython/issues/11588

1 Like

Oh, that's interesting! Thank you for sharing. I don't believe I've ever encountered the pickle case you've shown in the issue thread, so I was completely unaware of this problem. And yes, quite often I just recreate the objects when making any change to the class code. Actually, I've been using this autoreload thing as a kind of black box without trying to understand its internals. It's probably time to look a bit closer at Jupyter extensions.

1 Like

Just out of curiosity: working through the 01_matmul notebook, I barely get a speedup between einsum notation and PyTorch's matmul. einsum runs in 71µs and matmul runs in 45µs (vs 18µs for Jeremy).

I'm running an AMD processor with 16 cores / 32 threads on Ubuntu 18.04.

The other interesting thing I noticed is that my pure Python implementation was the opposite situation: my pure Python matrix multiplication was almost twice as fast as on the CPU Jeremy was using.

I'm wondering if anyone has ideas on what's causing this behavior. Without knowing Jeremy's CPU it's tough to tell for sure, but maybe it's a result of the high core count on my CPU creating overhead in the final PyTorch implementation? Or maybe the single-thread speed of my multi-core CPU is that much slower?
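
For reference, the comparison I'm timing looks roughly like this (the shapes are just my recollection of what the notebook uses - a small MNIST slice against the weight matrix):

```python
import torch

m1 = torch.randn(5, 784)    # stand-in for the small slice of the validation set
m2 = torch.randn(784, 10)   # stand-in for the weight matrix

%timeit torch.einsum('ik,kj->ij', m1, m2)   # einsum notation
%timeit m1.matmul(m2)                       # PyTorch's built-in matmul
```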

I’d guess maybe you don’t have a BLAS that’s fast for AMD processors.
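
One quick way to check which BLAS your PyTorch build is linked against (and how many CPU threads it will use), at least on reasonably recent versions:

```python
import torch

print(torch.__config__.show())   # build info, including which BLAS/LAPACK backend is linked in
print(torch.get_num_threads())   # how many CPU threads PyTorch uses for ops like matmul
```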

Starting to understand the need for Swift and Julia better now . . .

Is there a mathematical proof for this dividing by the previous layer's column size when initializing the layer parameters? I would like to look at it in case it helps me understand it better, because currently it is just magic to me.
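
For reference, here is the effect I mean - a quick numerical sketch (my reading of the notebook is that the weights are divided by the square root of the number of input columns):

```python
import torch

n_in = 784
x = torch.randn(10_000, n_in)                 # pretend activations with variance ~1

w_plain  = torch.randn(n_in, 50)              # unscaled weights
w_scaled = torch.randn(n_in, 50) / n_in**0.5  # divided by sqrt of the input column count

print((x @ w_plain).var())    # ~784, i.e. roughly n_in: each output sums n_in unit-variance terms
print((x @ w_scaled).var())   # ~1: the scaling cancels that factor of n_in
```

So I can see numerically that the scaling keeps the variance around 1; what I'm missing is the derivation of why that is the right factor.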

1 Like

I was looking forward to seeing all the follow-ups to the post below, but it seems the topic has been dropped since then. Let me resurrect it.

Maybe because the confidence is not taken into account by the accuracy/error rate metrics. Let me explain. Suppose the softmax outputs something like [0.9, 0.08, 0.02] for a 3-class classification problem, and that the prediction is correct according to the ground truth.
Then the validation loss worsens, since we are overfitting (by any possible definition: if the val loss diverges, you are definitely overfitting), and you are now predicting [0.8, 0.1, 0.1] for the same example.
While your loss has got worse, your accuracy has not.
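
A tiny numerical check of what I mean, taking class 0 as the ground truth and computing the loss with PyTorch's nll_loss on those (log) probabilities:

```python
import torch
import torch.nn.functional as F

target = torch.tensor([0])                            # the ground truth is class 0

confident = torch.tensor([[0.9, 0.08, 0.02]]).log()   # F.nll_loss expects log-probabilities
less_conf = torch.tensor([[0.8, 0.10, 0.10]]).log()

print(F.nll_loss(confident, target))    # ~0.105
print(F.nll_loss(less_conf, target))    # ~0.223 -> the loss got worse

print(confident.argmax(1), less_conf.argmax(1))   # both still predict class 0 -> accuracy unchanged
```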

Could you tell me if I'm right, and if there are other considerations worth making?

Here I'm just guessing. You could have observed that even for clearly overfitted models, your model, while not being particularly accurate (bad parts of the loss surface), still retains substantial generalization power (wide local optima); that is, its not particularly stellar accuracy does not worsen too much on unseen test data.

1 Like

Precisely right! Well done :slight_smile:

I would define over-fitting as “training too much” - so under that definition, we’re not over-fitting. I’m not sure if there’s a definitive definition however.

2 Likes

Thanks for your answer, Jeremy!

May I ask you for a brief comment about the second part (the one about the loss surface topology and its stationary points)?

Thanks! :wink:

Yes, I think that's about right. I'm not sure my comment about this, or this overall part of the discussion, was particularly helpful though.

1 Like