In the notebook, the randomly initialized model's loss is later compared to the trained model's loss to show that the loss has decreased. However, it's not a true apples-to-apples comparison: during training the xb and yb variables have been modified, so the final loss is computed on a much smaller batch of 16 instead of the original 64. The loss does decrease when recomputed on the original batch, but not by as much as it appears, unless I'm doing something wrong.
Perhaps a method needs to be added to always get the same original batch when comparing the trained loss.
I’m probably reading it too literally when you say “Let’s check the loss and compare to what we got earlier.” I read that to mean that the loss should be computed on the “exact” same batch of xb, yb for comparison purposes.
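To make the idea concrete, here is a minimal sketch of capturing one fixed batch before training so the before/after losses are computed on exactly the same xb, yb (the tensors and model here are toy stand-ins, not the notebook's actual data):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-ins for the notebook's MNIST data and model (hypothetical shapes)
x_train = torch.randn(64, 784)
y_train = torch.randint(0, 10, (64,))
model = torch.nn.Linear(784, 10)

# Capture one fixed batch *before* training, so the comparison
# later is on exactly the same xb, yb
xb, yb = x_train[:64], y_train[:64]
loss_before = F.cross_entropy(model(xb), yb)

# ... training loop would go here, updating the model's parameters ...

# Same batch, true apples-to-apples comparison
loss_after = F.cross_entropy(model(xb), yb)
```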
About two months ago, when this notebook was in its earliest stages, I mentioned possibly adding accuracy to it. I still think it would help the reader to see the accuracy alongside the loss, both to watch it improve as the loss improves and to understand how the predictions correlate with the target values. That discussion seems to be somewhat missing from the notebook. Others may not agree. Anyway, I've gone ahead and submitted a PR with a sample of what I'm thinking.
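For reference, an accuracy metric in this style can be very short — something like the sketch below (the PR may define it differently):

```python
import torch

def accuracy(out, yb):
    # out: (batch, n_classes) raw scores; yb: (batch,) integer targets
    preds = torch.argmax(out, dim=1)
    return (preds == yb).float().mean()

# Quick check on a toy batch: 2 of 3 predictions match the targets
out = torch.tensor([[2.0, 1.0], [0.1, 3.0], [5.0, 0.0]])
yb = torch.tensor([0, 1, 1])
print(accuracy(out, yb))
```

Note that argmax on the raw outputs picks the same class as argmax on softmax'ed outputs, so accuracy can be computed without ever applying softmax.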
PyTorch will even create fast GPU or vectorized CPU code for your function automatically.
This left me wondering how this magic works. Is there a reference for further reading we could add here?
Section Switch to CNN
For people unfamiliar with torch.Tensor.view and Conv2d, would it be helpful to explain the basic mechanics of how the data flows here? E.g. the input channel of size 1 in the first Conv2d layer is created by reshaping the data with xb.view(-1, 1, 28, 28) into image data of size 28x28 with 1 color channel.
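A shape walkthrough along these lines might make the data flow concrete (a sketch using the MNIST sizes and the Conv2d hyperparameters the notebook uses):

```python
import torch
import torch.nn as nn

xb = torch.randn(16, 784)      # a batch of flattened 28x28 MNIST images
x = xb.view(-1, 1, 28, 28)     # reshape to (batch, channels=1, height, width)
print(x.shape)                 # torch.Size([16, 1, 28, 28])

# kernel 3, stride 2, padding 1 halves each spatial dimension: 28 -> 14
conv = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1)
out = conv(x)
print(out.shape)               # torch.Size([16, 16, 14, 14])
```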
Typo: "If you're lucky enough to have access to a CUDA-capable CPU" — presumably this should say "GPU".
I like the first notebook; I think it's a good starting point and it demystifies the high-level fastai library. It gives people a general idea of PyTorch, so when they read code outside the fastai community, they will understand more "raw" PyTorch code.
Just noticed fastai_v1\doc\fastai_full.svg — is it generated by some library?
I would like to start with a small PR, but I'm stuck on the tools/build part. I am using Windows 10, and it seems like the path is not being joined correctly. I have tried following the steps in contribute.md; would love any help here, thank you!
(I think the problem is line 16 of tools/build.py: the file should be a relative path, but it returns an absolute path instead, causing the error. Not sure if it is Windows-specific?) for file in sorted(path.glob('0*ipynb')):
I was able to fix the path either by changing the line to
for file in sorted(glob.glob('0*ipynb')): # not sure what the difference is between pathlib.Path.glob and glob.glob
or by setting
path = Path('.')
Note that I am running tools/build from the fastai_v1 directory with Git Bash.
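The difference the two fixes exploit can be reproduced in a few lines (a sketch; whether Windows itself changes the behaviour is the open question, but the relative-vs-absolute distinction holds on any platform):

```python
from pathlib import Path
import tempfile, os, glob

tmp = tempfile.mkdtemp()
open(os.path.join(tmp, '01_demo.ipynb'), 'w').close()

# Path.glob yields results joined onto the Path object itself:
# an absolute Path gives absolute results...
abs_results = list(Path(tmp).glob('0*ipynb'))
print(abs_results[0].is_absolute())   # True

# ...while Path('.') gives relative ones
os.chdir(tmp)
rel_results = list(Path('.').glob('0*ipynb'))
print(rel_results[0].is_absolute())   # False

# glob.glob with a relative pattern always returns relative strings
print(glob.glob('0*ipynb'))
```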
@nok do you mind asking 2) and 3) over in #fastai-dev and also tagging @stas (who wrote the tools/build thing). You’ll also find a thread there discussing that tool (which is entirely optional for getting started BTW).
One thing I've been wondering about this notebook and subsequent ones: since log_softmax is removed from the neural network and replaced by the use of F.cross_entropy in the loss function, should it be mentioned somewhere, either here or in a later notebook, that if prediction percentages are desired (to see how confident a prediction was), the outputs would need to be run through the softmax function? Something like the code below for this notebook. I haven't seen this discussed so far in the notebooks I've gone through, but I haven't gone through all of them, so perhaps it comes up later. Or perhaps I've not got any of this right and am misunderstanding. Thanks for any clarification.
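Roughly along these lines (a sketch of the idea, not the exact code from the PR):

```python
import torch
import torch.nn.functional as F

# Raw model outputs (logits) for a batch of 2 examples, 10 classes
logits = torch.randn(2, 10)

# F.cross_entropy applies log_softmax internally, so to recover
# prediction confidences we apply softmax to the raw outputs ourselves
probs = F.softmax(logits, dim=1)
conf, preds = probs.max(dim=1)
print(conf)   # per-example confidence of the predicted class
print(preds)  # predicted class indices
```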
In Refactor using DataLoader, there is a ... in the sample code being replaced, which isn’t used in other examples.
In Add validation, it might be worthwhile to add a reference/example for why we use model.train() and model.eval(). There is a Quora link just above on why shuffling is important; something like that could be included for batch norm.
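A minimal demonstration of why the mode matters could also work (a sketch using dropout, where the effect is easiest to see; batch norm behaves analogously with its running statistics):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(1, 4)

model.train()
# In train mode, dropout randomly zeroes activations, so repeated
# forward passes on the same input generally differ
a, b = model(x), model(x)

model.eval()
# In eval mode, dropout is a no-op, so outputs are deterministic
c, d = model(x), model(x)
print(torch.equal(c, d))  # True
```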
In Switch to CNN, xb.view() is first introduced, but not explained until the nn.Sequential section. It might make more sense to add that explanation in the same section where Conv2d is explained.
In Switch to CNN, First try and nn.Sequential, this code block is present:
Hi @jeremy ,
I think it’s really really good on the “all you need for your understanding part”.
But one thing I'm missing (as in most tutorials) is the "what to do when something has gone wrong in your model" part. I don't find stack traces and debugging helpful. What I do instead is log things like this:
I strongly suggest folks use the debugger for this FYI - but either way it’s out of scope for this tutorial (and I’m glad you’ve found a hook-based solution that works for you; you might like the new Hook functionality in fastai v1!)