New pytorch tutorial draft - feedback welcome

jeremy · September 11, 2018, 5:26pm

Hi gang. We’ve written a little Pytorch tutorial which, if approved, may become an official pytorch v1 tutorial once it’s released. So I want to make sure it’s pretty good! Let me know if you have any thoughts about how to make this better:

github.com

fastai/fastai_v1/blob/master/dev_nb/001a_nn_basics.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# What is `torch.nn` *really*?\n",
    "\n",
    "## A quick journey: from neural net \"from scratch\", to fully utilizing `torch.nn`, `Dataset`, and `DataLoader`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*by Jeremy Howard, fast.ai. Thanks to Rachel Thomas and Francisco Ingham.*\n",
    "\n",
    "PyTorch provides elegantly designed modules and classes `torch.nn`, `Dataset`, and `DataLoader` to help you create and\n",
    " train neural networks. In order to fully utilize their power and customize them for your problem, you need to really understand exactly what they're doing. To develop this understanding, we will first train basic neural net on the MNIST data set without using any features from these models; we will initially only use the most basic PyTorch tensor functionality. Then, we will incrementally add one feature from `torch.nn`, `Dataset`, and `DataLoader` at a time, showing exactly what each piece does, and how it works to make the code either more concise, or more flexible.\n",
    "\n",

This file has been truncated. show original

nirantk · September 12, 2018, 12:50am

This is fantastic @jeremy and team!

It is lucid, written with a beginner in mind and easy to follow.

You do describe DataLoaders here:

Pytorch's DataLoader is responsible for managing batches. You can create a DataLoader from any Dataset.

DataLoader makes it easier to iterate over batches. Rather than having to use train_ds[i*bs : i*bs+bs], the DataLoader gives us each minibatch automatically.

I’d love a quick mention on why automated minibatches are better e.g:

Can this help train on datasets which don’t fit in memory?
Can this improve GPU utilization? By having GPU not wait on CPU for I/O?
Isn’t an array lookup O(1) - why should I use this instead?

Maybe having a few images is quite helpful to mentally prepare on what’s next?
Example image:

This is from an Adam Geitgey tutorial on NLP.

wgpubs · September 12, 2018, 4:16am

Is there an environment.yml or requirements.txt for us to ensure we are using appropriate versions of the various python packages?

qwertimer · September 12, 2018, 6:19am

I just ran through this just before. Apart from a few wording typos the work is great. I am going to directly implement these tools in my work. Thank you so much for this.

KarlH · September 12, 2018, 5:48pm

This is great.

I don’t know if this is within the scope of what you’re envisioning for the document but I think it would be useful to show how to pull weights, gradients and activations out of the model.

TheShadow29 · September 12, 2018, 6:17pm

This is an amazing tutorial. Nothing much to add except for a few minor points.

We can remove import math and use np.sqrt instead.
In Mnist Logistic, have self.lin = nn.Linear(784, 10, bias=True). to specifically show that bias is added.
In addition to nn.Sequential, have a small section on nn.ModuleList. nn.ModuleList is pretty amazing when you want to repeat the same type of layer, but don’t want to define it multiple times.
I am not sure if I am understanding the preprocess part correctly. It seems that the preprocess stuff is making the image into (B x 1 x 28 x 28). However, if the image is of size say 56 x 56, it should make it B x 1 x 56 x 56 right? Currently it will make it (4B x 1 x 28 x 28). And I am not very sure if that is the intended. If it is, a little more explanation would be appreciated (I think).

Things I really loved:

Lambda Module. While I never used it before, it was nice to see some examples.
Wrapped Data Loader. Again found this pretty fantastic. There are cases where I had to resort to some adhoc methods, but having it in the Wrapped Data Loader itself, would be better.
Simple definition of NLL loss. I would have probably written the same stuff with an additional 2-3 wasteful lines.
The refactoring stuff and going bottom-up is a lovely idea.

jeremy · September 12, 2018, 10:33pm

MNIST data is a single 28*28=768 long vector. We convert it in to an image.

That was thanks to @radek

jeremy · September 12, 2018, 10:35pm

I’m not sure how Facebook is planning to host this - I’ll try to make sure that there’s something suitable.

jeremy · September 12, 2018, 10:35pm

It’s simply a refactoring which reduces the lines of code we have to write.

raghavab1992 · September 13, 2018, 8:01am

This looks really good. Covers all the basic essentials for building neural networks with pytorch in a very structured and thoughtful manner.
A few suggestions that might improve the understanding further:

Can the various torch modules being used/explained in the notebook (torch.nn, torch.nn.functional, torch.autograd, torch.optim, torch.utils etc) be introduced initially with a brief overview in a separate section. This gives an overall idea of what imp modules are available in pytorch, what they do and what is being covered in this notebook etc.
The one that @nirantk already mentioned about data loaders. To be more specific, can the usage of generator inside dataloader be explained as well since it helps with very imp things like reading large datasets
This might be an overkill but let me put it anyway. Can we add how to handle variable sized inputs using a data loader. Knowing how this can be done comes in handy in certain scenarios, for e.g. during test time where we would like to predict on images with different sizes without cropping, doing pixel wise classification where scaling back on predictions is not desired.
A discussion thread on pytorch forum regarding the same

Krisztian · September 13, 2018, 8:29am

I love it! It came at a perfect time – I am trying reconstruct the recommendation systems from the lectures using only pytorch, and this notebook really helped me understand datasets/dataloaders.

One question:

When including validation, you add model.train() and model.eval() to the code. I looked at the documentation and it seems to turn things like dropout on/off. I don’t understand what it is doing in this context though.

jeremy · September 13, 2018, 1:39pm

Yup basically it’s just for batchnorm and dropout (or any other layer types that have different behaviour at train vs test time).

jeremy · September 13, 2018, 1:40pm

That’s a good idea. Maybe I’ll add a summary at the end.

msp · September 13, 2018, 1:43pm

I think it’s almost too compact – a tutorial reader would definitely benefit from a short comment of this snippet.

msp · September 13, 2018, 4:10pm

This tutorial is really excellent! I’ve looked at a few recently, and none even came close in terms of discussing the concepts step by step.

In this spirit, I think there should be a comment about the first uses of model.train() and .eval().

The only think I dislike about the tutorial is the loss_batch function, which to me makes the code less clean/readable, but that’s just personal taste.

jeremy · September 13, 2018, 8:24pm

If it were a DL tutorial, I’d certainly agree - but I’m trying to keep this to just being a pytorch tutorial, so really not explaining any implementation details like that.

jeremy · September 13, 2018, 8:42pm

OK I’ve added that now.

jeremy · September 13, 2018, 8:43pm

I’ve added a closing section with this summary now.

tcapelle · September 14, 2018, 11:26am

Excellent tutorial, just what I needed.

+1 to add a brief description of nn.ModuleList, just reading this comment helped me clarify some ideas.

jeremy · September 14, 2018, 7:19pm

It’s can be a useful class but I don’t think it belongs in this particular tutorial.

(BTW, I almost never use it any more - I find nn.Sequential(*list) nearly always better.)