Lesson 8 official topic

Nowadays it’s often just at the end. But it varies a lot.

1 Like

CW: Slightly more mathy stuff

Here’s a LOVELY video from one of my (other) favorite educators, Steve Brunton, on PCA and SVD. In this video in particular he covers an elegant answer to the question I’ve always struggled with:

If you can choose any number of principal components to keep, where do you truncate (and why)?

Don’t be too afraid of his heavy use of notation; he always tries to explain things verbally and diagrammatically as well.

5 Likes

Thanks for the awesome course, which was a delight to follow along with for the past few months :slight_smile: .

Looking forward to Part 2! :tada:

6 Likes

Thanks a lot for the amazing content.
This is (probably) my 4th time watching the course.
I never cease to learn something new.
I am incredibly thankful for that.

6 Likes

Same for me: no matter what your level is or what you’re doing, there is always something new to learn and a different point of view on concepts you thought you knew :wink:

4 Likes

Now I really can’t wait for Part 2 though.
It’s like watching a TV show and being left at the end of a season on a cliff-hanger.
I need to know what happens next!!! :laughing:

4 Likes

Honestly, this is something you have to try.
It’s the same as “Why do we stack 4 Conv layers on top of each other instead of 2?”.
The bigger the embedding, the more opportunity the layer has to learn something nuanced.
The lower the interpretability, though.
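A quick experiment sketch (assuming the collaborative filtering dls from the lesson; the sizes are just illustrative):

for n_factors in (10, 50, 100):
    learn = collab_learner(dls, n_factors=n_factors, y_range=(0, 5.5))
    learn.fit_one_cycle(3, 5e-3)  # compare validation losses across embedding sizes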

1 Like

Thank you, Jeremy (and TAs), for a great course and for the patience in answering all of my beginner questions throughout. I really enjoyed the course and would love to stick around to learn more. I will review the videos and walkthroughs before Part 2 so I can still tag along!

3 Likes

As usual, due to work clashes I could not watch lesson 8. Luckily, one of the questions I wanted to ask was covered in the AMA.

I would be REALLY interested if Jeremy shared more of his research on teaching (esp. to kids). Even just a list of tools and resources to look at would be great. I have briefly tried looking into the research and honestly I don’t even know where to start (and I am supposed to be a former academic researcher :man_facepalming: )

2 Likes

03:51 What will Part 2 feel like? A lot deeper technically? Will we be able to read and implement research papers? Will models involve real-life situations?

04:46 Review: building a neural net from scratch. How does PyTorch create a neural net so effortlessly? How does PyTorch keep track of model weights through Module? How does Module store weights with nn.Parameter? How do you check a model’s weights using parameters()?

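A minimal sketch of the idea (illustrative names, not the lecture’s exact code; a plain nn.Module needs the super().__init__() call, which fastai’s Module does for you):

import torch
from torch import nn

class Scaler(nn.Module):
    def __init__(self):
        super().__init__()
        # wrapping a tensor in nn.Parameter makes Module track it as a trainable weight
        self.weight = nn.Parameter(torch.ones(3))
    def forward(self, x): return x * self.weight

list(Scaler().parameters())  # shows the tracked weight, with requires_grad=True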

You can build a layer in a Module with nn.Linear instead of nn.Parameter, and PyTorch can read its weights too.

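Again a minimal sketch: nn.Linear registers its own weight and bias, so parameters() still finds them:

class LinearLayer(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(3, 1)  # creates and registers its weight (1x3) and bias (1) itself
    def forward(self, x): return self.lin(x)

[p.shape for p in LinearLayer().parameters()]  # [torch.Size([1, 3]), torch.Size([1])]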

07:42 How do you create the Embedding function and the entire DotProductBias model from scratch in PyTorch, using create_params? After training, the trained movie_bias can be inspected; you can check the shape of the bias with model.movie_bias.shape.

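For reference, the fastbook version is roughly this (Module and sigmoid_range come from fastai):

def create_params(size):
    # a tensor of the requested shape, randomly initialized, tracked as a trainable weight
    return nn.Parameter(torch.zeros(*size).normal_(0, 0.01))

class DotProductBias(Module):
    def __init__(self, n_users, n_movies, n_factors, y_range=(0, 5.5)):
        self.user_factors  = create_params([n_users, n_factors])
        self.user_bias     = create_params([n_users])
        self.movie_factors = create_params([n_movies, n_factors])
        self.movie_bias    = create_params([n_movies])
        self.y_range = y_range

    def forward(self, x):
        users  = self.user_factors[x[:, 0]]
        movies = self.movie_factors[x[:, 1]]
        res = (users * movies).sum(dim=1)
        res += self.user_bias[x[:, 0]] + self.movie_bias[x[:, 1]]
        return sigmoid_range(res, *self.y_range)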



11:29 Questions: What does Tensor.normal_ do?

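In short, it fills a tensor in place with samples from a normal distribution (the trailing underscore is PyTorch’s in-place convention):

torch.zeros(5).normal_(mean=0, std=0.01)  # five small random draws from N(0, 0.01)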


12:21 After training, what can the movie_bias tell us about each movie, and about all of them together? What does having a low bias mean for a movie? What does having a high bias mean? Can user_bias tell us which users just love movies, even the crappy ones? This is visualizing movie_bias.

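A sketch along the lines of the fastbook code (assuming learn and dls from the collaborative filtering notebook):

movie_bias = learn.model.movie_bias.squeeze()
idxs = movie_bias.argsort()[:5]           # most negative bias: disliked even by fans of their factors
[dls.classes['title'][i] for i in idxs]   # use argsort(descending=True) for the best-loved ones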

15:53 What can we interpret or do with the huge matrix of shape (num_users, 50)? How do we shrink the 50 latent factors down to just the 3 most important factors with PCA?

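A sketch of the shrinking step (fastai patches a pca method onto Tensor; movie_factors as in the model above):

movie_w = learn.model.movie_factors.detach().cpu()
movie_pca = movie_w.pca(3)        # project the 50 latent factors onto the top 3 principal components
fac0, fac1, fac2 = movie_pca.t()  # one coordinate per movie on each component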

How do you interpret the PCA chart of the movies, plotted with just two of the three factors compressed from the original 50? How are the taste or style of the movies condensed into two factors and conveyed by each movie’s location on the two-dimensional chart? This is visualizing the movie_factors embeddings.


18:06 How does fastai make all the work above easier with just one line of code?

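That one line (plus a fit call), as in fastbook:

learn = collab_learner(dls, n_factors=50, y_range=(0, 5.5))
learn.fit_one_cycle(5, 5e-3)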

19:57 How does fastai construct everything under the hood of collab_learner?



21:14 Questions: Is PCA useful in other applications? Where can you find more on PCA? Why should you take Rachel’s Computational Linear Algebra course?

22:11 How do you use embedding distance to find similar movies?

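The fastbook version, roughly (i_weight is the item-embedding layer inside fastai’s collab model):

movie_factors = learn.model.i_weight.weight
idx = dls.classes['title'].o2i['Silence of the Lambs, The (1991)']
distances = nn.CosineSimilarity(dim=1)(movie_factors, movie_factors[idx][None])
idx = distances.argsort(descending=True)[1]  # [0] is the movie itself
dls.classes['title'][idx]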

23:47 Go read the fastbook section on bootstrapping a collaborative filtering model.

24:22 How do you do collaborative filtering with deep learning instead of the dot-product matrix completion above? How do you apply the simplest neural net architecture to this collaborative filtering case?

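Roughly the fastbook version: concatenate the two embeddings and feed them through a small fully connected net:

class CollabNN(Module):
    def __init__(self, user_sz, item_sz, y_range=(0, 5.5), n_act=100):
        self.user_factors = Embedding(*user_sz)
        self.item_factors = Embedding(*item_sz)
        self.layers = nn.Sequential(
            nn.Linear(user_sz[1] + item_sz[1], n_act),
            nn.ReLU(),
            nn.Linear(n_act, 1))
        self.y_range = y_range

    def forward(self, x):
        embs = self.user_factors(x[:, 0]), self.item_factors(x[:, 1])
        x = self.layers(torch.cat(embs, dim=1))
        return sigmoid_range(x, *self.y_range)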

26:14 How does fastai use a rule of thumb to recommend the number of latent factors for users and movies?

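The rule of thumb lives in fastai’s source as emb_sz_rule, and get_emb_sz applies it per categorical variable:

def emb_sz_rule(n_cat):
    # fastai's heuristic: grows slowly with cardinality, capped at 600
    return min(600, round(1.6 * n_cat**0.56))

embs = get_emb_sz(dls)  # list of (vocab size, embedding size) pairs, one per variable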

How does fastai use deep learning to build a collaborative filtering model, in two ways?

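The two ways are passing your own model (like CollabNN above) to a Learner, or letting collab_learner build one for you:

learn = collab_learner(dls, use_nn=True, y_range=(0, 5.5), layers=[100, 50])
learn.fit_one_cycle(5, 5e-3)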

27:48 Why are the deep learning versions not as good as the DotProduct version? Is it because the dot product is more tailored to the problem? How do companies combine both versions to do collaborative filtering? When you have lots of metadata, should you apply deep learning to it? How would you use metadata in the model?

28:49 Questions: Can a small number of users and movies overwhelm everybody else? E.g., a small group of anime enthusiasts watches a lot of anime movies and gives them super high ratings. Details of how to deal with them aren’t discussed here.

30:25 How do you apply an embedding matrix to an NLP model, shown through a spreadsheet demo? What’s the essence of a neural net?


34:56 How do you apply embeddings to tabular datasets and models? How do you understand the TabularModel and tabular_learner source?

39:35 What’s going on inside a neural net, shown through a store sales prediction Kaggle competition and a paper published based on it?

Entity Embeddings of Categorical Variables (paper)

44:33 So far we have looked at what goes into a model as inputs and what comes out as outputs. We have also looked at the middle as matrix multiplication. What is a convolution (a particular kind of matrix multiplication in the middle)? Why is it so useful for CV? Why is MNIST one of the most famous CV datasets? How does Jeremy apply Zeiler and Fergus’s paper to MNIST using Excel and convolution?

49:22 How do you understand convolution? What does a filter do, and how does it help detect horizontal and vertical edges? How do you determine the size of the filter or kernel (3x3, 5x5, or anything else)? conv1 means the first convolutional layer.
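A tiny sketch of the idea with F.conv2d (the kernel values are illustrative):

import torch
import torch.nn.functional as F

horiz_edge = torch.tensor([[ 1.,  1.,  1.],
                           [ 0.,  0.,  0.],
                           [-1., -1., -1.]])
img = torch.rand(1, 1, 28, 28)               # stand-in for an MNIST digit
out = F.conv2d(img, horiz_edge[None, None])  # responds strongly where brightness drops top-to-bottom
out.shape                                    # torch.Size([1, 1, 26, 26]): a 3x3 kernel trims a 1-pixel border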

54:48 Moving on to the second convolutional layer. Two filters give us two channels out of the first convolutional layer. On the second layer, we create one 3D filter containing two 2D filters, which processes the two channels coming out of the first layer and condenses them into a single value. We can also create a second channel for the second conv layer using another 3D filter.
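In PyTorch terms (a sketch): the second layer’s filters are 3D, one 2D filter per incoming channel:

conv2 = nn.Conv2d(in_channels=2, out_channels=2, kernel_size=3)
conv2.weight.shape                      # torch.Size([2, 2, 3, 3]): two 3D filters spanning both channels
conv2(torch.rand(1, 2, 26, 26)).shape   # torch.Size([1, 2, 24, 24])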

57:07 How do you determine the output, and how do you use SGD to train the model and optimize the filters?

58:00 What is max pooling? What’s the problem with max pooling? How much data do we lose? Why is that a good thing? What is a dense layer and what does it do?
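A quick sketch of what max pooling throws away:

x = torch.arange(16.).view(1, 1, 4, 4)
F.max_pool2d(x, 2)  # keeps only the max of each 2x2 window: (1, 1, 2, 2), so 3/4 of the values are gone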

1:00:48 How do we do convolution slightly differently today? What is a stride-2 convolution and how does it work (no more max pooling)? We do a lot of stride-2 convolutions until the grid shrinks to 7x7, and then do an average pooling (no more dense layer). What does taking the average over the 7x7 grid mean? What is the problem with this approach? When is it better to use max pooling instead? How did fastai make it easy to try both by inventing a technique called concat pooling, which max-pools and average-pools and concatenates the two together?
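A sketch of the modern pattern, plus fastai’s AdaptiveConcatPool2d (the channel counts are illustrative):

from fastai.layers import AdaptiveConcatPool2d

conv = nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1)
y = conv(torch.rand(1, 16, 14, 14))           # stride 2 halves each spatial dim: (1, 32, 7, 7)
avg = F.adaptive_avg_pool2d(y, 1).flatten(1)  # average over the 7x7 grid: (1, 32)
cat = AdaptiveConcatPool2d(1)(y).flatten(1)   # max-pool and average-pool concatenated: (1, 64)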

1:05:12 How do you understand convolution in terms of matrix multiplication?
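One way to see it (a sketch): im2col via F.unfold turns convolution into an ordinary matrix multiply:

x = torch.rand(1, 1, 4, 4)
w = torch.rand(1, 1, 3, 3)
cols = F.unfold(x, kernel_size=3)              # each column is one flattened 3x3 patch: (1, 9, 4)
out = (w.view(1, -1) @ cols).view(1, 1, 2, 2)  # one matrix multiply per image
assert torch.allclose(out, F.conv2d(x, w), atol=1e-6)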

1:08:21 What is dropout, and how do you understand it using Excel? What is a dropout mask? What’s its effect, visually, in Excel? How do you understand dropout as data augmentation for the activations? How does it help avoid overfitting? What’s the story of dropout and academia?
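A sketch of the behavior with nn.Dropout:

drop = nn.Dropout(p=0.5)
x = torch.ones(8)
drop.train()
drop(x)  # roughly half the activations zeroed, survivors scaled by 1/(1-p) = 2
drop.eval()
drop(x)  # identity at inference time; dropout is only active during training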

1:14:27 Why doesn’t Jeremy spend much time on activation functions? We have seen many functions for metrics, losses and activations.

1:16:27 What should you do next, before fastai Part 2? What is Radek’s book Meta Learning about? What are the things to do under Write, Help, Gather and Build?

1:19:42 A fastai community member published the Mish activation function, which is used by many state-of-the-art models.

1:20:41 Jeremy AMA:

How to keep up? Focus on a subfield of deep learning, and focus on the things that don’t change much: the foundations of fastai have not changed much from 5 years ago. Everything else is just tweaks.

Will people with huge datasets and GPU compute replace those of us with small datasets and one GPU? There are always smarter ways of doing things; e.g., the fastai team trained on ImageNet on standard GPUs faster than companies with huge numbers of GPUs. Pick domains where smaller models can beat the state of the art.

1:26:24 How does Jeremy teach kids math? All kids can learn algebra, e.g. with DragonBox Algebra 5+. Great, Jeremy promised to talk more about teaching kids at some point later.

1:28:30 Plans for walkthrus

1:30:00 How do you turn a model into a business? Great news: Jeremy plans to build a course on this! What is the start of a business? What is the first step? How do you gradually figure out whether your idea answers a real need?

1:32:50 How does Jeremy stay so efficient at working? Finish something nicely; tenacity.

7 Likes

Consider doing the previous Part 2 course linked from the fast.ai MOOC home page; that way you can gauge what has changed in the intervening years when you do Part 2 2022/2023. I believe the previous Part 2 still has value, as Part 1 2022 has already prepared you for it.

1 Like

Totally! I actually already did Part 2 of the first MOOC. It was a long time ago though, so I expect Jeremy to cook up a lot of new fun stuff.

Perhaps that idea is not good for a newbie to fastai, as more effort may be required to run the notebooks along with the videos.
The main problem in running through the last Part 2 is the mismatch between the fastai version then and now, as well as the Python version used at the time. It could be a nightmare to get the Part 2 code running; however, some knowledge can be gained from just watching the videos and perhaps applying the concepts to the fastbook notebooks.

Quick question: I was going through the collaborative filtering deep dive and trying out the code.
I created the DataLoaders from the dataset (URLs.ML_100k), a custom model (dot product),
and trained the learner (fit_one_cycle).
Now I want to predict with the trained model. Can someone point me to any documentation or steps on how to do this with collaborative filtering?
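One way that should work (a sketch, assuming learn is the trained Learner and ratings is the original DataFrame) is to build a test DataLoader from new (user, movie) rows and call get_preds:

dl = learn.dls.test_dl(ratings.iloc[:10])  # rows with the same user/movie columns as training
preds, _ = learn.get_preds(dl=dl)          # predicted ratings for those pairs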

I think I found a small editorial error: in the spreadsheet conv-examples.xlsx, sheet conv-example (dropout), cell AH1 should read “Conv1 (26 x 26)” and not “Conv1 (27 x 27)”.
Similarly, cell BO1 should read “Conv1 (24 x 24)” and not “Conv1 (26 x 26)”.
And cell FC1 should read “Conv2 (24 x 24) - after dropout” instead of “Conv2 (26 x 26) - after dropout”.

How much fun the last 4 weeks have been for me!
Today I finished all the lessons, including a few of the further research questions, re-running all the notebooks discussed in the lectures and, for a few of them, running them on a dataset of my choice.

The four weeks resulted in many blogs that I published here: Musings of Learning Machine Learning
and many Kaggle notebooks Doktor Glas | Notebooks Contributor | Kaggle

After a couple of weeks’ break, I will watch the videos again and this time, I will try to recreate the notebooks without “much” help.

So many gaps in my knowledge were filled and I hope I will be able to crystallise my understanding further in the next iteration.

PS: I didn’t know where to post this note. If there is a better-suited place, please direct me.

Thanks!

1 Like

I have a question:
I used the MNIST_TINY dataset from fastai, which has a few hundred images for training and 20 for testing. I set my batch size to 4 and still got a “Your generator is empty” warning. The printouts of my validation loss and accuracy are both None. Any idea why?

from fastai.vision.all import *

path = untar_data(URLs.MNIST_TINY)
Path.BASE_PATH = path

mnist_db = DataBlock(
    blocks=(ImageBlock(cls=PILImageBW), CategoryBlock), 
    get_items=get_image_files, 
    splitter=GrandparentSplitter('train', 'test'),
    get_y=parent_label,
    batch_tfms=Normalize()   # normalize within each batch
)
batch_size = 4
dls = mnist_db.dataloaders(path, bs=batch_size)

def conv(ni, nf, ks=3, act=True):
    res = nn.Conv2d(ni, nf, stride=2, kernel_size=ks, padding=ks//2)
    if act: res = nn.Sequential(res, nn.ReLU())
    return res

def simple_cnn():
    return sequential(
        conv(1 ,8, ks=5),        #14x14
        conv(8 ,16),             #7x7
        conv(16,32),             #4x4
        conv(32,64),             #2x2
        conv(64,10, act=False),  #1x1
        Flatten(),
    )

def fit(epochs=1):
    learn = Learner(dls, simple_cnn(), loss_func=F.cross_entropy,
                    metrics=accuracy, cbs=ActivationStats(with_hist=True))
    learn.fit(epochs, 0.06)
    return learn

learn = fit()

Instead of test, use valid. The test set has no labels (folders), so your validation set is effectively empty, which is why you’re getting the “generator is empty” warning and why the validation loss and accuracy are None (those are only computed for the validation set). It would probably be better if this threw an exception rather than a warning.

This doesn’t throw an error, but the MNIST tiny dataset only has 3s and 7s, so you should probably change the 10 to a 2 so your model’s output shape matches the number of classes.

I don’t think ActivationStats works correctly with a single epoch; you get a warning when running just one. If you change the number of epochs to 2, the warning goes away.

A few debugging tips: I found the main issue by running dls.valid.show_batch(). Running show_batch (or one_batch) on your dataloaders is always recommended. I confirmed my suspicion about the model output shape by running learn.get_preds()[0].shape.

2 Likes

Thank you @matdmiller for your help and the debugging tip!
I am generally confused about how to debug when using fastai, and I ran into a new issue. I’ve finished training my model, and now I am trying to use it to make predictions for a single image. It works for a sample image in the test folder, but when I try to upload an image it gives a “list index out of range” error.

The whole notebook: Google Colab

The specific code snippet:

# This works
img = PILImageBW.create(f"{path}/test/3878.png")
learn.predict(img)

# This throws an error
btn_upload = widgets.FileUpload()
btn_upload
img = PILImageBW.create(btn_upload.data[-1])
img.resize((28, 28))
learn.predict(img)

I feel like it must have to do with the properties of the uploaded image vs. the image in the testing folder, but I can’t figure out what it is. Any pointers on how to debug this?