Nowadays it’s often just at the end. But it varies a lot.
CW: Slightly more mathy stuff
Here’s a LOVELY video from one of my (other) favorite educators, Steve Brunton, on PCA and SVD. In this video in particular he covers an elegant answer to the question I’ve always struggled with:
If you can choose any number of principal components to keep, where do you truncate (and why)?
Don’t be too afraid of his heavy use of notation–he always tries to explain things verbally and diagrammatically as well.
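For anyone who wants to experiment with the truncation question before watching: the sketch below is not the method from the video, just one common heuristic (keep the smallest number of components whose cumulative explained variance reaches a threshold). The function name, the 0.99 threshold, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def components_for_variance(X, threshold=0.95):
    """Return (k, ratios): the smallest number of principal components whose
    cumulative explained variance reaches `threshold`, and the per-component ratios."""
    Xc = X - X.mean(axis=0)                    # centre each feature
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values, largest first
    ratios = s**2 / np.sum(s**2)               # share of variance per component
    cumulative = np.cumsum(ratios)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(ratios)), ratios         # guard against rounding at threshold=1.0

rng = np.random.default_rng(0)
# Synthetic data: 200 samples in 5 dimensions that only vary along 2 directions
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5))
k, ratios = components_for_variance(X, threshold=0.99)
print(k, np.round(ratios, 3))
```

Because the toy data has rank 2, at most two components are needed however the threshold is set below 1. Real data rarely has such a clean cut-off, which is exactly why the question is hard.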
Thanks for the awesome course, which was a delight to follow along with for the past few months.
Looking forward to Part 2!
Thanks a lot for the amazing content.
This is (probably) my 4th time watching the course.
I never cease to learn something new.
I am incredibly thankful for that.
Same for me. No matter what your level is or what you’re doing, there is always something new to learn and a different point of view on concepts you thought you knew.
Now I really can’t wait for Part 2 though.
It’s like watching a TV show and being left at the end of a season on a cliff-hanger.
I need to know what happens next!!!
Honestly, this is something you have to try.
It’s the same as “Why do we stack 4 Conv layers on top of each other instead of 2?”.
The bigger the embedding, the more opportunity the layer has to learn something more nuanced.
The lower the interpretability, though.
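As a starting point for that experimentation, here is a small sketch of the rule of thumb that, as far as I recall, fastai applies when you don't specify an embedding size. Treat the constants as an assumption and check the library source before relying on them.

```python
def emb_sz_rule(n_cat: int) -> int:
    """Rule-of-thumb embedding width for a categorical variable with n_cat levels.
    Assumed to mirror fastai's helper of the same name; verify against the source."""
    return min(600, round(1.6 * n_cat ** 0.56))

# Width grows slowly with cardinality and is capped at 600
for n in (10, 1_000, 100_000):
    print(n, emb_sz_rule(n))
```

The output is only a default to start from; the point of the post above still stands, so try a few sizes and compare validation results.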
Thank you, Jeremy (and TAs), for a great course and for your patience in answering all of my beginner questions throughout. I really enjoyed the course and would love to stick around to learn more. I will review the videos and walkthroughs before Part 2 so I can still tag along!
As usual, due to work clashes I could not watch lesson 8. Luckily, one of the questions I wanted to ask came up in the AMA.
I would be REALLY interested if Jeremy shared more of his research on teaching (esp. to kids). Even just a list of tools and resources to look at would be great. I have briefly tried looking into the research and honestly I don’t even know where to start (and I am supposed to be a former academic researcher).
03:51 What will Part 2 feel like? Will it be a lot deeper technically? Will we be able to read and implement research papers? Will the models involve real-life situations?
04:46 Review: building a neural net from scratch. How does PyTorch create a neural net so effortlessly? How does PyTorch keep track of model weights through Module? How does Module store weights with nn.Parameter? How do you check the weights of a model using parameters()?
You can also build a layer in a Module with nn.Linear, without nn.Parameter, and PyTorch can read the weights from it too.
07:42 How to create the Embedding function and the entire DotProductBias model in PyTorch from scratch using create_params? After training, the learned movie_bias can be inspected. You can check the shape of the bias with model.movie_bias.shape
11:29 Question: What does Tensor.normal_ do?
12:21 After training, what can movie_bias tell us about each movie and about the movies overall? What does a low bias mean for a movie? What does a high bias mean for a movie? Can user_bias tell us which users just love movies, even the crappy ones? This is visualizing movie_bias
15:53 What can we interpret or do with the huge matrix of shape (num_users, 50)? How to shrink the 50 latent factors into just the 3 most important factors with pca?
How to interpret the PCA chart of movies, plotted using only two of the 3 PCA factors compressed from the 50 latent factors? How are the taste or style of the movies condensed into two factors and expressed by their location on the two-dimensional chart? This is visualizing movie_factors, or embeddings.
18:06 How does fastai make all the work above easier with just one line of code?
19:57 How does fastai construct everything under the hood of collab_learner?
21:14 Questions: Is PCA useful in other applications? Where to find out more about PCA? Why should you take Rachel’s Computational Linear Algebra course?
22:11 How to use Embedding distance to find out movie similarities?
23:47 Read the fastbook section on bootstrapping a collaborative filtering model
24:22 How to do collaborative filtering with deep learning instead of the matrix completion with dot product above? How to apply the simplest neural net architecture to this collaborative filtering case?
26:14 How does fastai use rules of thumb to recommend the number of latent factors for users and movies?
How does fastai use deep learning to build collaborative filtering model in two ways?
27:48 Why are the deep learning versions not as good as the DotProduct version? Is it because DotProduct is more tailored to the problem? How do companies combine both versions to do collaborative filtering? When you have lots of metadata, should you apply deep learning to it? How would you use metadata in the model?
28:49 Questions: Can a small number of users and movies overwhelm everybody else? E.g., a small group of anime enthusiasts watch a lot of anime movies and give super high ratings. Details of how to deal with them won’t be discussed here
30:25 How to apply an embedding matrix in an NLP model, shown through a spreadsheet demo? What’s the essence of a neural net?
34:56 How to apply embeddings to tabular datasets and models? How to understand the TabularModel and tabular_learner source?
39:35 What’s going on inside a neural net, as seen through a shop sales prediction Kaggle competition and a paper published based on it?
Entity Embeddings of Categorical variables (paper)
44:33 So far we have looked at what goes into a model as inputs and what comes out as outputs. We have also looked at the middle as matrix multiplication. What is a convolution (a particular kind of matrix multiplication in the middle)? Why is it so useful for CV? Why is MNIST one of the most famous CV datasets? How does Jeremy apply the ideas from Zeiler and Fergus’s paper to MNIST using Excel and convolution?
49:22 How to understand convolution? What does a filter do and how does it help to detect horizontal and vertical edges? How to determine the size of the filter or kernel (3x3, or 5x5, or any)? conv1 means the first convolutional layer
54:48 Moving on to the second convolutional layer. Two filters give us two channels in the first convolutional layer. In the second convolutional layer, we create one 3D filter made of two 2D filters, one for each channel coming out of the first conv layer, and condense the results into a single value. We can also create a second channel for the 2nd conv layer using another 3D filter.
57:07 How to determine the output and use SGD to train the model and optimize the filters?
58:00 What is maxpooling? What’s the problem with maxpooling? How much data do we lose? Why is that a good thing? What is a dense layer and what does it do?
1:00:48 How do we do convolution slightly differently today? What is a stride-two convolution and how does it work (no more maxpooling)? We then do a lot of stride-two convolutions until the size has shrunk to 7x7 and then do an average_pooling (no more dense layer). What does taking the average of the 7x7 grid mean? What is the problem with this approach? When is it a good time to use maxpool instead? How did fastai make it easy for us to try both kinds of pooling by inventing a technique called concat_pooling, which applies maxpool and average_pool and concatenates the results?
1:05:12 How to understand convolution in terms of matrix multiplications?
1:08:21 What is dropout and how to understand it using Excel? What is a dropout mask? What’s its effect visually in Excel? How to understand dropout as data augmentation for the activations? How does it help avoid overfitting? What’s the story of dropout and academia?
1:14:27 Why does Jeremy not spend much time on activation functions? We have seen many functions for metrics, losses and activations.
1:16:27 What to do next before fastai Part 2? What is Radek’s book Meta Learning about? What are the things to do in Write, Help, Gather and Build?
1:19:42 A fastai community member published the Mish activation, which is used by many state-of-the-art models.
1:20:41 Jeremy AMA:
How to keep up? Keep up by focusing on a subfield of deep learning and on things that don’t change much; the foundations of fastai have not changed much from 5 years ago. Everything else is just tweaks.
Will huge datasets and GPU compute replace those of us with a small dataset and one GPU? There are always smarter ways of doing things, e.g. the fastai team trained on ImageNet on standard GPUs faster than all the companies with huge numbers of GPUs. Pick domains in which smaller models can beat the state of the art.
1:26:24 How does Jeremy teach kids math? All kids can learn algebra with DragonBox 5+. Great, Jeremy promised to talk more about teaching kids at some point later.
1:28:30 Plans for walkthrus
1:30:00 How to turn a model into business? Great news, Jeremy plans to build a course on this! What is the start of a business? What is the first step? How to gradually figure out whether your idea has a real need from people?
1:32:50 How does Jeremy stay so efficient at working? Finishing things nicely, and tenacity
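To make the 07:42 and 30:25 notes concrete, here is a small NumPy sketch (not the lesson's PyTorch code) of two ideas: an embedding lookup gives the same result as multiplying a one-hot vector by the embedding matrix, and a dot-product-with-bias prediction is squashed into a rating range. The toy sizes, the `predict` and `one_hot` helpers, and the (0, 5.5) range are assumptions for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies, n_factors = 4, 6, 3   # toy sizes, illustration only

# Randomly initialised "parameters"; a real model would learn these with SGD
user_factors = rng.normal(0, 0.01, size=(n_users, n_factors))
movie_factors = rng.normal(0, 0.01, size=(n_movies, n_factors))
user_bias = rng.normal(0, 0.01, size=n_users)
movie_bias = rng.normal(0, 0.01, size=n_movies)

def one_hot(idx, n):
    v = np.zeros(n)
    v[idx] = 1.0
    return v

def sigmoid_range(x, lo, hi):
    """Squash x into the open interval (lo, hi)."""
    return 1 / (1 + np.exp(-x)) * (hi - lo) + lo

def predict(user, movie, y_range=(0, 5.5)):
    """Dot product of the two factor vectors plus both biases, squashed to y_range."""
    score = user_factors[user] @ movie_factors[movie]
    score += user_bias[user] + movie_bias[movie]
    return sigmoid_range(score, *y_range)

# An embedding lookup is just indexing, equivalent to a one-hot matrix product
looked_up = movie_factors[2]
multiplied = one_hot(2, n_movies) @ movie_factors
print(np.allclose(looked_up, multiplied), predict(user=1, movie=2))
```

With untrained weights close to zero, every prediction sits near the middle of the range; training moves the factors and biases so that the predictions match the observed ratings.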
Consider doing the previous Part 2 course linked from the fast.ai MOOC home page; that way you can gauge what has changed in the intervening years when you do Part 2 2022/2023. I believe the previous Part 2 still has value, as Part 1 2022 has already prepared you for it.
Totally! I actually already did Part 2 of the first MOOC. It was a long time ago, though, so I expect Jeremy to cook up a lot of new fun stuff.
Perhaps that idea is not good for a newbie to fastai, as more effort may be required to run the notebooks along with the videos.
The main problem in running through the last Part 2 is perhaps the mismatch between the fastai and Python versions used then and now. It could be a nightmare to get the Part 2 code working; however, some knowledge can be gained from just watching the videos and perhaps applying the concepts to the fastbook notebooks.
Quick question: I was going through the collaborative filtering deep dive and trying out the code.
So I created a dataloader from the dataset (URLs.ML_100k) and a custom model (dot product),
and trained the learner (fit_one_cycle).
Now I want to predict with the trained model. Can someone point me to any documentation or steps on how to do this with collaborative filtering?
I think I found a small editorial error: In the spreadsheet conv-examples.xlsx, and sheet conv-example (dropout), the cell AH1, should read “Conv1 (26 x 26)” and not “Conv1 (27 x 27)”.
Similarly for cell BO1, should read: “Conv1 (24 x 24)” and not “Conv1 (26 x 26)”.
And also cell FC1 should read “Conv2 (24 x 24) - after dropout” instead of “Conv2 (26 x 26) - after dropout”.
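The corrected sizes follow from the usual output-size formula for a convolution. A quick check, assuming the spreadsheet uses a 28 x 28 MNIST image with 3x3 filters, stride 1 and no padding (the helper name is made up for this sketch):

```python
def conv_out_size(n, kernel, stride=1, padding=0):
    """Output width/height of a convolution over an n x n input."""
    return (n + 2 * padding - kernel) // stride + 1

# Spreadsheet case: two unpadded 3x3 convolutions applied one after the other
conv1 = conv_out_size(28, kernel=3)
conv2 = conv_out_size(conv1, kernel=3)
print(conv1, conv2)

# The same formula covers stride-two convolutions, e.g. a 5x5 kernel with padding 2
halved = conv_out_size(28, kernel=5, stride=2, padding=2)
print(halved)
```

Each unpadded 3x3 filter trims one pixel from every edge, which is why the grid shrinks by two per layer.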
How fun the last 4 weeks were for me!
Today I finished all the lessons, including a few of the further research questions, re-running all the notebooks discussed in the lectures and, for a few of them, running them on a dataset of my choice.
The four weeks resulted in many blogs that I published here: Musings of Learning Machine Learning
and many Kaggle notebooks Doktor Glas | Notebooks Contributor | Kaggle
After a couple of weeks’ break, I will watch the videos again and this time, I will try to recreate the notebooks without “much” help.
So many gaps in my knowledge were filled and I hope I will be able to crystallise my understanding further in the next iteration.
PS: I didn’t know where to post this note; if there is a better place for it, please direct me there.
Thanks!
I have a question:
I used the MNIST_TINY dataset from fastai, which has a few hundred images for training and 20 for testing. I set my batch size to 4, and still got a “Your generator is empty” warning. The printouts of my validation_loss and accuracy are both None. Any idea why?
from fastai.vision.all import *  # DataBlock, Learner, nn, F, ActivationStats, etc.

path = untar_data(URLs.MNIST_TINY)
Path.BASE_PATH = path
mnist_db = DataBlock(
    blocks=(ImageBlock(cls=PILImageBW), CategoryBlock),
    get_items=get_image_files,
    splitter=GrandparentSplitter('train', 'test'),
    get_y=parent_label,
    batch_tfms=Normalize()  # normalize within each batch
)
batch_size = 4
dls = mnist_db.dataloaders(path, bs=batch_size)

def conv(ni, nf, ks=3, act=True):
    res = nn.Conv2d(ni, nf, stride=2, kernel_size=ks, padding=ks//2)
    if act: res = nn.Sequential(res, nn.ReLU())
    return res

def simple_cnn():
    return sequential(
        conv(1, 8, ks=5),         # 14x14
        conv(8, 16),              # 7x7
        conv(16, 32),             # 4x4
        conv(32, 64),             # 2x2
        conv(64, 10, act=False),  # 1x1
        Flatten(),
    )

def fit(epochs=1):
    learn = Learner(dls, simple_cnn(), loss_func=F.cross_entropy,
                    metrics=accuracy, cbs=ActivationStats(with_hist=True))
    learn.fit(epochs, 0.06)
    return learn

learn = fit()
Instead of 'test', use 'valid'. The test set has no label folders, so your validation set is effectively empty. That is why you’re getting the “generator is empty” warning and why the validation loss and accuracy are None: those are only displayed for the validation set. It would probably be better if this threw an exception rather than a warning.
This doesn’t throw an error, but the MNIST_TINY dataset only has 3’s and 7’s, so you should probably change the 10 to a 2 so your model’s output shape matches the number of classes.
I don’t think ActivationStats works correctly with a single epoch. You get a warning when running a single epoch. If you change the number of epochs to 2, the warning goes away.
A few debugging tips: I found the main issue by running dls.valid.show_batch(). Running show_batch (or one_batch) on your dataloaders is always recommended. I confirmed my suspicion about the model output shape by running learn.get_preds()[0].shape.
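To see why the validation set comes out empty, here is a simplified stand-in for a grandparent-folder splitter written with plain pathlib. It is not fastai's implementation, and the file names are hypothetical; they only mirror the layout described above, where images in test sit directly in the folder without a label subfolder.

```python
from pathlib import PurePosixPath

def grandparent_split(paths, train_name="train", valid_name="valid"):
    """Assign each file to train/valid by the name of its grandparent folder."""
    train, valid = [], []
    for i, p in enumerate(paths):
        grandparent = PurePosixPath(p).parent.parent.name
        if grandparent == train_name:
            train.append(i)
        elif grandparent == valid_name:
            valid.append(i)
    return train, valid

# Hypothetical file names: labelled folders under train/valid, none under test
paths = [
    "mnist_tiny/train/3/0001.png",
    "mnist_tiny/train/7/0002.png",
    "mnist_tiny/valid/3/0003.png",
    "mnist_tiny/valid/7/0004.png",
    "mnist_tiny/test/3878.png",
]
print(grandparent_split(paths, "train", "test"))   # the grandparent of the test image is mnist_tiny
print(grandparent_split(paths, "train", "valid"))
```

With 'test' as the validation name nothing matches, because the grandparent of a file in test is the dataset root rather than test itself.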
Thank you @matdmiller for your help and the debugging tip!
I am generally confused about how to debug when using fastai, and I ran into a new issue. I’ve finished training my model, and now I am trying to use it to make predictions for a single image. It works for a sample image in the test folder, but when I try to upload an image it gives a “list index out of range” error.
The whole notebook: Google Colab
The specific code snippet:
# This works
img = PILImageBW.create(f"{path}/test/3878.png")
learn.predict(img)
# This throws an error
btn_upload = widgets.FileUpload()
btn_upload
img = PILImageBW.create(btn_upload.data[-1])
img.resize((28, 28))
learn.predict(img)
I feel like it must have to do with the properties of the uploaded image vs. the image in the testing folder, but I can’t figure out what it is. Any pointers on how to debug this?