Lesson 3 official topic

Thanks!
It all falls into place now.
It’s quite strange but interesting that, despite setting everything up correctly, it all came down to the loss function, and taking the log is what did the trick.
Of course, the log makes it easier for the computer to handle computations with very large or very small numbers.
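
To make that concrete, here is a small illustration (my own, not from the lesson) of why working in log space helps with very small numbers:

```python
import math

# Multiplying many small probabilities underflows to 0.0 in floating point...
probs = [1e-4] * 100
product = 1.0
for p in probs:
    product *= p
print(product)  # 0.0 -- the true value, 1e-400, is too small for a float

# ...but summing their logs stays comfortably in range
log_sum = sum(math.log(p) for p in probs)
print(log_sum)  # about -921.03
```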

There is a lot to learn in this chapter. I thought I would be done in a day or two, but it has taken more than a week, and only now am I starting to get some grasp of what might be happening.

Hi, just putting this out here.

I was trying to understand what torch.where does (sadly the examples in the PyTorch docs don’t cover the case I was interested in). Here is a quick link, if anyone is interested - Torch.where | Musings of Learning Machine Learning
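
For quick reference, the case I was after: torch.where picks elementwise between two tensors based on a condition (the names below are just illustrative):

```python
import torch

preds = torch.tensor([0.9, 0.2, 0.7, 0.4])
targets = torch.tensor([1, 0, 0, 1])

# Where the target is 1, take the prediction; where it is 0, take 1 - prediction
selected = torch.where(targets == 1, preds, 1 - preds)
print(selected)  # tensor([0.9000, 0.8000, 0.3000, 0.4000])
```

The condition broadcasts like any other elementwise operation.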


Hi all, I am training a model on a dataset with 10 classes and around 120 images per category. I kept track of the changes I was making and put them in a table.
As I changed parameters and swapped pre-trained models, it felt like I was just trying things at random.
Is there a better approach to training and measuring improvements? Any pointers would be appreciated.

| Model type | Learner | Data preprocessing | Loss | Data loading | Valid provided | Opt | Batch size | Other transformations | Item transform image size | Max accuracy |
|---|---|---|---|---|---|---|---|---|---|---|
| resnet34 | Vision learner | | default | Data block | Entire data in path, then random split | Default | | Data augmentation, im_size = 224 | 300 | 55 |
| convnext_small_in22k | Vision learner | | default | Image data loaders | Train/test separate in data loader | Default | 64 | Data aug, im_size = 224 | 300, squish | 60 |
| swin_s3_small_224 | Vision learner | | default | Image data loaders | Train/test separate in data loader | Default | 32 | | 224, squish | 58 |
| convnext_tiny_hnf | Vision learner | | default | Image data loaders | Train/test separate in data loader | Default | 32 | Data aug, im_size = 300 | 400, squish | 59 |
| convnext_base | Vision learner | | default | Image data loaders | Train/test separate in data loader | Default | 32 | Data aug, size not provided | 128 | 61 |
| convnext_small_in22k | Vision learner | | default | Image data loaders | Train/test separate in data loader | Default | 32 | Data aug, size not provided | 224 | 56 |
| convnext_small_in22k | Vision learner | | default | Image data loaders | Train/test separate in data loader | Default | 64 | Data aug, im_size = 224, ImageNet normalise | 300 | 59 |
| convnext_small_in22k | Vision learner | Preprocessed data, removed lossy class | default | Image data loaders | Train/test separate in data loader | Default | 32 | Data aug, size not provided | 128 | 70 |
| convnext_small | Vision learner | Preprocessed data, removed lossy class | Cross-entropy (flattened) | Image data loaders | Train/test separate in data loader | Default | 32 | Data aug, size not provided | 224 | 72.5 |
| convnext_tiny | Vision learner | Preprocessed data, removed lossy class | Focal loss, gamma 1.5 | Image data loaders | Train/test separate in data loader | Default | 32 | Data aug, im_size = 128 | 224 | 71.5 |

Hi, I am doing something perhaps more basic, but I wanted to get to grips with the inner workings of deep learning.

Following the Excel example that Jeremy shows, I am trying to explore the inner workings of neural networks. I added a third row of parameters, ReLU3, to the sheet, updated the loss function, and ran the solver. The loss was reduced to 0.131.

My question is: by adding yet another set of parameters and one more ReLU computation, does this correspond to having two intermediate layers before the output is produced? I got the impression that the NN internally could look like:

dot(Input, Params1) -> Layer 1 -> dot(Layer1Output, Params2) -> Layer 2 -> dot(Layer2Output, Params3) -> output

where by dot above I mean “dot product”. Is this anywhere close to how it actually works? And when we add a new set of parameters, does that count as a new “epoch”, or does the system at that point simply not know yet how to adapt the parameters to minimize the loss?

My gut tells me that we are not adapting the parameters the way we did with `abc -= abc.grad*0.01`, but I wanted to cross-check :blush:.
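
In code, the structure I have in mind would look something like this (just a sketch; the shapes are made up for illustration):

```python
import torch

torch.manual_seed(0)

x = torch.randn(5, 4)     # 5 samples, 4 input features

w1 = torch.randn(4, 8)    # Params1
w2 = torch.randn(8, 8)    # Params2
w3 = torch.randn(8, 1)    # Params3

h1 = torch.relu(x @ w1)   # dot(Input, Params1) -> ReLU -> Layer 1 output
h2 = torch.relu(h1 @ w2)  # dot(Layer1Output, Params2) -> ReLU -> Layer 2 output
out = h2 @ w3             # dot(Layer2Output, Params3) -> output
print(out.shape)          # torch.Size([5, 1])
```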

Thank you in advance!

I am facing the same issue when running predictions with the model locally:
`AttributeError: 'GELU' object has no attribute 'approximate'`
Have you found a solution for this?

You’ll need to downgrade to an older version of PyTorch.


Thank you for this! I was very confused about why the results wouldn’t converge towards the correct values even when I tried increasing the number of steps or changing the learning rate.

Here is a notebook I wrote to help me understand the gradient descent code in the first part of the lesson:

Lesson 3: Gradient Descent Function Notebook on Google Colab
Lesson 3: Gradient Descent Function Notebook on Kaggle

I have packaged the gradient descent code into a single reusable function and tested it with various target functions.

It displays the learning process in a graph, and the notebook has a lot of interactive sliders that help illustrate how the parameters work.

It might be helpful to other students. Note, though, that I am a complete Python and AI noob, so please forgive me if anything is strange or nonstandard.

Here’s what the graphs look like once you run it:


Above, [a, b, c] was [3, 2, 1], and gradient descent came up with [2.8793, 1.9189, 1.1948] after 30 steps with a learning rate of 0.05.
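
For reference, the core of the function looks roughly like this (a simplified sketch, not the exact notebook code; the quadratic and settings mirror the example above):

```python
import torch

def gradient_descent(f, params, x, y, steps=30, lr=0.05):
    """Fit `params` so that f(x, params) approximates y, via plain gradient descent."""
    params = params.clone().requires_grad_(True)
    for _ in range(steps):
        loss = ((f(x, params) - y) ** 2).mean()  # mean squared error
        loss.backward()
        with torch.no_grad():
            params -= params.grad * lr  # the abc -= abc.grad*0.01 step from the lesson
            params.grad.zero_()
    return params.detach()

def quad(x, p):
    a, b, c = p
    return a * x**2 + b * x + c

x = torch.linspace(-2, 2, 50)
y = quad(x, torch.tensor([3.0, 2.0, 1.0]))  # true [a, b, c]
fitted = gradient_descent(quad, torch.ones(3), x, y, steps=30, lr=0.05)
print(fitted)  # approaches [3, 2, 1]
```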

Same graph as above but I tweaked the parameters:

Even tried it with relu and double relu!:



Hi - I watched the video first and now I am trying to work through the notebooks Jeremy uses in the video. However, I am unable to find the exact notebook used for the Gradio pets classifier. The Hugging Face Spaces pets classifier code seems different from the one in the video, and I’ve spent hours reading through the forum and am still stuck. Could one of you please point me to the right place?

@Santhosh every lesson has a list of links from the lesson:

Thanks Jeremy. I went through the links, but I couldn’t find the code you used in the video around the 8-minute mark (the pet breed detector), so I wanted to make sure I wasn’t missing something.

IIRC it’s this one:

Binary Cross Entropy.

I have gone through the whole video course multiple times and now I am going through the whole textbook. I am on Chapter 6, and I noticed this formula for binary cross-entropy, used to handle the loss for multi-label classification:

```python
def binary_cross_entropy(inputs, targets):
    inputs = inputs.sigmoid()
    return -torch.where(targets==1, inputs, 1-inputs).log().mean()
```

My question is: how do we handle the case of log(0), which is -inf? For instance, a case where the target is 1 and the input (the prediction) is 0.

Consider that the output of the sigmoid can never be exactly 0, so log(0) can never occur.
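
A quick numerical check of that (my own illustration): even for a strongly negative activation, the sigmoid output is tiny but strictly positive, so its log stays finite:

```python
import torch

acts = torch.tensor([-50.0, -10.0, 0.0])
p = acts.sigmoid()
print(p)        # tensor([1.9287e-22, 4.5398e-05, 5.0000e-01])
print(p.log())  # tensor([-50.0000, -10.0000, -0.6931]) -- large, but finite
```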