Lesson 3 official topic

Jeremy, can you share your Titanic Excel spreadsheet for us to poke around with?

It’s available here:

8 Likes

RE: the Titanic dataset in Excel - I’m confused, as I thought that in deep learning the 2nd set of parameters would not be random (as it was in Excel) - did I miss something?

1 Like

We usually initialise parameters with random values prior to training.

They only start random.
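
To see this concretely, here’s a minimal sketch (plain PyTorch; the layer sizes are just for illustration) showing that a freshly created layer already holds random weights before any training happens:

import torch
# a freshly constructed linear layer: its weights are already filled with random values
layer = torch.nn.Linear(3, 2)
print(layer.weight)   # different every time you run this, until training updates them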

Question:
If a machine learning problem could be approached as either NLP or computer vision, which is the better route to take - in general, and also from the fast.ai perspective? I know one could try both ways and see which model(s) give the best results, but is there any general advice for this type of task?

1 Like

The problem domain will probably be the determinant here. If you’re working with text data, NLP is likely the natural choice. Computer vision can be more general – a surprising number of problems that are, say, tabular in nature can be transformed to a computer vision problem.
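
As a rough illustration of that last point (purely a sketch - the feature count and image size are made up), one common trick is to normalise each tabular row and lay it out as a small greyscale image that a CNN can then classify:

import numpy as np
from PIL import Image

row = np.random.rand(64)                                     # one tabular sample with 64 numeric features (made up)
row = (row - row.min()) / (row.max() - row.min() + 1e-8)     # scale the features to 0..1
img = (row.reshape(8, 8) * 255).astype(np.uint8)             # arrange the features as an 8x8 grid
Image.fromarray(img, mode="L").resize((64, 64)).save("sample.png")  # save as a greyscale image a CNN can take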

3 Likes

Ok thanks - didn’t realise that the nodes of each layer have parameters initialised with random values

1 Like

This is why transfer learning is so effective: instead of starting off with millions/billions of random parameters, each of which you need to optimize in order to recognize patterns, you borrow from a model that has already set those weights at or near good values. You then only update the last few layers for your specific task.

However, all of today’s models started out with random parameters for all their layers at some point (either directly, or indirectly via transfer learning themselves) :slight_smile:
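
In fastai terms, that borrowing is what fine_tune does. A minimal sketch (the dataloaders, architecture, and epoch count are placeholders): start from pretrained weights, train just the new final layers first, then unfreeze and update the whole network at lower learning rates.

from fastai.vision.all import *

# `dls` is assumed to be an existing DataLoaders object for your task
learn = vision_learner(dls, resnet34, metrics=error_rate)  # starts from pretrained (non-random) weights
learn.fine_tune(3)   # trains the new head first, then unfreezes and updates all layers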

3 Likes

Thanks! Could you point to a chapter that talks about OOP in Practical Deep Learning for Coders?

I think this is a great suggestion @ilovescience but I always feel I get overwhelmed.

If one were to start digging through the code, where should one start? I can do just enough Python to write small scripts, so I feel like I don’t have the chops to pick up a codebase and intuitively know which source file to start in, or which one would be at the right abstraction level.

1 Like

If we have a way of explaining where the model is focusing when coming to a conclusion, we can use that to make it more accurate. For example, suppose two breeds are almost the same but differ prominently in the shape of the nose, and we detect that the model is not focusing on the nose: if we then add more images where the nose is clearly visible, the model may improve.
The last fastai course covered heatmaps for detecting where the model focuses, but they weren’t fine-grained enough to arrive at such a specific conclusion. If a more refined heatmap could be produced, it would be an incredible tool for improving classification of similar-looking classes.
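
For anyone who wants to experiment, here’s a rough Grad-CAM-style sketch in plain PyTorch (the model, layer choice, and input are assumptions; the fastai book covers CAM with a similar hook-based approach). It produces a per-class heatmap you could upsample and overlay on the input image:

import torch

# capture activations and gradients of the last conv block via hooks
acts, grads = {}, {}
def save_acts(m, inp, out): acts['v'] = out.detach()
def save_grads(m, grad_in, grad_out): grads['v'] = grad_out[0].detach()

layer = model.layer4                                     # assumed: final conv block of a resnet-style `model`
h1 = layer.register_forward_hook(save_acts)
h2 = layer.register_full_backward_hook(save_grads)

out = model(x)                                           # x: one preprocessed image batch of shape [1, 3, H, W]
out[0, out.argmax()].backward()                          # gradient of the top-scoring class

weights = grads['v'].mean(dim=(2, 3), keepdim=True)      # average gradient per channel
cam = torch.relu((weights * acts['v']).sum(dim=1))       # weighted sum of the activation maps
cam = cam / (cam.max() + 1e-8)                           # normalised heatmap; upsample to image size to overlay
h1.remove(); h2.remove()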

1 Like

Well, it’s more a matter of seeing how something is implemented, then trying it out on your own. See Chapter 17, A Neural Net from the Foundations.

Very thought-provoking question!

I’d go about it this way: if we look at how “human experts” differentiate between these rocks, we may find that their examination doesn’t just consist of visual comparison. Humans have multiple models running. One does visual classification, another may compare weights (when the expert picks up each equally sized rock), and another might compare the sounds each rock makes when hit with a hammer. They may break a sample and see how the crystal structure parts. They may then take out a magnifying glass and look at each sample a little closer.

All these features, when presented to a large enough model (or a collection of models that make parallel predictions), should give better classification. I’m a big believer in not using one giant function to do everything, because no biological system evolved that way – and there must be a reason why.
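
A tiny sketch of that “parallel predictions” idea with fastai (the two learners, the shared test dataloader, and the equal weighting are assumptions): average the predicted probabilities from models trained on different views of the data.

# preds from two separately trained learners on the same (compatible) test dataloader
preds_a, _ = learner_a.get_preds(dl=test_dl)
preds_b, _ = learner_b.get_preds(dl=test_dl)
ensemble_probs = (preds_a + preds_b) / 2        # simple average of the two models' probabilities
ensemble_labels = ensemble_probs.argmax(dim=1)  # final class predictions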

1 Like

Looks really interesting. Earlier I found an implementation of a Bayesian neural network which used customized dropout to generate an entropy score for each prediction. Entropy is supposed to be larger for an untrained class, or when there is no clear winning class. However, I found that the performance was not very consistent.
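
For reference, a rough sketch of that dropout-based entropy idea in plain PyTorch (the model and number of samples are assumptions): keep the dropout layers stochastic at inference, average several forward passes, and compute the entropy of the averaged prediction.

import torch

def enable_dropout(model):
    # keep only the dropout layers stochastic while the rest of the model stays in eval mode
    for m in model.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()

def mc_dropout_entropy(model, x, n_samples=20):
    model.eval()
    enable_dropout(model)
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=1) for _ in range(n_samples)])
    mean_probs = probs.mean(dim=0)                                          # average over stochastic passes
    entropy = -(mean_probs * mean_probs.clamp_min(1e-8).log()).sum(dim=1)   # higher = more uncertain
    return mean_probs, entropy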

You’ll have to build a dataloader out of the files you’d like to infer on, assign it as the test dataloader, then ask the learner to get predictions on it.

Following your code, I’d assume something like this would work. You might have to map the decoded argmax indices through the dataloader’s vocab to get back the exact labels.

from fastai.vision.all import *

learn = load_learner("file.pkl")
png_files = get_image_files("images")
test_dl = learn.dls.test_dl(png_files)
# get_preds returns (probabilities, targets, decoded); targets are None for an unlabelled test_dl
probs, _, decoded = learn.get_preds(dl=test_dl, with_decoded=True)
labels = [learn.dls.vocab[i] for i in decoded]  # map decoded indices back to class names

In fact, if you dig into learn.predict, you’ll find that it uses get_preds under the hood with a test dataloader containing just the one item you’ve provided. You can use that function as a guide to customize things further.
https://github.com/fastai/fastai/blob/master/fastai/learner.py#L268-L269

3 Likes

I really loved that this section was included in the course today. Maybe in future lectures we could have some more (short) sections that explore this topic a bit further, esp. in terms of how easily these models fine-tune to a different dataset.

I’d really like to use something more modern for quick runs/baselines; I’ll try using convnext-family models in the coming weeks.

5 Likes

In case someone is following along with the questions over on aiquizzes.com, it’s a bit counterintuitive but by the end of lesson 3 you should have answered the following sets of questions (listed in settings):

I think this is because the questions refer to an older version of the course, so the match isn’t exact. Once the course is done, it might be worth compiling and sorting a list of ‘which questions came up in which (2022) class’, which we could pass on to @radek for listing somewhere on the site?

4 Likes

Yeah, I agree with you.

I was very impressed with convnext on my raptor model. I’m using WSL with a 6 GB GTX 1660, and found I needed to drop the image size down from 192 to 160 to avoid CUDA issues (a cuBLAS error, which went away with smaller images…). Even with the smaller images, the error rate dropped from 6.6% to 3.5% for the same training set compared to resnet.
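
For anyone wanting to reproduce that kind of setup, a minimal sketch (the folder name, model, and epoch count are illustrative; the convnext model names require the timm package to be installed):

from fastai.vision.all import *

dls = ImageDataLoaders.from_folder("raptors", valid_pct=0.2,
                                   item_tfms=Resize(160))              # dropped from 192 to fit a 6 GB card
learn = vision_learner(dls, "convnext_tiny_in22k", metrics=error_rate)
learn.fine_tune(5)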

2 Likes