Live coding 11

Oh wow, Sarada, this is wonderful! :partying_face: My first ever Kaggle Notebook gold! :1st_place_medal: Earned with the help of our amazing community! :heart: It couldn’t get better than this! :smile:

Thank you so very much for making this happen and my heartfelt thank you to everyone who contributed their votes :slightly_smiling_face:

And thank you so much for your very kind words! I continue to learn a lot from everyone here. It was wonderful to see you in the walk-through today, Sarada!!! :slightly_smiling_face:

BTW what surprises me quite a bit is the number of medals in discussions. I can barely remember talking online that much :smile: But oh well, if there is a record to prove it, I guess it must have happened :slight_smile:

Thank you so very much again everyone for this wonderful surprise! :slight_smile: :heart:


Something peculiar I noticed on Paperspace today: I was getting a very long error when creating the data loader with the exact same code that worked on a different machine yesterday.

I tried restarting the kernel but got the same error. Then I stopped the machine (P5000), started a new RTX4000 machine, and ran the exact same notebook; the code ran without error. No idea why, but it seems there is a problem somewhere.


I had the same error… on a hunch, I reduced the size in the aug_transforms, and the error went away… (both on P5000)
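For reference, here is a sketch of the kind of change described above. The fastai calls are shown in comments since they need the library and the dataset, and the sizes (224 before, 128 after) are illustrative assumptions, not the poster's actual values:

```python
# The fastai DataBlock would look something like this (commented out
# because it needs fastai and a dataset path; sizes are illustrative):
#
# from fastai.vision.all import *
# dls = DataBlock(
#     blocks=(ImageBlock, CategoryBlock),
#     get_items=get_image_files,
#     get_y=parent_label,
#     item_tfms=Resize(460),
#     batch_tfms=aug_transforms(size=128, min_scale=0.75),  # was size=224
# ).dataloaders(path)

# The fix amounts to shrinking the `size` passed to aug_transforms,
# which reduces the work (and GPU memory) the batch transforms need:
before = {"size": 224, "min_scale": 0.75}
after = {"size": 128, "min_scale": 0.75}
print(after["size"] < before["size"])
```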


There are 19 people on the same score as me in the Paddy competition now. Is the order arbitrary or in order of submission when scores are the same?


The order looks random to me. You are doing a good job. :slight_smile:


I always thought it’s ordered by submission time if the score is the same…

The time here could be that of the last submission, which is not necessarily the best one, I think…
Edit: I meant that when you submit a solution, Kaggle records its time; if it’s your best score, the score (and hopefully your position) gets updated on the leaderboard. However, if the score is lower than your best, only the submission time is recorded.


Ok, I had the same error on my local computer. Could you check my full trace and compare it to yours?
Mine disappears when I run the same line again. It is weird.

Is there a generally accepted description for the type of multiple-target model you started to build today, Jeremy? It’s not multi-modal because we are only using images for training. Would this be correctly referred to as a multi-head model?


I’m not sure I’ve really heard them being given a name before! That description you linked to seems close enough, however.

Although as you’ll see tomorrow I’ve come up with a much simpler approach…


How can we get the sweep ids from the wandb experimentation? I’m running the sweep to generate the data as Jeremy did. I don’t know how to construct the sweep id for creating the dictionaries for the dataframe. Any hint? Thanks.

I really like using the lsof command for this category of issues.

lsof -i :4321 # gets the process that "opened" port number 4321

Pretty handy for hunting down the process occupying a local port, locked/open files, etc. lsof (list open files) does a lot more and is included with most distros.


Adding the blogpost mentioned by Jeremy about the experiments on this topic: 31:30 - Brute force hyperparameter optimisation vs human approach.


Ohh, I can help you with that in the live session on Friday the 24th. I updated the Readme to make it clearer.

That would be great. Could you please share the link to the live session on the 24th? Thanks.


It is 1am UK time but I’ll try my best. If the session is recorded that’d be helpful in case I can’t make it. Thanks.

10am CET should be 9am UK time.

My bad. Thanks for the clarity. Then I can make it. See you in the meetup.

Walkthru 11 detailed notes in questions

00:00 - Recap on Paddy Competition

04:30 - Tips on getting votes for Kaggle notebooks

07:30 - How to disconnect the other sessions in tmux?

When you have 3 sessions running on different machines, press the tmux prefix (ctrl + b) then shift + d and select the one to disconnect

09:10 - Welcoming a newcomer

10:30 - Weights and Biases Sweep

What does WandB do?
What is sweep?
How to create a sweep?
How do we tell WandB what to run and what info to extract?
What does the fastai-WandB integration do for you?

14:40 - WandB can track GPU metrics

16:40 - What can fastgpu do for you? What’s Jeremy’s plan for fastgpu in the future?

#question Should we use fastgpu with paperspace for automating multiple notebooks?

What’s Jeremy’s opinion on WandB?

18:05 - What does the sweep.yaml file look like for a hyperparameter optimization search?
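As a point of reference, a minimal sketch of what a W&B sweep configuration can contain, written here as the equivalent Python dict rather than YAML. The method, metric, and parameter names are illustrative assumptions, not the actual sweep.yaml from the walkthrough:

```python
# A minimal W&B sweep configuration, as a Python dict.
# The same structure can live in a sweep.yaml file; the choices below
# (random search, error_rate metric, lr/batch_size grid) are illustrative.
sweep_config = {
    "method": "random",  # could also be "grid" or "bayes"
    "metric": {"name": "error_rate", "goal": "minimize"},
    "parameters": {
        "lr": {"values": [1e-3, 3e-3, 1e-2]},
        "batch_size": {"values": [32, 64]},
    },
}

# With wandb installed and logged in, registering the sweep would be:
# import wandb
# sweep_id = wandb.sweep(sweep_config, project="paddy")  # project name assumed
print(sweep_config["method"])
```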

20:00 - How to access all your git repo’s information?

your-git-repo# cat .git/config

24:49 - How to extract the info we need from the WandB experiment results for further analysis

Model Analysis Repo
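A hedged sketch of the extraction step: with wandb installed and logged in, the runs come from `wandb.Api()` and `api.sweep(...)` (the entity/project/sweep path below is a placeholder). Since that needs network access, the runnable part here just shows flattening run-style config/summary dicts into rows for a dataframe, using stand-in data:

```python
# Sketch: turning W&B sweep runs into rows suitable for a dataframe.
# With wandb available, the runs would come from something like:
#   import wandb
#   api = wandb.Api()
#   sweep = api.sweep("entity/project/sweep_id")  # placeholder path
#   runs = sweep.runs

def runs_to_rows(runs):
    """Merge each run's config and summary into one flat dict per run."""
    return [{**r["config"], **r["summary"]} for r in runs]

# Stand-ins for wandb run objects (shape assumed for illustration):
fake_runs = [
    {"config": {"lr": 1e-3, "squish": True}, "summary": {"error_rate": 0.042}},
    {"config": {"lr": 3e-3, "squish": False}, "summary": {"error_rate": 0.038}},
]

rows = runs_to_rows(fake_runs)
best = min(rows, key=lambda r: r["error_rate"])  # lowest error rate wins
print(best["lr"])
```

From here, `pandas.DataFrame(rows)` would give a table ready for analysis.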

25:05 - Why does Jeremy have to rerun the sweep experiment?

cropping is in fact usually better than squish for Resize, not the other way round.

26:00 - Why is using the WandB API with a Jupyter notebook so much better for Jeremy?

Is the parallel coordinates chart on wandb actually worth our attention for examining the experiment results? No, unfortunately.

31:30 - Why Jeremy’s approach to hyperparameter optimisation is more practical and beneficial than brute force

Who taught WandB hyperparameter optimization?

Did Jeremy use hyperparameter optimization only once, and just for finding the best value of dropout?

32:33 - What’s Jeremy’s human-driven approach to hyperparameters?

Why don’t you have to do a grid search for hyperparameters?

What does Jeremy do to make the human-driven approach efficient and effective?

How does Jeremy accumulate knowledge of deep learning through these experiments?

What’s the terrible drawback or downside of doing brute force hyperparameter optimizations?

34:51 - Are many of the hyperparameter values Jeremy found through experiments applicable to different architectures/models/datasets?

Are there some exceptions? Yes, tabular datasets.

It’s crazy that no one has done serious experiments to figure out the best hyperparameters for vision problems in segmentation, bounding boxes, etc.

37:30 - Why does Jeremy not use learn.lr_find any more?

39:39 - How to find out where a Jupyter notebook is running behind the scenes?

ps waux | grep jupyter

42:00 - How to get a program running in the background in the terminal?

ctrl + z to suspend a program running in the terminal
bg 1 (or 2, …) to continue running it in the background
jupyter notebook --no-browser & to run it in the background from the start
fg and then ctrl + c to bring a program back to the foreground and kill it

How to search

46:20 - How to iterate and improve by duplicating notebooks with different methods or modified models

Jupyter: output toggle feature

49:51 - Why does Jeremy focus on how the final error-rate differs from the tta result for each model?

tta is what Jeremy uses in the end; the final error-rate of training is for reference, I think.

50:50 - How to build models on vit_small_patch16_224 pretrained model

#question Why did Jeremy choose to build 3 models (squish, cropping, and padding) for each pre-trained model?

52:05 - How to evaluate models built on swinv2_base_window12_192_22k

53:36 - How to build models on large pre-trained models (paddy large notebook) from the paddy small notebook

Why remove the seed=42, and why is that fine?

54:52 - What models has Jeremy used for the final submission up to now?

55:50 - How important is model stacking/ensembling compared to individual outstanding models?

57:00 - Keeping track of submission notebooks

How to become a good deep learning practitioner