Live coding 9

This topic is for discussion of the 9th live coding session

<<< Walkthru 8Walkthru 10 >>>

Links from the walk-thru

What was covered

  • (please contribute here)

Video timeline - thank you @Mattr

01:00 - Installing timm persistently in Paperspace
04:00 - Fixing broken symlink
06:30 - Navigating around files in vim
16:40 - Improving a model for a kaggle competition
24:00 - Saving a trained model
34:30 - Test time augmentation
39:00 - Prepare file for kaggle submission
45:00 - Compare new predictions with previous
46:00 - Submit new predictions
49:00 - How to create an ensemble model
54:30 - Change probability of affine transforms
57:00 - Discussion about improving accuracy


Hello, a few silly questions that came up while watching this walkthru (which was incredible, so thank you).

  1. Is there a way to choose a number of epochs such that it stops training once the validation loss plateaus?

  2. Is there a way to plot the validation & training loss?

  3. Is there a rule of thumb for choosing the correct batch size, given the amount of data you might have? Say if you’re really low on data, would you still choose 128/256?


  1. The right number of epochs is difficult to know before training, but there is a callback in fastai that can monitor your validation loss and end training early once it plateaus. You can customize it so that it tolerates X epochs of training without improvement in the value you are monitoring: Tracking callbacks | fastai

  2. Yes! This is available via a method in the recorder class of our learner object: Learner, Metrics, and Basic Callbacks | fastai

  3. This depends somewhat on the data and the problem itself. While prototyping and trying things out, I often choose the biggest batch size that fits on the available GPU. A larger batch size trains each epoch faster and therefore lets you try out more ideas (though you will most likely get worse results than with a smaller batch size [see below]). Once I know what might work better, I can choose a smaller batch size and squeeze out the remaining performance gains… I guess I could also just use a subset of the data with the smaller batch size, now that I think about it :slight_smile:
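For question 1, the plateau rule that such a callback applies can be sketched in plain Python. This is only an illustration of the patience logic, not fastai's implementation; the function name `epochs_until_stop` and the loss values are made up for the example. In fastai itself the equivalent knobs are the `monitor`, `min_delta` and `patience` arguments of `EarlyStoppingCallback`.

```python
def epochs_until_stop(valid_losses, patience=2, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(valid_losses, start=1):
        if loss < best - min_delta:
            # Validation loss improved: remember it and reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Hypothetical validation losses, one per epoch.
losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.59]
print(epochs_until_stop(losses, patience=2))  # 5
```

With `patience=2`, training stops at epoch 5, so the later improvement at epoch 6 is never seen. That is the trade-off to keep in mind when picking a small patience value.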

There was an interesting conversation about this (question 3) on the discord channel. Here are some things mentioned there that might help make sense of how to approach this topic:


Oh wow, thanks for your thorough reply. I have zero experience with callbacks (and so I’m worried about attempting to use one…) but perhaps once we’re done with the class I’ll look for a few exercises. I’ll definitely use the recorder method for all of my future training. I sure do enjoy the visualization. Seeing how true pros train their models really helps out the community.

I tried a bigger architecture, ‘convnext_large_in22k’, with everything else the same and landed right on 98% accuracy on the Kaggle leaderboard. Check the training results below:
Screenshot from 2022-06-10 18-37-16

And entering the 98% accuracy :slight_smile:

It is really exciting. Difference is not huge but it takes a while to train.


I’ve a question: is there a way to compute the mean and standard deviation for the entire dataset in fastai? By default, fastai uses imagenet_stats for normalisation. Wouldn’t it be nicer to personalise these stats for your dataset? These stats may not vary much if the problem at hand is somewhat similar to ImageNet.

No, because that would break the pretrained model.

However, fastai’s Normalize does have the option to calculate from a random sample of data. (There’s no reason to use the whole dataset).
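As a rough illustration of what "calculating from a sample" means, here is a NumPy sketch that computes per-channel statistics from one batch and normalises with them. It is not fastai's code; the helper name `channel_stats` and the random stand-in data are assumptions for the example. With a pretrained model you would still keep the statistics the model was trained with, as noted above.

```python
import numpy as np


def channel_stats(batch):
    """Per-channel mean and std of a float array shaped (n_images, channels, height, width)."""
    mean = batch.mean(axis=(0, 2, 3))
    std = batch.std(axis=(0, 2, 3))
    return mean, std


rng = np.random.default_rng(0)
sample = rng.random((8, 3, 32, 32))  # stand-in for one random batch of images in [0, 1]

mean, std = channel_stats(sample)
# Broadcast the per-channel stats over the batch, height and width axes.
normalized = (sample - mean[None, :, None, None]) / std[None, :, None, None]
print(mean.shape, std.shape)
```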

Well that went well


Great job :clap:

Thanks @jeremy for the explanation. That is a good point. I never thought about it breaking the pretrained model. I will check the option. Cool.

Wow, incredible. Great job.

FYI Bilal I prefer to avoid getting at-mentioned here, except for stuff that needs immediate attention from an admin. No big deal, but just figured I’d mention it. Helps me focus on the stuff that’s urgent.

I’ll share the method in our next walkthru.


No more mentions unless necessary. Thanks.

As a software engineer I like recipes to code.
It looks to me that in order to do well in data science you need to be a data “fanatic”.
I still see data as a blob of data.
After this last walkthrough I realized that I need to treat data in a fanatical way.

I’m using the walkthroughs to understand the way that you think and the way that you resolve issues

I don’t think I’m alone here :grinning:

If possible I would like to see more steps that we need to follow to achieve the success needed.


Video links for walk-thru 9

I tried a convnext_base_in22k with no squish and managed to get over 98% accuracy which is pleasing. When I tried a larger model still I started getting out of memory errors.


We’ll learn to fix those in the next session too :slight_smile:


So, a couple of questions around presizing & the size fed into the model.

In this walkthrough, towards the end you try changing the size that gets fed into the model (usually 224) to other values. In one case, you also make it rectangular instead of the usual square.

From what I understand, in the item transforms (on the CPU, per item) we “presize” each image to be a bit larger than the dimensions we plan to feed into the model. This is done so that the batch augmentations (whole batch, on the GPU) can zoom in and randomly pick different sections (if using RandomResizedCrop). We also avoid possible edge/padding artifacts when performing rotations, skews, etc.

Assuming the above understanding is correct, I have a couple of questions in my mind :

  • I understand that the final size does not always have to be 224,224 (or the model_train_size? in the kaggle best models notebook). But I guess it’s a good place to start, or should we try to start with something even smaller?
  • How should one think about “when to start using non-standard sizes”, e.g. the rectangular size you used?
  • In general, how should one think about what sizes to start with and when/how to scale up ?
  • In the same context, can you talk a bit more about the train_image_size and infer_image_size metadata on the “Which image models are best?” kaggle notebook ?

I’m not sure I can join the live session, but it’s fine if you cover these questions there. I’ll catch up on the recording, and maybe it’s useful to more people recorded as well.


walkthru 9: a note in the form of questions


Installing timm persistently in Paperspace


What’s inside Jeremy’s .bash.local at the beginning of walkthru 9

#!/usr/bin/env bash

export PATH=~/conda/bin:$PATH

How and where to create alias for pip install -U --user

vim /storage/.bash.local
alias pi="pip install -U --user"
source /storage/.bash.local

How to print out the content of an alias?

type pi

How to install TIMM on paperspace?

pi timm

How to check whether TIMM is installed into your home directory’s .local?

ls ~/.local/lib/python3.7/site-packages/

How to create a file in terminal without any editing?

touch /storage/.gitconfig

How to check the content of a file in terminal without entering the file?

cat /storage/.gitconfig

How to apply /storage/ after some changes?

source /storage/

What’s wrong with the symlink .gitconfig -> /storage/.gitconfig/?


you accidentally symlinked a folder to a file

How to run ipython with a shortcut?

just type !ipy; bash history expansion reruns the most recent command starting with ipy, so it only works if you have run ipython before

How to run ctags properly for fastai/fastai?

fastai/fastai# ctags -R .

vim tricks moving back and forth in previous locations

ctrl + o move to previous places
ctrl + i move to next places

vim: jump to specific place within a line using f or F

f": jump to the next " in the line after the cursor
fs: jump to the next s in the line after the cursor
3fn: jump to the third n after the cursor in the line
Fs (shift + f, then s): jump to the nearest s before the cursor in the line

vim: delete from cursor to the next or previous symbol or character in a line

dFs: delete from cursor to the previous s
df": delete from cursor to the next "
/" + enter + .: search and find the next " and redo df"

vim: moving cursor between two (, ) or [, ] or {, } and delete everything from cursor to the end bracket using %

%: toggle positions between starting and ending bracket
d%: delete from cursor to the end bracket
df(: delete from cursor to the first (

vim: delete everything within a bracket with i


di(: delete everything within () in which cursor is also located

di{: delete everything within {} in which cursor is also located

vim: delete content within () and enter edit mode to make some changes and apply it all (including changes) to next place using ci( and .

ci(: delete everything between () where cursor is located and enter the edit mode to make changes
move to a different position within a () and press .: to redo the action above

Why and how to install a stable and updated dev version of TIMM


What is a pre-release version?

pi "timm>=0.6.2dev"

How to ensure we have the latest version?

How to avoid the problem of not finding timm after importing it?


use pip install --user and then symlink it from /storage/.local

Improve the model by more epochs with augmentations

a notebook which trained longer with augmentations beat the original model we have

How many epochs may cause your model to overfit? Why? And how to avoid it?


The details are discussed in fastbook

Explore aug_transforms - improve your model with augmentations


Read its docs

Read what is returned

How to display the transformed pictures with show_batch?

dls.train.show_batch(max_n=4, nrows=1, unique=True)

This is how aug_transforms ensures the model does not see exactly the same picture 10 times when training for 10 epochs

Improve your model with a better pre-trained model


Improve your model with a better learning rate


Why is fastai’s default learning rate more conservative (lower) than we need? So that training always works

What are the downsides of using a lower learning rate?


weights move a shorter distance per step with a lower learning rate

higher learning rates can help jump further to explore more or better weights spaces
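A toy example makes the "distance moved" point concrete. The sketch below runs plain gradient descent on a one-parameter quadratic with two learning rates; the function `descend` and the numbers are purely illustrative and have nothing to do with fastai's training loop.

```python
def descend(lr, steps=10, start=0.0, target=3.0):
    """Run `steps` gradient-descent updates on the loss (w - target)**2 and return the final w."""
    w = start
    for _ in range(steps):
        grad = 2 * (w - target)  # derivative of (w - target)**2 with respect to w
        w -= lr * grad
    return w


w_low = descend(lr=0.01)
w_high = descend(lr=0.1)
print(f"lr=0.01 -> w={w_low:.3f}, lr=0.1 -> w={w_high:.3f}")
```

After the same 10 steps, the lower learning rate has covered far less of the distance from 0 to the optimum at 3 than the higher one.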

How to find a more promising learning rate?


How to read and find a more aggressive/appropriate learning rate from lr_find to

How to export and save your trained model?


Explore learn.export

Where does learn.export save model weights by default?

How do we change the directory for saving models?
learn.path = Path('an-absolute-directory-such-as-paddy-folder-in-home-directory')


How does differ from learn.export?

What does learn contain in itself? model, transformations, dataloaders

When to use over learn.export?


when you need to save the model at each epoch and load the updated model to create a new learner

Which data format is used for saving models? pkl

Explore min_scale=0.75 of aug_transforms and method="squish" of resize: improve your model with augmentations?


Why did Jeremy choose 12 epochs for training?


Why did Jeremy choose 480 for resize and 224 for aug_transforms?


details on resize can be found in fastbook

Why did Jeremy pick the model convnext_small_in22k, with emphasis on in22k?


What does in22k mean?

How is this more useful for our purpose? Or why did Jeremy advise always picking in22k models?

Will learn.path for saving models take both string and Path in the future?


Test-time augmentation and what problems does it solve


What is the problem when we don’t use squish? Only the center of each image is used, not the whole image

Another problem is that we don’t apply any of those augmentations to validation set. (but why this is a problem? #question )

Test-time augmentation is particularly useful when no squish is used, and still useful even with squish.

What is test-time augmentation? It creates 4 augmented versions of each test image and averages the predictions across them.

How to calculate/replicate the model’s latest error-rate from learn.get_preds?
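One way to think about it: `learn.get_preds` returns predicted probabilities and targets, and the error rate is the fraction of items whose most probable class differs from the target. The NumPy sketch below uses made-up arrays in place of real predictions; `error_rate_from_probs` is a hypothetical helper, not a fastai function.

```python
import numpy as np


def error_rate_from_probs(probs, targets):
    """Fraction of rows whose highest-probability class differs from the target index."""
    predicted = probs.argmax(axis=1)
    return float((predicted != targets).mean())


# Made-up probabilities for 4 items and 3 classes, plus their true class indices.
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
targets = np.array([0, 1, 2, 1])
print(error_rate_from_probs(probs, targets))  # 0.25
```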


Explore Test-time augmentation (tta)


Read the doc

How should we learn to read the source code of learn.tta guided by Jeremy?

Let’s run learn.tta on the validation set to see whether error-rate is even better

How many times in total does learn.tta make predictions? 5 = 4 augmented images + 1 original image

How does Jeremy generally use learn.tta? He will not use squish and use learn.tta(..., use_max=True) 39:21
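The difference between averaging and `use_max=True` can be sketched with NumPy on made-up probabilities. This is a simplification: it takes a plain mean or an element-wise max over the 5 predictions, whereas fastai's `learn.tta` also has a `beta` parameter that weights the original prediction against the augmented ones. `combine_tta` is a hypothetical helper for illustration only.

```python
import numpy as np


def combine_tta(original_probs, augmented_probs, use_max=False):
    """Combine predictions for the original images with predictions for augmented copies.

    original_probs: shape (n_items, n_classes)
    augmented_probs: shape (n_aug, n_items, n_classes)
    """
    stacked = np.concatenate([augmented_probs, original_probs[None]], axis=0)
    return stacked.max(axis=0) if use_max else stacked.mean(axis=0)


# One test image, two classes: 1 original prediction + 4 augmented predictions.
original = np.array([[0.6, 0.4]])
augmented = np.array([[[0.2, 0.8]], [[0.3, 0.7]], [[0.4, 0.6]], [[0.5, 0.5]]])

print(combine_tta(original, augmented))                # plain mean over the 5 predictions
print(combine_tta(original, augmented, use_max=True))  # element-wise max over the 5 predictions
```

In this toy case the original prediction alone favours class 0, while both the mean and the max over all 5 predictions favour class 1.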

How to apply learn.tta to test set and work around without with_decoded=True in learn.tta?


How to find out the idx of the maximum prob of each prediction with argmax?

How to create idxs into a pd.Series and vocab into a dictionary and map idxs with dictionary to get the result?

How to create dictionary with a tuple?
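The three questions above can be sketched together as follows. The probabilities are made up and the three-class `vocab` is a shortened, hypothetical stand-in for the real `dls.vocab`; the point is only the argmax, `dict(enumerate(...))` and `Series.map` steps.

```python
import numpy as np
import pandas as pd

# Hypothetical class names in vocab order, and made-up probabilities for 3 test images.
vocab = ("blast", "hispa", "normal")
probs = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.2, 0.2, 0.6],
])

idxs = pd.Series(probs.argmax(axis=1))  # index of the most probable class per image
mapping = dict(enumerate(vocab))        # {0: 'blast', 1: 'hispa', 2: 'normal'}
labels = idxs.map(mapping)              # replace each index with its class name
print(labels.tolist())  # ['hispa', 'blast', 'normal']
```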


Good things to do before submission


Why compare the new results with the previous results? To make sure we are not totally breaking things.

Also keep the code of the previous version in case we need it later

Please write a detailed submission comment specifying what changed in this updated model

Check where we are in the leader board

How to create the third model without squish but with learn.tta(..., use_max=True)?


How to create the fourth model using rectangular images in augmentation?


How to keep the original image’s aspect ratio but shrink the size?

When to use rectangular rather than square images in augmentation?

How to check the augmented images after changing the augmentation settings?

Why and how to adjust (affine_transform) p_affine?


What does affine transformation do? zoom in, rotate, etc

if the augmented images are still in good resolution, then we should not apply affine transforms that often, so reduce p_affine from 0.75 to 0.5

Save your ensemble/multiple models in /notebooks/s on paperspace


Why or when to focus on augmentation vs different models?


Please feel free to join walkthru


jupyter: How to merge jupyter cells?

shift + m


I’m trying to run…

on Paperspace, but the resize_images() behaviour is inconsistent between Paperspace and Kaggle.

Running the following code…

try: import fastkaggle
except ModuleNotFoundError:
    !pip install -q --user fastkaggle
from fastkaggle import *

comp = 'paddy-disease-classification'
path = setup_comp(comp, install='"fastcore>=1.4.5" "fastai>=2.7.1" "timm>=0.6.2.dev0"')
from fastai.vision.all import *

trn_path = Path('sml')
resize_images(path/'train_images', dest=trn_path, max_size=256, recurse=True)
!ls $trn_path

…on Kaggle, the folder hierarchy is maintained:

bacterial_leaf_blight blast downy_mildew tungro
bacterial_leaf_streak brown_spot hispa
bacterial_panicle_blight dead_heart normal

…on Paperspace, the folder hierarchy is flattened:

100001.jpg 101736.jpg 103471.jpg 105206.jpg 106941.jpg 108676.jpg
100002.jpg 101737.jpg 103472.jpg 105207.jpg 106942.jpg 108677.jpg
100003.jpg 101738.jpg 103473.jpg 105208.jpg 106943.jpg 108678.jpg

This discrepancy was discovered while investigating the following strange stats on Paperspace…

Can someone confirm this behaviour?

p.s. Hopefully not important, but btw pip command had --user added, which was different to Jeremy’s original.
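To compare the two environments without eyeballing `ls` output, one option is to count the images per folder under the destination. The sketch below uses only the standard library and a throwaway temporary directory; `images_per_folder` is a hypothetical helper, and the file names are placeholders.

```python
from collections import Counter
from pathlib import Path
import tempfile


def images_per_folder(root, suffix=".jpg"):
    """Count image files by parent folder, relative to root ('.' means root itself)."""
    root = Path(root)
    return Counter(str(p.parent.relative_to(root)) for p in root.rglob(f"*{suffix}"))


# Demo on a temporary directory: one image kept in a class folder, one flattened into the root.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "blast").mkdir()
    (tmp / "blast" / "100001.jpg").touch()
    (tmp / "100002.jpg").touch()
    counts = images_per_folder(tmp)
    print(dict(counts))
```

Running `images_per_folder('sml')` in both environments would show whether the images sit under the class folders or directly under `'.'`.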