Live coding 9

Great job :clap:


Thanks @jeremy for the explanation. That is a good point; I never thought about it breaking the pretrained model. I will check the option. Cool.

Wow, incredible. Great job.

FYI Bilal I prefer to avoid getting at-mentioned here, except for stuff that needs immediate attention from an admin. No big deal, but just figured I’d mention it. Helps me focus on the stuff that’s urgent.

I’ll share the method in our next walkthru.


No more mentions until necessary. Thanks.


As a software engineer I like recipes to code.
It looks to me that in order to do well in data science you need to be a data "fanatic".
I still see data as one big blob.
After this last walkthrough I realized that I need to treat data in a fanatical way.

I’m using the walkthroughs to understand the way that you think and the way that you resolve issues

I don’t think I’m alone here :grinning:

If possible I would like to see more of the steps that we need to follow to achieve success.


Video links for walk-thru 9

I tried a convnext_base_in22k with no squish and managed to get over 98% accuracy, which is pleasing. When I tried a still larger model, I started getting out-of-memory errors.

01:00 - Installing timm persistently in Paperspace
04:00 - Fixing broken symlink
06:30 - Navigating around files in vim
16:40 - Improving a model for a kaggle competition
24:00 - Saving a trained model
34:30 - Test time augmentation
39:00 - Prepare file for kaggle submission
45:00 - Compare new predictions with previous
46:00 - Submit new predictions
49:00 - How to create an ensemble model
54:30 - Change probability of affine transforms
57:00 - Discussion about improving accuracy


We’ll learn to fix those in the next session too :slight_smile:


So, a couple of questions around presizing & the size fed into the model.

In this walkthrough, towards the end you try changing the size that gets fed into the model (usually 224) to other values. In one case, you also make it rectangular instead of the usual square.

From what I understand, in the item transforms (run per item on the CPU), we "presize" the image a bit larger than the dimensions we plan to feed into the model. This is done so that we can zoom in and randomly pick different sections (if using RandomResizedCrop) during the batch-transform augmentations (run on the whole batch on the GPU). We also avoid possible edge/padding artifacts when performing rotations, skews, etc.
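If I've understood right, the room-to-crop idea can be sketched like this (the numbers 460/224 and the helper name are just illustrative, not fastai internals):

```python
import random

# Toy sketch: presizing (e.g. to 460px) larger than the model input (e.g. 224px)
# leaves room to cut a *different* crop each epoch, without padding artifacts.
def random_crop_box(presize, final):
    """Pick a random final-sized crop fully inside a presized square image."""
    assert presize >= final, "presize must be at least the final input size"
    x = random.randint(0, presize - final)
    y = random.randint(0, presize - final)
    return (x, y, x + final, y + final)

box = random_crop_box(460, 224)
```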

Assuming the above understanding is correct, I have a couple of questions in my mind :

  • I understand that the final size does not always have to be 224×224 (or the model_train_size? in the Kaggle best-models notebook). But I guess it's a good place to start with, or should we try to start with something even smaller?
  • How should one think about when to start using non-standard sizes, e.g. the rectangular size you used?
  • In general, how should one think about what sizes to start with and when/how to scale up?
  • In the same context, can you talk a bit more about the train_image_size and infer_image_size metadata in the "Which image models are best?" Kaggle notebook?

I’m not sure I can join the live session, but it’s fine if you cover these questions there. I’ll catch up on the recording, and maybe it will be useful to other people as well.


walkthru 9: a note in the form of questions


Installing timm persistently in Paperspace


What’s inside Jeremy’s .bash.local at the beginning of walkthru 9

#!/usr/bin/env bash

export PATH=~/conda/bin:$PATH

How and where to create alias for pip install -U --user

vim /storage/.bash.local
alias pi="pip install -U --user"
source /storage/.bash.local

How to print out the content of an alias?

type pi

How to install TIMM on paperspace?

pi timm

How to check whether TIMM is installed into your home directory’s .local?
ls ~/.local/lib/python3.7/site-packages/
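If unsure of the exact path (the python3.7 part varies by Python version), you can ask Python where `pip install --user` packages land:

```shell
# Print the user site-packages directory that pip's --user installs use
python3 -c "import site; print(site.getusersitepackages())"
```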

How to create a file in the terminal without any editing?

touch /storage/.gitconfig

How to check the content of a file in terminal without entering the file?

cat /storage/.gitconfig

How to apply /storage/.bash.local after some changes?

source /storage/.bash.local

What’s wrong with the symlink .gitconfig -> /storage/.gitconfig/?


you accidentally created a symlink pointing at a directory (note the trailing / in /storage/.gitconfig/) instead of a file

How to run ipython with a shortcut?

in the terminal, type !ipy to re-run the last command starting with ipy (shell history expansion; it works as long as you have run ipython before)

How to run ctags properly for fastai/fastai?

# from inside the fastai repo (fastai/fastai):
ctags -R .

vim tricks moving back and forth in previous locations

ctrl + o move to previous places
ctrl + i move to next places

vim: jump to specific place within a line using f or F

f": jump to the next " in the line after the cursor
fs: jump to the next s in the line after the cursor
3fn: jump to the third n after the cursor in the line
Fs (shift + f, then s): jump to the previous s before the cursor in the line

vim: delete from cursor to the next or previous symbol or character in a line

dFs: delete from cursor to the previous s
df": delete from cursor to the next "
/" + enter + .: search for the next " and repeat the last change (df")

vim: moving cursor between two (, ) or [, ] or {, } and delete everything from cursor to the end bracket using %

%: toggle positions between starting and ending bracket
d%: delete from cursor to the end bracket
df(: delete from cursor to the first (

vim: delete everything within a bracket with i


di(: delete everything within () in which cursor is also located

di{: delete everything within {} in which cursor is also located

vim: delete content within () and enter edit mode to make some changes and apply it all (including changes) to next place using ci( and .

ci(: delete everything between () where cursor is located and enter the edit mode to make changes
move to a different position within a () and press . to redo the action above

Why and how to install a stable and updated dev version of TIMM


What is pre-release version?

pi "timm>=0.6.2.dev0"

How to ensure we have the latest version?

How to avoid the problem of not finding timm after importing it?


use pip install --user and then symlink it from /storage/.local

Improve the model by more epochs with augmentations

a notebook that trained longer with augmentations beat our original model

How many epochs may get your model overfitting, why, and how to avoid it?


The details are discussed in fastbook

Explore aug_transforms - improve your model with augmentations


Read its docs

Read what is returned

How to display the transformed pictures with show_batch?
dls.train.show_batch(max_n=4, nrows=1, unique=True)

This is how aug_transforms ensures the model doesn't see the exact same picture 10 times when training for 10 epochs

Improve your model with a better pre-trained model


Improve your model with a better learning rate


Why is fastai's default learning rate more conservative/lower than we need? So that you can always train

What are the downsides of using a lower learning rate?


with a lower learning rate, the weights can only move a short distance each step

higher learning rates can help jump further to explore more of the weight space and find better weights
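A toy illustration of that trade-off, minimizing f(w) = w² by gradient descent (all numbers are illustrative, not fastai defaults):

```python
# Toy sketch: a conservative learning rate moves the weight a short distance
# each step; a more aggressive one covers far more ground in the same number
# of steps (though too large a rate would overshoot and diverge).
def step(w, lr):
    grad = 2 * w            # derivative of f(w) = w**2
    return w - lr * grad

w_slow = w_fast = 1.0
for _ in range(10):
    w_slow = step(w_slow, 0.01)   # lower rate: still far from the minimum at 0
    w_fast = step(w_fast, 0.3)    # higher rate: much closer after the same steps
```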

How to find a more promising learning rate?


How to read lr_find's plot and find a more aggressive/appropriate learning rate

How to export and save your trained model?


Explore learn.export

Where does learn.export save model weights by default?

How do we change the directory for saving models?
learn.path = Path('an-absolute-directory-such-as-paddy-folder-in-home-directory')


How does it differ from learn.export?

What does learn contain in itself? model, transformations, dataloaders

When to use it over learn.export?


when you need to save the model at each epoch, and load the updated model to create a new learner

Which data format is used for saving models? pkl

Explore min_scale=0.75 of aug_transforms and method="squish" of resize: improve your model with augmentations?


Why did Jeremy choose 12 epochs for training?


Why did Jeremy choose 480 for resize and 224 for aug_transforms?


details on resize can be found in fastbook

Why did Jeremy pick the model convnext_small_in22k, with emphasis on in22k?


What does in22k mean? (the model was pretrained on ImageNet-22k, the ~22,000-class version of ImageNet)

How is this more useful for our purpose? Why did Jeremy advise always picking in22k models?

Will learn.path for saving models take both string and Path in the future?


Test-time augmentation and what problems does it solve


What is the problem when we don’t use squish? We only take the center crop of each image, not the whole image

Another problem is that we don’t apply any of those augmentations to the validation set. (but why is this a problem? #question)

Test-time augmentation is particularly useful when no squish is used, and still useful even with squish.

What is test-time augmentation? It creates 4 augmented versions of each test image and averages the predictions over them (and the original).
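As a rough sketch of the idea, with predict and augment as stand-in functions (not the fastai API):

```python
# Toy sketch of test-time augmentation: predict on the original image plus
# n_aug augmented copies, then average the per-class probabilities.
def tta_predict(predict, augment, image, n_aug=4):
    preds = [predict(image)] + [predict(augment(image)) for _ in range(n_aug)]
    n_classes = len(preds[0])
    # mean over the 5 = 1 original + 4 augmented predictions
    return [sum(p[c] for p in preds) / len(preds) for c in range(n_classes)]
```

fastai's learn.tta also accepts use_max=True, which takes the maximum over the predictions instead of the mean.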

How to calculate/replicate the model’s latest error-rate from learn.get_preds?
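A toy sketch of the calculation (made-up probabilities and targets, not actual learn.get_preds output):

```python
# Recompute the error rate from per-class probabilities and targets,
# as one could from the tensors learn.get_preds returns.
probs = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]   # 3 images, 2 classes
targs = [1, 0, 0]
preds = [max(range(len(p)), key=p.__getitem__) for p in probs]  # argmax per row
error_rate = sum(p != t for p, t in zip(preds, targs)) / len(targs)
# preds is [1, 0, 1]: one of three predictions is wrong
```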


Explore Test-time augmentation (tta)


Read the doc

How should we learn to read the source code of learn.tta guided by Jeremy?

Let’s run learn.tta on the validation set to see whether error-rate is even better

How many times in total does learn.tta make predictions? 5 = 4 augmented images + 1 original image

How does Jeremy generally use learn.tta? He will not use squish, and will use learn.tta(..., use_max=True) 39:21

How to apply learn.tta to the test set, and work around the lack of with_decoded=True in learn.tta?


How to find out the idx of the maximum prob of each prediction with argmax?

How to turn idxs into a pd.Series and vocab into a dictionary, and map idxs through the dictionary to get the result?

How to create dictionary with a tuple?
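The idx-to-label mapping can be sketched in plain Python (the class names below are illustrative; in the notebook the lookup is done with a pandas Series and .map):

```python
# Map predicted class indices back to label strings via a vocab dictionary.
vocab = ['bacterial_leaf_blight', 'blast', 'normal']
mapping = dict(enumerate(vocab))     # {0: 'bacterial_leaf_blight', 1: 'blast', ...}
idxs = [2, 0, 1]                     # argmax indices from the predictions
labels = [mapping[i] for i in idxs]  # same lookup pd.Series(idxs).map(mapping) does
```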


Good things to do before submission


Why to compare the new results with previous results? to make sure we are not totally breaking things.

Also document the code of the previous version, just in case we need it later

Please write a detailed submission comment specifying the changes in this updated model

Check where we are on the leaderboard

How to create the third model without squish but with learn.tta(..., use_max=True)?


How to create the fourth model using rectangular images in augmentation?


How to keep the original image’s aspect ratio but shrink the size?

When to use rectangular rather than square images in augmentation?

How to check the augmented images after changing the augmentation settings?

Why and how to adjust (affine_transform) p_affine?


What does affine transformation do? zoom in, rotate, etc

If the augmented images are still in good resolution, we don't need to apply affine transforms as often, so reduce p_affine from 0.75 to 0.5
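A toy sketch of what a probability parameter like p_affine controls (maybe_apply and the transform are stand-ins, not fastai internals):

```python
import random

# Each image gets the affine transform applied only with probability p;
# lowering p from 0.75 to 0.5 means fewer images are altered per batch.
def maybe_apply(img, transform, p):
    return transform(img) if random.random() < p else img

# p=0.0 never applies the transform; p=1.0 always does (random() is in [0, 1)).
```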

Save your ensemble/multiple models in /notebooks/s on paperspace
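A minimal sketch of the ensembling idea (all numbers made up): average the per-class probabilities from each model's predictions before taking the argmax.

```python
# Two models' probabilities for the same 2 images over 2 classes.
model_preds = [
    [[0.6, 0.4], [0.2, 0.8]],   # model 1
    [[0.5, 0.5], [0.4, 0.6]],   # model 2
]
n_models = len(model_preds)
n_classes = 2
# Per image, average each class probability across the models.
ensemble = [
    [sum(m[i][c] for m in model_preds) / n_models for c in range(n_classes)]
    for i in range(len(model_preds[0]))
]
```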


Why or when to focus on augmentation vs different models?


Please feel free to join walkthru


jupyter: How to merge jupyter cells?

shift + m


I’m trying to run…

on Paperspace, but the resize_images() behaviour is inconsistent between Paperspace and Kaggle.

Running the following code…

try: import fastkaggle
except ModuleNotFoundError:
    !pip install -q --user fastkaggle
from fastkaggle import *

comp = 'paddy-disease-classification'
path = setup_comp(comp, install='"fastcore>=1.4.5" "fastai>=2.7.1" "timm>=0.6.2.dev0"')
from fastai.vision.all import *

trn_path = Path('sml')
resize_images(path/'train_images', dest=trn_path, max_size=256, recurse=True)
!ls $trn_path

…on Kaggle, the folder hierarchy is maintained:

bacterial_leaf_blight blast downy_mildew tungro
bacterial_leaf_streak brown_spot hispa
bacterial_panicle_blight dead_heart normal

…on Paperspace, the folder hierarchy is flattened:

100001.jpg 101736.jpg 103471.jpg 105206.jpg 106941.jpg 108676.jpg
100002.jpg 101737.jpg 103472.jpg 105207.jpg 106942.jpg 108677.jpg
100003.jpg 101738.jpg 103473.jpg 105208.jpg 106943.jpg 108678.jpg

This discrepancy was discovered while investigating the following strange stats on Paperspace…

Can someone confirm this behaviour?

p.s. Hopefully not important, but btw my pip command had --user added, which was different from Jeremy’s original.

That means you don’t have the latest fastai on paperspace.


The last container Paperspace pushed was about 2 months ago. Maybe I’m reading it wrong, but it looks like they’re cloning fastcore and fastai and doing a pip install in both directories. Not sure if that’s the canonical way to install fastai. At least that’s what I gleaned looking at their latest published tag for the fastai template env on Docker Hub.

Ah, thanks.
At the top of my scripts I check the version with…



which was the latest last week, and then I blindly presumed…

setup_comp(comp, install='"fastcore>=1.4.5" "fastai>=2.7.1" "timm>=0.6.2.dev0"')

…was taking care of everything, without actually noticing the 2.7.1 that I’d skimmed over thinking it was another package.

2.7.3 is now installed and it works properly.

But, why are the pre-requisite installs only done for Kaggle?

def setup_comp(competition, install=''):
    "Get a path to data for `competition`, downloading it if needed"
    if iskaggle:
        if install:
            os.system(f'pip install -Uqq {install}')
        return Path('../input')/competition
    else:
        path = Path(competition)
        from kaggle import api
        if not path.exists():
            import zipfile
            api.competition_download_cli(str(competition))
            zipfile.ZipFile(f'{competition}.zip').extractall(str(competition))
        return path

It also might help for users to be advised they need to restart their kernel when a version upgrade occurs. Maybe something roughly like…

def setup_comp(competition, install=''):
    "Get a path to data for `competition`, downloading it if needed"
    if install:
        oldver = !pip show fastai | grep Version | sed 's/Version: //'
        os.system(f'pip install -Uqq {install}')
        newver = !pip show fastai | grep Version | sed 's/Version: //'
        if oldver[0] != newver[0]: print('Please restart kernel to ensure upgraded version loaded.')

    if iskaggle:
        path = Path('../input')/competition
    else:
        path = Path(competition)
        from kaggle import api
        if not path.exists():
            import zipfile
            api.competition_download_cli(str(competition))
            zipfile.ZipFile(f'{competition}.zip').extractall(str(competition))
    return path

Because on Kaggle you always have to install because there’s no persistent environment. Elsewhere you don’t need to update from pip every time you run a notebook.


I would find it useful for the pre-requisite install to apply to non-Kaggle platforms, so I thought I’d have a go at a code contribution. I submitted an issue on GitHub to discuss this:

I’m interested in getting broad feedback on whether this would be useful to others.


Hi, I have been working my way through these live coding sessions but notice that my training on Paperspace is taking much longer than in the video. When I run

vision_learner(dls, 'convnext_small_in22k', metrics=error_rate).to_fp16()

Each epoch takes around 4:40 minutes, while it looks like each epoch takes around 35s in the video. According to nvidia-smi dmon, GPU utilisation is around 100%. I’m storing the training data in /root. I am running the notebook on a Free-P5000 instance in a paid account. Any suggestions on how I can speed up training would be much appreciated. Thanks!

Hi @Tomasaki, you don’t mention what GPU Jeremy is using during the sessions (I’m not sure if that is ever visible), so I’d imagine Jeremy used a higher-performing paid instance, so as not to waste the time of the people participating in the session.


Thanks @bencoman for pointing that out to me. I was thinking of session 7 where he is running this instance type on Paperspace.