Great job
Thanks @jeremy for the explanation. That's a good point; I never thought about it breaking the pretrained model. I'll check out that option. Cool.
Wow, incredible. Great job.
FYI Bilal I prefer to avoid getting at-mentioned here, except for stuff that needs immediate attention from an admin. No big deal, but just figured I’d mention it. Helps me focus on the stuff that’s urgent.
I’ll share the method in our next walkthru.
No more mentions until necessary. Thanks.
As a software engineer, I like recipes for coding.
It looks to me that in order to do well in data science you need to be a data "fanatic".
I still see data as one big blob.
After this last walkthrough I realized that I need to treat data in a fanatical way.
I'm using the walkthroughs to understand the way that you think and the way that you resolve issues.
I don't think I'm alone here.
If possible, I would like to see more of the steps we need to follow to achieve success.
Video links for walk-thru 9
I tried a convnext_base_in22k with no squish and managed to get over 98% accuracy, which is pleasing. When I tried a still larger model, I started getting out-of-memory errors.
01:00 - Installing timm persistently in Paperspace
04:00 - Fixing broken symlink
06:30 - Navigating around files in vim
16:40 - Improving a model for a kaggle competition
24:00 - Saving a trained model
34:30 - Test time augmentation
39:00 - Preparing a file for Kaggle submission
45:00 - Comparing new predictions with previous ones
46:00 - Submitting new predictions
49:00 - Creating an ensemble model
54:30 - Changing the probability of affine transforms
57:00 - Discussion about improving accuracy
We’ll learn to fix those in the next session too
So, a couple of questions around presizing and the size fed into the model.
In this walkthrough, towards the end you try changing the sizes that get fed into the model (usually 224) to other values. In one case, you also make it rectangular instead of the usual square.
From what I understand, in aug_transforms (on CPU, per item), we "presize" the image a bit larger than the actual dimensions we plan to feed into the model. This is done so that we can zoom in and randomly pick different sections (if using RandomResizedCrop) during the batch-transform augmentations (whole batch, on the GPU). We also avoid possible problems with edge/padding artifacts when performing rotations, skews, etc.
Assuming the above understanding is correct, I have a couple of questions in my mind:
- I understand that the final size does not always have to be 224×224 (or the model_train_size in the Kaggle best-models notebook). But I guess it's a good place to start with, or should we try starting with something even smaller?
- How should one think about when to start using non-standard sizes, e.g. the rectangular size you used?
- In general, how should one think about what sizes to start with and when/how to scale up?
- In the same context, can you talk a bit more about the train_image_size and infer_image_size metadata in the "Which image models are best?" Kaggle notebook?
I'm not sure I can join the live session, but it's fine if you cover these questions there. I'll catch up on the recording, and it may be useful to others watching the recording as well.
walkthru 9: a note in the form of questions
Installing timm persistently in Paperspace
What's inside Jeremy's .bash.local at the beginning of walkthru 9?
#!/usr/bin/env bash
export PATH=~/conda/bin:$PATH
How and where to create an alias for pip install -U --user?
vim /storage/.bash.local
alias pi="pip install -U --user"
source /storage/.bash.local
How to print out the content of an alias?
type pi
How to install TIMM on paperspace?
pi timm
How to check whether timm is installed into your home directory's .local?
ls ~/.local/lib/python3.7/site-packages/
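To find the exact per-user site-packages path for your Python version (instead of hard-coding python3.7), the standard library can report it:

```python
import site
import sys

# per-user site-packages directory, e.g. ~/.local/lib/python3.x/site-packages
print(site.getusersitepackages())
print(sys.version_info[:2])  # the Python version that path is tied to
```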
How to create a file in the terminal without any editing?
touch /storage/.gitconfig
How to check the content of a file in the terminal without opening it?
cat /storage/.gitconfig
How to apply /storage/pre-run.sh after some changes?
source /storage/pre-run.sh
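As a minimal, hypothetical sketch of why `source` is used here (the real contents of /storage/pre-run.sh are Jeremy's own): sourcing runs the script in the current shell, so any exported variables stick around afterwards.

```shell
# create a stand-in pre-run.sh (illustrative; not Jeremy's actual script)
cat > pre-run.sh <<'EOF'
export PATH=~/.local/bin:$PATH
EOF

# `source` executes it in the current shell, so the PATH change persists
source ./pre-run.sh
echo "$PATH"
```

Running the script normally (`bash pre-run.sh`) would set PATH only in a child shell and the change would be lost.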
What's wrong with the symlink .gitconfig -> /storage/.gitconfig/?
you accidentally symlinked to a folder instead of a file (note the trailing slash)
How to run ipython with a shortcut?
just type !ipy (bash history expansion), as long as you have run it before
How to run ctags properly for fastai/fastai?
fastai/fastai# ctags -R .
vim tricks: moving back and forth between previous locations (08:39)
ctrl + o: move to previous locations
ctrl + i: move to next locations
vim: jump to a specific place within a line using f or F (10:23)
f": jump to the next " in the line after the cursor
fs: jump to the next s in the line after the cursor
3fn: jump to the third n after the cursor in the line
Fs (shift + f, then s): jump to the first s before the cursor in the line
vim: delete from the cursor to the next or previous symbol or character in a line
dFs: delete from the cursor back to the previous s
df": delete from the cursor to the next "
/" + enter + . : search for the next " and repeat df"
vim: move the cursor between matching (, ), [, ] or {, } pairs and delete everything from the cursor to the closing bracket using % (11:24)
%: toggle between the opening and closing bracket
d%: delete from the cursor to the matching bracket
df(: delete from the cursor to the first (
vim: delete everything within a bracket with i
di(: delete everything within the () in which the cursor is located
di{: delete everything within the {} in which the cursor is located
vim: delete the content within (), enter insert mode to make changes, and apply it all (including the changes) to the next place using ci( and . (12:59)
ci(: delete everything between the () where the cursor is located and enter insert mode to make changes
move to a position within a different () and press . : repeat the action above
Why and how to install a stable and updated dev version of timm
What is a pre-release version?
pi "timm>=0.6.2dev"
How to ensure we have the latest version? "timm>=0.6.2dev"
How to avoid the problem of timm not being found when importing it?
use pip install --user and then symlink it from /storage/.local
Improve the model with more epochs and augmentations
a notebook trained longer with augmentations beat our original model
How many epochs may get your model overfitting? Why, and how to avoid it?
The details are discussed in fastbook
Explore aug_transforms: improve your model with augmentations
Read its docs
Read what is returned
How to display the transformed pictures with show_batch?
dls.train.show_batch(max_n=4, nrows=1, unique=True)
This is how aug_transforms ensures the model does not see the same picture 10 times when training for 10 epochs
Improve your model with a better pre-trained model
Improve your model with a better learning rate
Why is fastai's default learning rate lower/more conservative than we need? so that you are always able to train
What are the downsides of using a lower learning rate?
weights can only move a shorter distance with a lower learning rate
higher learning rates can help jump further to explore more or better weight spaces
How to find a more promising learning rate?
How to read lr_find and pick a more aggressive/appropriate learning rate
How to export and save your trained model?
Explore learn.export
Where does learn.export save model weights by default?
How do we change the directory for saving models?
learn.path = Path('an-absolute-directory-such-as-paddy-folder-in-home-directory')
Explore learn.save
How does learn.save differ from learn.export?
What does learn contain? the model, transformations, and dataloaders
When to use learn.save over learn.export?
when you need to save the model at each epoch and load the updated model to create a new learner
Which data format is used for saving models? pkl
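The pkl format here is Python's standard pickle serialization; a minimal round-trip with the standard library (plain pickle on a toy dict, not fastai's Learner) looks like:

```python
import os
import pickle
import tempfile

# any picklable object can be round-tripped this way; a Learner is
# serialized with the same mechanism when exported
state = {"arch": "convnext_small_in22k", "epochs": 12}

path = os.path.join(tempfile.gettempdir(), "export.pkl")
with open(path, "wb") as f:
    pickle.dump(state, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == state)  # True
```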
Explore min_scale=0.75 of aug_transforms and method="squish" of Resize: improve your model with augmentations
Why does Jeremy choose 12 epochs for training?
Why does Jeremy choose 480 for Resize and 224 for aug_transforms?
details on Resize can be found in fastbook
Why does Jeremy pick the model convnext_small_in22k, with emphasis on in22k?
What does in22k mean?
How is this more useful for our purpose? or why does Jeremy advise always picking in22k models?
Will learn.path for saving models take both a string and a Path in the future?
Test-time augmentation and what problems it solves
What is the problem when we don't use squish? we only take images from the center, not the whole image
Another problem is that we don't apply any of those augmentations to the validation set (but why is this a problem? #question)
Test-time augmentation is particularly useful when no squish is used, and still useful even with squish.
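A toy numpy sketch of that center-crop vs squish difference (my own illustration, not fastai's implementation): a center crop of a rectangular image discards the sides, while a "squish" (here a crude nearest-neighbour resize) keeps content from every column at the cost of distorting the aspect ratio.

```python
import numpy as np

img = np.arange(12).reshape(3, 4)  # a 3x4 "image": wider than tall

def center_crop(x, size):
    # keep only the central size x size window, discarding the edges
    r0 = (x.shape[0] - size) // 2
    c0 = (x.shape[1] - size) // 2
    return x[r0:r0 + size, c0:c0 + size]

def squish(x, size):
    # nearest-neighbour resize to size x size: samples the whole image,
    # so edge content survives, but the aspect ratio is distorted
    rows = np.linspace(0, x.shape[0] - 1, size).round().astype(int)
    cols = np.linspace(0, x.shape[1] - 1, size).round().astype(int)
    return x[np.ix_(rows, cols)]

print(center_crop(img, 3))  # the last column is lost
print(squish(img, 3))       # the first and last columns both survive
```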
What is test-time augmentation? It creates 4 augmented versions of each test image and averages the predictions over them.
How to calculate/replicate the model's latest error rate from learn.get_preds?
Explore test-time augmentation (tta)
Read the docs
How should we learn to read the source code of learn.tta, guided by Jeremy?
Let's run learn.tta on the validation set to see whether the error rate is even better
How many predictions in total does learn.tta make? 5 = 4 augmented images + 1 original image
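The combining step itself can be sketched in plain numpy (an illustration of the idea, not fastai's code): with 4 augmented passes plus the original, the final probabilities are their mean, or the element-wise max when use_max=True.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical per-class probabilities: 5 passes (4 augmented + 1 original)
# over 2 test images and 3 classes
preds = rng.random((5, 2, 3))
preds /= preds.sum(axis=-1, keepdims=True)  # normalize each row to sum to 1

tta_mean = preds.mean(axis=0)  # default behaviour: average the 5 passes
tta_max = preds.max(axis=0)    # the use_max=True variant

final_classes = tta_mean.argmax(axis=-1)  # one predicted class per image
print(final_classes)
```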
How does Jeremy generally use learn.tta? He will not use squish and will use learn.tta(..., use_max=True) (39:21)
How to apply learn.tta on the test set, and work around the absence of with_decoded=True in learn.tta?
How to find the idx of the maximum prob of each prediction with argmax?
How to turn the idxs into a pd.Series and the vocab into a dictionary, and map the idxs with the dictionary to get the result?
How to create a dictionary from a tuple?
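That decoding step can be sketched with a made-up vocab (assumes predictions are already probabilities; pandas and numpy only, no fastai):

```python
import numpy as np
import pandas as pd

# hypothetical probabilities for 3 test images over a 3-class vocab
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1],
                  [0.2, 0.2, 0.6]])
vocab = ["blast", "hispa", "normal"]

idxs = probs.argmax(axis=1)            # index of the max prob per row
mapping = dict(enumerate(vocab))       # {0: 'blast', 1: 'hispa', 2: 'normal'}
labels = pd.Series(idxs).map(mapping)  # idxs -> class names

print(labels.tolist())  # ['hispa', 'blast', 'normal']
```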
Good things to do before submission
Why compare the new results with previous results? to make sure we are not totally breaking things.
Also document the code of the previous version, in case we need it later
Please write a detailed submission comment to specify the changes in this updated model
Check where we are on the leaderboard
How to create the third model without squish but with learn.tta(..., use_max=True)?
How to create the fourth model using rectangular images in augmentation?
How to keep the original image's aspect ratio but shrink the size?
When to use rectangular rather than square images in augmentation?
How to check the augmented images after changing the augmentation settings?
Why and how to adjust p_affine (the probability of affine transforms)?
What does an affine transformation do? zoom in, rotate, etc.
if the augmented images are still in good resolution, then we should not apply affine transforms that often, so reduce p_affine from 0.75 to 0.5
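What an affine transform does can be shown on coordinates with plain numpy (my own sketch, unrelated to fastai's implementation): rotation and zoom are both matrix multiplications applied to pixel coordinates.

```python
import numpy as np

theta = np.pi / 2  # a 90-degree rotation
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
zoom = 1.5 * np.eye(2)  # a uniform 1.5x zoom

pt = np.array([1.0, 0.0])
print(rot @ pt)   # rotates (1, 0) onto (0, 1), up to floating-point error
print(zoom @ pt)  # scales (1, 0) to (1.5, 0)
```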
Save your ensemble/multiple models in /notebooks/ on Paperspace
Why or when to focus on augmentation vs different models?
Please feel free to join walkthru
jupyter: How to merge jupyter cells?
shift + m
I’m trying to run…
on Paperspace, but the resize_images() behaviour is inconsistent between Paperspace and Kaggle.
Running the following code…
try: import fastkaggle
except ModuleNotFoundError:
    !pip install -q --user fastkaggle
from fastkaggle import *
comp = 'paddy-disease-classification'
path = setup_comp(comp, install='"fastcore>=1.4.5" "fastai>=2.7.1" "timm>=0.6.2.dev0"')
from fastai.vision.all import *
trn_path = Path('sml')
resize_images(path/'train_images', dest=trn_path, max_size=256, recurse=True)
!ls $trn_path
…on Kaggle, the folder hierarchy is maintained:
bacterial_leaf_blight blast downy_mildew tungro
bacterial_leaf_streak brown_spot hispa
bacterial_panicle_blight dead_heart normal
…on Paperspace, the folder hierarchy is flattened:
100001.jpg 101736.jpg 103471.jpg 105206.jpg 106941.jpg 108676.jpg
100002.jpg 101737.jpg 103472.jpg 105207.jpg 106942.jpg 108677.jpg
100003.jpg 101738.jpg 103473.jpg 105208.jpg 106943.jpg 108678.jpg
This discrepancy was discovered while investigating the following strange stats on Paperspace…
Can someone confirm this behaviour?
p.s. Hopefully not important, but btw pip command had --user added, which was different to Jeremy’s original.
That means you don’t have the latest fastai on paperspace.
The last container Paperspace pushed was about 2 months ago. Maybe I'm reading it wrong, but it looks like they're cloning fastcore and fastai and doing a pip install in both directories. Not sure if that's the canonical way to install fastai. At least that's what I gleaned looking at their latest published tag for the fastai template env on Docker Hub.
Ah. thx.
At the top of my scripts I check the version with…
fastai.__version__
2.6.3
which was the latest last week, and then I blindly presumed…
setup_comp(comp, install='"fastcore>=1.4.5" "fastai>=2.7.1" "timm>=0.6.2.dev0"')
…was taking care of everything, without actually noticing the 2.7.1 that I’d skimmed over thinking it was another package.
2.7.3 is now installed and it works properly.
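A way to read the installed version of any package without relying on an already-imported (and possibly stale) module is the standard library's importlib.metadata, shown here with a helper of my own:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg):
    """Return the installed version string for pkg, or None if absent."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None

# e.g. installed_version("fastai") would return "2.7.3" after the upgrade,
# even if the old module version is still loaded in the kernel
print(installed_version("pip"))
print(installed_version("definitely-not-a-real-package"))  # None
```

This queries the package metadata on disk, so it reflects what pip just installed rather than what the running kernel imported (requires Python 3.8+).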
But, why are the pre-requisite installs only done for Kaggle?
def setup_comp(competition, install=''):
    "Get a path to data for `competition`, downloading it if needed"
    if iskaggle:
        if install:
            os.system(f'pip install -Uqq {install}')
        return Path('../input')/competition
    else:
        path = Path(competition)
        from kaggle import api
        if not path.exists():
            import zipfile
            api.competition_download_cli(str(competition))
            zipfile.ZipFile(f'{competition}.zip').extractall(str(competition))
        return path
It also might help for users to be advised they need to restart their kernel when a version upgrade occurs. Maybe something roughly like…
def setup_comp(competition, install=''):
    "Get a path to data for `competition`, downloading it if needed"
    if install:
        oldver = !pip show fastai | grep Version | sed 's/Version: //'
        os.system(f'pip install -Uqq {install}')
        newver = !pip show fastai | grep Version | sed 's/Version: //'
        if oldver[0] != newver[0]: print('Please restart kernel to ensure upgraded version loaded.')
    if iskaggle:
        path = Path('../input')/competition
    else:
        path = Path(competition)
        from kaggle import api
        if not path.exists():
            import zipfile
            api.competition_download_cli(str(competition))
            zipfile.ZipFile(f'{competition}.zip').extractall(str(competition))
    return path
Because on Kaggle you always have to install because there’s no persistent environment. Elsewhere you don’t need to update from pip every time you run a notebook.
I would find it useful for the pre-requisite install to apply to non-Kaggle platforms, so I thought I'd have a go at a code contribution. I submitted an issue on GitHub to discuss this:
I’m interested in getting broad feedback on whether this would be useful to others.
Hi, I have been working my way through these live coding sessions but notice that my training on Paperspace is taking much longer than in the video. When I run
vision_learner(dls, 'convnext_small_in22k', metrics=error_rate).to_fp16()
Each epoch takes around 4:40 minutes while it looks like each epoch takes around 35s in the video. According to nvidia-smi dmon GPU utilisation is around 100%. Storing training data in /root. I am running the notebook on a Free-P5000 instance in a paid account. Any suggestions on how I can speed up training would be much appreciated. Thanks!
Hi @Tomasaki, you don't mention what GPU Jeremy is using during the sessions (I'm not sure if that is ever visible), so I'd imagine Jeremy used a higher-performing paid instance, so as not to waste the time of the people participating in the session.
Thanks @bencoman for pointing that out to me. I was thinking of session 7 where he is running this instance type on Paperspace.