Walkthru 10 detailed note in the form of questions
The best vision models for fine-tuning notebook
00:00 - Questions on tabular data and the fastbook has the answer
Why the paddy dataset is interesting
07:06
The paddy dataset is similar to ImageNet in image shape and size, but ImageNet has no paddy labels
What kind of dataset can do well on fine-tuning a pre-trained model?
08:37
Is the dataset (e.g., PETS dataset) very similar to the pre-trained model’s dataset (e.g., ImageNet)?
The more similar the datasets, the better fine-tuning works, because it can reuse most of the pretrained weights
How large is the dataset, especially when it is not similar to ImageNet (e.g., the Planet dataset)?
When the datasets are very different, most of the pretrained weights are of little use, so the larger the dataset, the more weights can be retrained and the better the model can learn
Experiment to find the best model for fine-tuning, using a similar, large dataset vs a dissimilar, small dataset
10:44
If we can find the best model for the PETS dataset and for the Planet dataset, then it may apply to other similar scenarios
Jeremy walks us through how he and Thomas Capelle designed their experiments
11:55
Explore fine_tune.py from the fastai_timm repo
Explore sweep_planets_lr.yaml from the repo
The Weights & Biases API lets us see our experiment results inside a Jupyter notebook
What does Jeremy use gist for?
14:10
How Jeremy uses the WandB API to work with the experiment results inside a Jupyter notebook
15:00
How to turn a dataframe into a string
17:04
StringIO is the key: passing a StringIO buffer to DataFrame.to_csv saves the dataframe into a string rather than a file
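The StringIO trick can be sketched as follows. This is a minimal illustration with a made-up two-row dataframe; the column names and values are assumptions, not the sweep results from the walkthrough.

```python
from io import StringIO

import pandas as pd

# Made-up rows standing in for the experiment results.
df = pd.DataFrame({"model": ["a", "b"], "error_rate": [0.05, 0.04]})

buf = StringIO()              # in-memory text buffer that behaves like an open file
df.to_csv(buf, index=False)   # to_csv writes into the buffer, not to disk
csv_text = buf.getvalue()     # the whole CSV as one Python string

# Reading the string back recovers the same table.
round_trip = pd.read_csv(StringIO(csv_text))
```

Calling to_csv with no path argument also returns the CSV as a string, so the buffer is mostly useful when some other function expects a file-like object.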
How does Jeremy create a gist?
17:50
import ghapi.core as gh
g = gh.GhApi()  # the class is spelled GhApi; the token is read from the GITHUB_TOKEN environment variable
gist = g.create_gist('description of the gist', content_as_string, filename='', public=True)
gist.html_url
What does Jeremy use gist for here and generally?
How to score models with data from the gist URL
19:45
How to calculate the score for all models based on their error_rate, fit_time, and GPU_mem?
How does Jeremy come up with the score design?
How to sort all the models based on their score and display the top 15 models?
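Scoring and sorting can be sketched roughly like this. The formula, the FIT_TIME_OFFSET constant, and all the rows are assumptions made for illustration; this is one plausible form of such a score, not necessarily the one used in the notebook. GPU_mem is carried along for inspection but is not folded into this particular score.

```python
import pandas as pd

# Made-up results; the real ones come from the sweep.
df = pd.DataFrame({
    "model_name": ["small_fast", "mid", "large_slow"],
    "error_rate": [0.060, 0.045, 0.040],
    "fit_time":   [20.0, 60.0, 200.0],
    "GPU_mem":    [2.0, 4.0, 11.0],
})

FIT_TIME_OFFSET = 80  # hypothetical constant that damps the influence of fit_time

# Lower is better: an accurate model is penalised if it is slow to train.
df["score"] = df["error_rate"] * (df["fit_time"] + FIT_TIME_OFFSET)

# Sort ascending by score and keep at most the 15 best rows.
top = df.sort_values("score").head(15)
```

A larger offset makes the ranking follow error_rate more closely, while a smaller one rewards fast models more strongly.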
#question How much do fit_time and GPU_mem matter, and when?
How to compare models (on error_rate and fit_time) by families
23:02
How to find the models with the best error_rate among those with better-than-average GPU memory use and fit_time
24:13
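A filter of this kind might look like the sketch below. The column names follow the ones mentioned in these notes, and the four rows are invented for the example.

```python
import pandas as pd

# Invented rows, only to show the filtering logic.
df = pd.DataFrame({
    "model_name": ["a", "b", "c", "d"],
    "error_rate": [0.05, 0.03, 0.04, 0.02],
    "fit_time":   [10.0, 20.0, 30.0, 100.0],
    "GPU_mem":    [2.0, 3.0, 9.0, 12.0],
})

# Keep only models that are below average on both memory use and training time.
cheap = df[(df["GPU_mem"] < df["GPU_mem"].mean()) &
           (df["fit_time"] < df["fit_time"].mean())]

# Among those, the most accurate models come first.
best = cheap.sort_values("error_rate")
```

Here model d has the lowest error rate overall but is dropped because it is both slow and memory hungry.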
What is gpu mem and when does it matter?
Which model family is very good at fine-tuning on the Planet dataset?
25:44
Why don't the best model families improve in accuracy as model size gets larger?
27:08
Because small datasets won’t help large models to learn much.
Which models/model families are best to fine-tune on a non-ImageNet-like dataset such as the Planet dataset?
27:37
What is the fastai way of doing parameter sweeps vs the Google way, for finding general insights or rules?
28:39
Can we apply the findings (models, model families, good parameters) to all computer vision classification tasks? Yes
How many GPUs and for how long does Jeremy run the experiment? 3 GPUs for 12 hours
Why don't we need to try every possibility at every level?
How did Jeremy pick the range of values of parameters for experiments?
How to find pre-trained models for other datasets?
32:35
Google for it
model zoos
Papers with Code
Hugging Face
Why Jeremy prefers not to publish in academic journals
34:34
Jeremy wants to share knowledge more freely and openly whereas academic journals generally make it difficult.
How Jeremy tries out small models based on the sweep experiment findings
37:26
How to set up the code to efficiently build and compare different models?
39:43
convnext_small_in22k
How do the first two models in the comparison differ? With or without squish, but both use square images for augmentation
Resize(480, method='squish'), batch=aug_transforms(size=224, min_scale=0.75)
What does Resize((480, 640)) do? It only changes the 3-4 images that have the opposite aspect ratio and leaves the rest of the images as they are
40:39
Why did Jeremy try a rectangular size of (224, 288) for model 3 and (240, 320) for model 4 when augmenting images? And why is model 3 expected to perform better than model 4?
41:46
How to find out whether the original image aspect ratio is (480, 640) or (640, 480)?
44:07
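One way to check which orientation dominates is to tally the image sizes. The sketch below assumes the sizes have already been collected as (width, height) tuples, which is what PIL's Image.open(path).size returns; the list of sizes here is made up, and count_sizes is a hypothetical helper name.

```python
from collections import Counter

def count_sizes(sizes):
    """Tally how often each (width, height) pair occurs."""
    return Counter(sizes)

# Hypothetical sizes standing in for the real dataset.
sizes = [(480, 640)] * 6 + [(640, 480)] * 2

counts = count_sizes(sizes)

# The most frequent size is the orientation to design the resize around.
most_common_size, n = counts.most_common(1)[0]
```

If nearly all images share one size and only a handful have the opposite orientation, a fixed rectangular Resize is a reasonable choice.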
vit_small_patch16_224 model
Why aren't rectangular approaches possible for this vit_small model?
How do the 5th and 6th models differ? With or without squish, and as Jeremy said, the squish version generally works better
Resize(480, method='squish'), batch=aug_transforms(size=224, min_scale=0.75)
#question Why still use Resize(480) rather than Resize(640)?
How is the 7th model built based on Resize(640, method=ResizeMethod.Pad, pad_mode=PadMode.Zeros)?
What is the logic behind it?
How to build models on swinv2_base_window12_192_22k?
44:57
The first time the error rate drops below 2%
All models must use augmentation image size 192 and Resize(480)
Build two models, with or without squish, using Resize(480, method='squish'), batch=aug_transforms(size=192, min_scale=0.75)
Build a third model on Resize(640, method=ResizeMethod.Pad, pad_mode=PadMode.Zeros)
Jeremy found it very interesting that this large and slow swin model family works better than the previous small and fast model families, even on smaller resized images.
How to build models on swin_small_patch4_window7_224?
45:50
The first two are with or without squish: Resize(480, method='squish'), batch=aug_transforms(size=224, min_scale=0.75)
The third one is on Resize(640, method=ResizeMethod.Pad, pad_mode=PadMode.Zeros)
Build models on more accurate but slow large pre-trained models
46:06
convnext_large_in22k
Why use a different seed number or a different set of batches for the experiments in this group of models?
How to avoid running out of GPU memory when running large models
47:03
How does gradient accumulation prevent out-of-memory problems?
How does batch size work behind the scenes? Why is it necessary?
49:54
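The idea behind gradient accumulation can be illustrated with a toy one-parameter model. This is a sketch in plain Python, not fastai code; the data, the starting weight, and the helper grad_mse are all made up for the example. It shows that summing scaled gradients from several small batches gives the same gradient as one large batch, while only one small batch has to be held in memory at a time.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)**2) with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

# Toy data for the model y = w * x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
w = 0.5

# Gradient computed on the full batch of 8 items at once.
full_grad = grad_mse(w, xs, ys)

# Same gradient, built from 4 micro-batches of 2 items each.
accum = 4
micro = len(xs) // accum
acc_grad = 0.0
for i in range(0, len(xs), micro):
    # Scale each micro-batch gradient so the sum matches the full-batch mean.
    acc_grad += grad_mse(w, xs[i:i + micro], ys[i:i + micro]) / accum

# Only after all micro-batches are processed would the optimizer step run.
```

The equality holds exactly here because the micro-batches are equal in size. Layers that depend on batch statistics, such as batch normalization, can make the two approaches differ slightly in a real network.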
Why we should average probabilities rather than apply a majority vote
52:50
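The difference between the two ensembling rules can be seen in a small made-up case. One common argument, which may or may not be the one made in the walkthrough, is that voting throws away how confident each model is. The probabilities and helper names below are assumptions chosen to make the contrast visible.

```python
def average_probs(all_probs):
    """Average per-class probabilities across models for one image."""
    n_models = len(all_probs)
    n_classes = len(all_probs[0])
    return [sum(p[c] for p in all_probs) / n_models for c in range(n_classes)]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

def majority_vote(all_probs):
    """Each model votes for its top class; the most frequent vote wins."""
    votes = [argmax(p) for p in all_probs]
    return max(set(votes), key=votes.count)

# Hypothetical: two models lean slightly to class 0, one is very sure of class 1.
probs = [[0.51, 0.49], [0.52, 0.48], [0.05, 0.95]]

avg = average_probs(probs)
avg_pred = argmax(avg)             # uses the confidence of every model
vote_pred = majority_vote(probs)   # ignores how confident each model was
```

In this example the vote picks class 0 on the strength of two near coin-flip predictions, while the averaged probabilities favour class 1.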
How does Jeremy set up an ensemble using the models above?
53:15
How much time did Jeremy spend on all this work?
53:56