Live coding 10

RogerS49 · June 15, 2022, 6:37am

Not sure how you got into this state, but when running programs in the background from a terminal you can do this:

program_to_run > s.txt 2>&1 &

This sends the standard output (1 = stdout) to a text file, points the error output (2 = stderr) at the same place, and runs the program in the background. The order matters: the redirect to s.txt has to come before 2>&1, and the & goes last. The file can be viewed afterward in the usual ways.
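For anyone who would rather do the same thing from Python, here is a minimal sketch using the standard subprocess module. The child command is only a stand-in for program_to_run, and the temporary log location is an assumption made for the example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for `program_to_run`: a child process that writes to both streams.
child_cmd = [
    sys.executable, "-c",
    "import sys; print('to stdout'); print('to stderr', file=sys.stderr)",
]

# Hypothetical log location; the post above uses s.txt in the current directory.
log_path = Path(tempfile.mkdtemp()) / "s.txt"

with open(log_path, "w") as log:
    # stderr=subprocess.STDOUT plays the role of `2>&1`.
    # Popen returns immediately, much like a trailing `&` in the shell.
    proc = subprocess.Popen(child_cmd, stdout=log, stderr=subprocess.STDOUT)

# Waiting is only needed here so the log is complete before it is read back.
proc.wait()
captured = log_path.read_text()
print(captured)
```

Both lines should end up in the one file, which is the behaviour the shell redirection is meant to give.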

jeremy · June 15, 2022, 9:46am

I don’t think that fixes Radek’s issue, since his messages are being generated by ssh trying to create a tunnel AFAICT.

radek · June 15, 2022, 11:09am

Yes, yes, yes, I think that is what is happening. I wonder if others also suffer from this? I bet they do!

Thank you for your answer nonetheless, @RogerS49. That stderr redirection is something I know exists, but I never got the finer points of how it works! Appreciate you sharing your thoughts!

RogerS49 · June 15, 2022, 2:30pm

In the first 6 minutes of this walkthru video, a question was posed regarding conditional probabilities.
I would like to suggest to Daniel a Python probabilistic programming language (PPL) package named pyro-ppl.
This package originated at Uber AI, to determine routes etc. in setting up their business.
It has since been open sourced, taken on as a Linux Foundation project, and is currently maintained.
The package extends and builds on Python and torch.distributions and is also influenced by the Edward Python package. With this package you can build other distributions based on your priors and likelihoods.
It works very similarly to TensorFlow Probability but is possibly easier to work with.
Hope this information is of some use.

Pyro Documentation

marii · June 17, 2022, 10:55pm

If we use batchnorm, I don’t think gradient accumulation will be mathematically identical, though it still works fairly well, so it’s not too much of a problem. I ran into this previously when testing that fp16 + grad accum was working correctly. It is mathematically equivalent with layernorm/instance norm.
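A small pure-Python sketch of why this happens, under simplifying assumptions: no learnable affine parameters, training-mode statistics only, and made-up numbers. It compares only the normalized activations, not the gradients, but the gradients depend on those activations, so a mismatch here carries through.

```python
from statistics import fmean, pvariance

def _normalize(values, eps=1e-5):
    # Subtract the mean and divide by the standard deviation of `values`.
    mean, var = fmean(values), pvariance(values)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]

def batch_norm(batch):
    """Normalize each feature column using statistics of the samples in `batch`."""
    columns = [_normalize(col) for col in zip(*batch)]
    return [list(row) for row in zip(*columns)]

def layer_norm(batch):
    """Normalize each sample on its own, independent of the rest of the batch."""
    return [_normalize(sample) for sample in batch]

# Made-up data: 4 samples with 3 features each.
data = [[1.0, 2.0, 3.0], [4.0, 8.0, 15.0], [0.0, 0.0, 1.0], [7.0, 7.0, 9.0]]
# Gradient accumulation would see the same data as two micro-batches of 2.
halves = [data[:2], data[2:]]

bn_full = batch_norm(data)
bn_micro = [row for half in halves for row in batch_norm(half)]
ln_full = layer_norm(data)
ln_micro = [row for half in halves for row in layer_norm(half)]

bn_matches = bn_full == bn_micro  # batch statistics change with the micro-batch
ln_matches = ln_full == ln_micro  # per-sample statistics do not
print(bn_matches, ln_matches)
```

Batchnorm's statistics are computed over whichever samples share a micro-batch, so splitting the batch changes the result, while layernorm only ever looks at one sample at a time.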

jeremy · June 18, 2022, 12:20am

Ah yes, true.

Mark_F · June 18, 2022, 3:21am

Can someone explain test-time augmentation to me (Learner.tta())?

Is the purpose to effectively increase the size of the validation set by using augmentation, something vaguely akin to k-fold cross validation (but keeping test and validation data separate)?

It outputs a 2-tuple of ([list of tuples of probs], [list of classes]). When it calculates the weighted average, is it of the probabilities, then using the highest mean prob as the prediction? And is the list of classes the labels (as opposed to the predicted class)?

I’m trying to figure out how it feeds into error_rate().

jeremy · June 18, 2022, 3:42am

Start by reading about it in the book, then do some experiments in a notebook, and tell us what you find out – if you have any questions along the way, let us know!

(Yes, I could just tell you directly, but you’ll learn way more if you experiment yourself… )

brismith · June 18, 2022, 6:52pm

This walkthrough was so useful. To make sure I was understanding it I re-created it, but due to issues with kernels dying in my WSL I ended up running this on my Apple M1 (CPU). So I used tiny and just 3 epochs, and 3 runs across image sizes 32, 64 and 128 - then averaged them (weighting the larger images) and ended up above 94% - which surprised me. That would have been about 120 on the leaderboard :). My best is at 50 right now - time to try Paperspace again (then I might take a look at the Metal options with PyTorch if I get brave).
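A minimal sketch of that kind of weighted averaging, in plain Python. The probabilities and the 1/2/3 weights are made up for illustration and are not the values used in the run described above.

```python
def weighted_ensemble(prob_sets, weights):
    """Weighted mean of per-run class probabilities.

    prob_sets[r][i][c] is the probability from run r for image i and class c.
    """
    total = sum(weights)
    n_items, n_classes = len(prob_sets[0]), len(prob_sets[0][0])
    return [
        [sum(w * run[i][c] for run, w in zip(prob_sets, weights)) / total
         for c in range(n_classes)]
        for i in range(n_items)
    ]

# Made-up predictions for 2 images and 2 classes, keyed by training image size.
runs = {
    32:  [[0.6, 0.4], [0.5, 0.5]],
    64:  [[0.4, 0.6], [0.3, 0.7]],
    128: [[0.2, 0.8], [0.8, 0.2]],
}
weights = [1, 2, 3]  # hypothetical: larger image sizes count for more

avg = weighted_ensemble(list(runs.values()), weights)
# The predicted class is the index of the largest averaged probability.
preds = [max(range(len(p)), key=p.__getitem__) for p in avg]
print(avg, preds)
```

Dividing by the sum of the weights keeps each averaged row a valid probability distribution.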

zymoide1 · June 19, 2022, 5:01pm

Perhaps I’m mistaken, but in walkthru 13, Jeremy mentions that we can indicate the number of independent inputs via the ImageBlock function. I will try this out, but I assume we could add the variety as input and change the n_inp to 2. Thanks for running the code without explicitly stating n_inp = 1 so we could gain a deeper understanding of the DataBlock function.

brismith · June 20, 2022, 9:24pm

In case others run into this - I was getting an error on Paperspace - not a CUDA memory issue but “Could not do one pass in your dataloader, there is something wrong in it. Please see the stack trace below” and the bottom of the trace was “cusolver error: CUSOLVER_STATUS_EXECUTION_FAILED, when calling `cusolverDnSgetrf( handle, m, n, dA, ldda, static_cast<float*>(dataPtr.get()), ipiv, info)`”. I found that reducing the image size in the resize property (not the size) avoided this. Despite dropping down from 480 to 240 I still got a good result on a large swinv2.

nikem · June 22, 2022, 9:51am

Finally solved the problem. It turns out it is not about the walk-thru code/data or anything else. It is all about a PyTorch bug that appears on certain Nvidia drivers.

gsg · June 22, 2022, 8:44pm

Had the same problem, intermittently… Reducing from 480 to 360 also addressed it.

brismith · June 22, 2022, 8:52pm

Although I just found that increasing could fix it too. I think just different values hit the PyTorch bug - so you have to see what works.

Daniel · June 23, 2022, 12:29pm

Walkthru 10: detailed notes in the form of questions

The best vision models for fine-tuning notebook

00:00 - Questions on tabular data, and fastbook has the answers

Why paddy dataset is interesting

07:06

The paddy dataset is similar to ImageNet in terms of shape and size, but ImageNet has no paddy labels

What kind of dataset can do well on fine-tuning a pre-trained model?

08:37

Is the dataset (e.g., the PETS dataset) very similar to the pre-trained model’s dataset (e.g., ImageNet)?
The more similar it is, the better the dataset can fine-tune the model by making use of much of the pretrained weights

How large is the dataset, especially when the dataset (e.g., the Planet dataset) is not similar to ImageNet?
When the datasets are very different, most of the weights from the pretrained model will be useless, so the larger the dataset, the more weights can be trained and the better the model can learn

Experiment to find out the best model for fine-tuning using a similar and large dataset vs a dissimilar and small dataset

10:44

If we can find the best model from the PETS dataset and the Planet dataset, then it may be applied to other similar scenarios

Jeremy walks us through how he and Thomas Capelle designed their experiments

11:55

Explore the fine_tune.py from fastai_timm repo

Explore the sweep_planets_lr.yaml from the repo

The Weights & Biases API lets us see our experiment results inside a Jupyter notebook

What does Jeremy use gist for?

14:10

How Jeremy uses the WandB API to work with their experiment results inside a Jupyter notebook

15:00

How to turn a dataframe into a string

17:04

StringIO is the key to making DataFrame.to_csv save the dataframe into a string rather than a file

How does Jeremy create a gist?
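A minimal sketch of the in-memory buffer idea, written with the standard-library csv module so it runs without pandas. The rows are made up. With pandas, passing the same kind of StringIO buffer to DataFrame.to_csv should work the same way.

```python
import csv
from io import StringIO

# Made-up rows standing in for the experiment results dataframe.
rows = [
    {"model_name": "model_a", "error_rate": 0.05, "fit_time": 30.0},
    {"model_name": "model_b", "error_rate": 0.04, "fit_time": 55.0},
]

# StringIO behaves like an open text file, but everything stays in memory.
buf = StringIO()
writer = csv.DictWriter(buf, fieldnames=["model_name", "error_rate", "fit_time"])
writer.writeheader()
writer.writerows(rows)

# getvalue() returns everything written so far as one string.
content_as_string = buf.getvalue()
print(content_as_string)
```

The resulting string is what then gets passed on as the gist content.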

17:50

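import ghapi.core as gh
g = gh.GhApi()  # uses the GITHUB_TOKEN environment variable for authentication if it is set
# content_as_string is the CSV text produced with StringIO; filename is left blank in these notes
gist = g.create_gist('description of the gist', content_as_string, filename='', public=True)
gist.html_url  # web address of the newly created gist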

What does Jeremy use gist for here and generally?

How to score models with data from the gist URL

19:45

How to calculate the score for all models based on their error_rate, fit_time, and GPU_mem?

How does Jeremy come up with the score design?

How to sort all the models based on their score and display the top 15 models?

#question When do fit_time and GPU_mem matter more, and by how much?
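To make the scoring questions above concrete, here is a minimal sketch in plain Python. The formula, its weights, and the rows are all hypothetical and chosen only for illustration. This is not the score design from the walkthru, which these notes do not record.

```python
def score(row, time_weight=0.01, mem_weight=0.01):
    """Hypothetical score, lower is better.

    The error rate is scaled up by a penalty that grows with training time
    and GPU memory use; the weights decide how much each one matters.
    """
    penalty = 1 + time_weight * row["fit_time"] + mem_weight * row["GPU_mem"]
    return row["error_rate"] * penalty

# Made-up results, one dict per model.
results = [
    {"model_name": "model_a", "error_rate": 0.05, "fit_time": 30.0, "GPU_mem": 4.0},
    {"model_name": "model_b", "error_rate": 0.04, "fit_time": 90.0, "GPU_mem": 10.0},
    {"model_name": "model_c", "error_rate": 0.06, "fit_time": 20.0, "GPU_mem": 2.0},
]

# Sort ascending by score and keep at most the 15 best models.
top = sorted(results, key=score)[:15]
for row in top:
    print(row["model_name"], round(score(row), 4))
```

Raising time_weight or mem_weight is one way to explore the last question: the more those costs matter for a given setup, the further slow or memory-hungry models drop in the ranking, even when their error rate is the lowest.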