If I run the notebook from GitHub, cats v dogs trains fine.
If I take the same code, put it into a Python script and run it, my training results in NaNs. I’ve verified at inference that this is not a reporting anomaly but truly a failure to train: the notebook model.ckpt is highly accurate, and my command-line trained version is highly inaccurate.
Note, I’m not installing fastai, fastcore, fastprogress, or fastdownload, so that I can more easily search and modify them in the same location. Furthermore, when I make these same modifications to the notebook, it trains fine. I’m using the same conda environment in both places (fastai2022).
I’ve tried training in PowerShell and the normal command prompt; both result in NaN training loss and validation loss. Also, the progress bar does not update until nearly the end of the epoch, then it rapidly fills in.
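For anyone wanting to catch this in the script itself rather than at inference time, a check along these lines can be run over the per-epoch losses. This is only a minimal sketch: first_nan_index is a hypothetical helper, and the loss values are made up for illustration, not taken from the runs described above.

```python
import math

def first_nan_index(losses):
    """Return the index of the first NaN loss, or None if every loss is a number."""
    for i, loss in enumerate(losses):
        if math.isnan(loss):
            return i
    return None

# Made-up per-epoch losses, for illustration only.
notebook_losses = [0.69, 0.41, 0.22]
script_losses = [0.69, float("nan"), float("nan")]

print(first_nan_index(notebook_losses))  # None
print(first_nan_index(script_losses))    # 1
```

Inside a fastai Learner, the TerminateOnNaNCallback from fastai.callback.tracker is probably the simpler option, since it stops training as soon as the loss goes NaN.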
Maybe you figured this out, maybe not, but I’m posting this in case others run into the same issue.
I’m working on Windows 11 with CUDA and the fastai library, on the chapter 6 multicat example.
I’m using conda and Visual Studio Code to run the examples.
I too kept getting NaN for train_loss and valid_loss.
On a whim I added the num_workers=0 parameter to my dataloaders call, like this:
dls = dblock.dataloaders(df, num_workers=0)
I think this keeps the data pipeline in the main process instead of starting separate worker processes. I have had to use this in almost all of the examples because of some type of bug with using CUDA on Windows.
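If the same script also has to run on Linux, where worker processes are usually fine, the setting can be chosen per platform. The following is just a sketch of that idea: pick_num_workers and its default of 4 are hypothetical, not part of fastai, and dblock and df are the objects from the call above.

```python
import platform

def pick_num_workers(system=None, default=4):
    """Return 0 on Windows (load data in the main process), else `default`."""
    # `system` can be passed explicitly for testing; otherwise detect the OS.
    system = system or platform.system()
    return 0 if system == "Windows" else default

# Hypothetical usage with the call from the post above:
# dls = dblock.dataloaders(df, num_workers=pick_num_workers())
print(pick_num_workers())
```

Loading in the main process is slower, so this only gives up the extra workers where they seem to cause trouble.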
Windows isn’t worth pursuing if you can avoid it, IMHO.
(Nearly everyone/everything uses Linux in one flavour or another for DL.)
Luckily it is much easier to use Linux on Windows machines these days via WSL2 (Windows Subsystem for Linux).
Linux skills are good to acquire for working across the many cloud providers and/or for ML industry work prospects etc. In the live coding sessions Jeremy walks through many tips on using Linux, installing WSL, etc.
It is a steeper learning curve to learn a different OS and DL at the same time though.
Or, as the course suggests, use Colab and Kaggle whilst learning, to avoid wasting cycles on system setup & config issues.