Attempt to use the new fastai 1.0 for dogscats lesson


(nok) #1

I tried to replicate the old lesson1 dogscats with the new fastai 1.0 as a warm up exercise.

I got weird NaN losses and cannot train my model properly. I did try training on the MNIST dataset and it works perfectly fine, so I am not sure what’s wrong here.

If I use the sample dogscats data, I even get an error like this, complaining about the OneCycleScheduler:
ZeroDivisionError: division by zero
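
One possible cause of this error, not confirmed anywhere in this thread, is that the sample dataset is smaller than the batch size, so the training loader yields zero batches and the schedule ends up dividing by the number of iterations. The sketch below is plain Python and does not use fastai; `num_batches` and `linear_anneal` are hypothetical helpers, and the dataset size and batch size are made-up numbers for illustration.

```python
def num_batches(n_items, batch_size, drop_last=True):
    """Number of batches a loader yields for n_items samples."""
    full, remainder = divmod(n_items, batch_size)
    if drop_last or remainder == 0:
        return full
    return full + 1


def linear_anneal(start, end, n_iter):
    """Values stepping linearly from start to end over n_iter iterations."""
    # With n_iter == 0 the fraction i / n_iter is 0 / 0 and Python raises.
    return [start + (end - start) * i / n_iter for i in range(n_iter + 1)]


# Hypothetical numbers: 16 training images, batch size 64, one epoch.
n_iter = num_batches(16, 64) * 1  # batches per epoch * epochs
try:
    linear_anneal(1e-4, 1e-3, n_iter)
    outcome = "ok"
except ZeroDivisionError:
    outcome = "ZeroDivisionError"
print(n_iter, outcome)
```

If this is what is happening, a smaller batch size or keeping the last partial batch would give the schedule at least one iteration to work with.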

I’d appreciate any help. Thanks in advance.


(Gerardo Garcia) #3

Have you tried the one under examples/dogs_cats.ipynb?


(nok) #4

Not yet, I missed that directory. I will have a look tonight, thank you.


(nok) #5

I see the example is updated, but I still get NaN even just running the example…


Fastai examples dogscats NaN Loss
(James Requa) #6

@nok Are you running from the github repo or the pip/conda install? I would personally suggest using the fastai package from conda or pip as that will be the most stable version since the repo is under active development and could contain bugs. (install instructions here: https://github.com/fastai/fastai#conda-install)

See screenshot below where I ran the notebook with fastai v1 and it seems to be working great…note that I did use the notebook version prior to some recent changes made in the past 24 hours :slight_smile:


(nok) #7

@jamesrequa I run from the GitHub repo. The thing is, the examples notebook was changed as well.
For example, the data is now
data = ImageDataBunch.from_folder(path, ds_tfms=get_transforms(), tfms=imagenet_norm, size=224)

So you are suggesting using the pip fastai package and removing the symlink?


(James Requa) #8

Yeah if you are using the pip or conda install, you don’t need a symlink, the import statements will take care of it. Just create a new notebook separate from the repo so it won’t pull from that.

Yeah those were the changes I was referring to. My guess is everything will be working great by the time class starts :slight_smile: But in the meantime, I just used image_data_from_folder since ImageDataBunch wasn’t included in the install package yet (as of fastai v1.0.5 when I last checked).


(nok) #9

Thanks @jamesrequa

I tried git checkout of tag 1.0.5 and removed the symlink, and I made sure fastai.version is 1.0.5 as well, but I still get NaN. If you read the thread below, it seems that it was caused by one of the transformations, but I cannot dig deeper to debug it effectively for now as I am not familiar with the source yet…


This is another thread discussing this issue.
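
While narrowing down which transformation is responsible, it can help to record the loss per batch and find the first step at which it stops being finite. This is only a minimal sketch in plain Python: `first_non_finite` is a hypothetical helper, not part of fastai, and the loss values are invented for illustration.

```python
import math


def first_non_finite(losses):
    """Return the index of the first NaN or infinite loss, or None."""
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
    return None


# Invented per-batch losses from a run that diverges at the third batch.
losses = [0.69, 0.52, float("nan"), float("nan")]
bad_step = first_non_finite(losses)
print(bad_step)
```

Repeating the run with one transformation disabled at a time and comparing where (or whether) the loss goes non-finite is one way to isolate the culprit.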


(James Requa) #10

@nok ah ok, in that case sorry I wasn’t aware of that issue. Are you using GCP? Or what platform are you using? Can you provide all of your hardware & OS specs?


(Jeremy Howard (Admin)) #11

@nok try now - update from master first. Hopefully it’s fixed.


(nok) #12

@jamesrequa @jeremy It’s fixed now. Thx!


(Cedric Chee) #13

Heads-up. The upcoming fastai version 1.0.7 (currently Work In Progress) will break this API:

All vision models are now in the models module, including torchvision models (where tested and supported). So use models instead of tvm now.

Example:

  • v1.0.6: ConvLearner(data, tvm.resnet34, metrics=accuracy)
  • v1.0.7: ConvLearner(data, models.resnet34, metrics=accuracy)
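
For notebooks that have to run on both sides of this change, the namespace can be chosen from the installed version string. The sketch below is plain Python and only encodes the rule stated above; `parse_version` and `vision_models_namespace` are hypothetical helpers, and it assumes purely numeric version strings such as "1.0.6".

```python
def parse_version(version):
    """Turn a numeric version string like '1.0.7' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def vision_models_namespace(version):
    """Name of the module holding vision models for a given fastai version."""
    # Per the note above: tvm up to v1.0.6, models from v1.0.7 on.
    return "models" if parse_version(version) >= (1, 0, 7) else "tvm"


for v in ("1.0.5", "1.0.6", "1.0.7", "1.0.10"):
    print(v, vision_models_namespace(v))
```

Comparing tuples of integers rather than raw strings matters here, since "1.0.10" sorts before "1.0.7" as a string.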

See the full changelog.


Update 1:

  • conda and pip packages for fastai version 1.0.7 are released.

(jaideep v) #14

Hi Jeremy, I am getting the NaN loss problem.
I am using a Siamese network (dense121) for the humpback whale competition, on version 1.39.
FYI, I use GCP.
What could be the reason for the NaN loss?

In one forum I read that max light has that issue, so I zeroed it and am now checking whether I still get the issue.