Attempt to use the new fastai 1.0 for dogscats lesson

nok · October 14, 2018, 6:24pm

I tried to replicate the old lesson1 dogscats with the new fastai 1.0 as a warm up exercise.

I get weird NaN values and cannot train my model properly. I did try training on the MNIST dataset and it works perfectly fine, so I am not sure what’s wrong here.

If I use the sample dogscats data, I even get an error like this, complaining about the OneCycleScheduler:
ZeroDivisionError: division by zero

Appreciate help in advance.

https://gist.github.com/noklam/9afca6260b362f0bb6d8ce6427fc8903

lesson1_v3.ipynb

{
  "cells": [
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "# Put these at the top of every notebook, to get automatic reloading and inline plotting\n%reload_ext autoreload\n%autoreload 2\n%matplotlib inline\nfrom IPython.core.interactiveshell import InteractiveShell\n# pretty print all cell's output and not just the last one\n# InteractiveShell.ast_node_interactivity = \"all\"\n",
      "execution_count": 1,
      "outputs": []


gerardo · October 15, 2018, 12:56am

Have you tried the one under examples/dogs_cats.ipynb?

nok · October 15, 2018, 1:34am

Not yet, I missed that directory, will have a look tonight, thank you.

nok · October 15, 2018, 3:26pm

I see the example is updated, but I still get NaN even just running the example…

jamesrequa · October 16, 2018, 4:18am

@nok Are you running from the github repo or the pip/conda install? I would personally suggest using the fastai package from conda or pip as that will be the most stable version since the repo is under active development and could contain bugs. (install instructions here: https://github.com/fastai/fastai#conda-install)

See screenshot below, where I ran the notebook with fastai v1 and it seems to be working great. Note that I used the notebook version from before some changes made in the past 24 hours.

nok · October 16, 2018, 5:22am

@jamesrequa I run from the GitHub repo. The thing is, the examples notebook was changed as well.
For example, the data is now:
data = ImageDataBunch.from_folder(path, ds_tfms=get_transforms(), tfms=imagenet_norm, size=224)

So you are suggesting using the pip fastai package and removing the symlink?

jamesrequa · October 16, 2018, 4:32pm

Yeah if you are using the pip or conda install, you don’t need a symlink, the import statements will take care of it. Just create a new notebook separate from the repo so it won’t pull from that.

Yeah, those were the changes I was referring to. My guess is everything will be working great by the time class starts. But in the meantime, I just used image_data_from_folder, since ImageDataBunch wasn’t included in the install package yet (as of fastai v1.0.5 when I last checked).
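Since the question here is whether the notebook picks up the repo checkout (via the symlink) or the pip/conda package, a quick way to see which one is actually imported is to print the module's version and file path. This is only a minimal sketch: `describe_module` is a hypothetical helper, not part of fastai, and it assumes the module exposes `__version__` (it returns None otherwise). It is demonstrated on the standard library `json` module so it runs anywhere.

```python
import importlib


def describe_module(name):
    """Return (version, file path) for an importable module.

    Hypothetical helper for illustration. The version is None when the
    module does not define __version__; the path is None for modules
    that have no __file__ (for example, some built-in modules).
    """
    module = importlib.import_module(name)
    version = getattr(module, "__version__", None)
    location = getattr(module, "__file__", None)
    return version, location


if __name__ == "__main__":
    # Swap "json" for "fastai" in an environment where fastai is installed.
    version, location = describe_module("json")
    print(version, location)
```

If the printed path points inside the cloned repo rather than site-packages, the notebook is still importing the development version.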

nok · October 16, 2018, 4:57pm

Thanks @jamesrequa

I tried checking out tag 1.0.5 with git and removed the symlink, and I made sure the fastai.version is 1.0.5 as well, but I still get NaN. If you read the thread below, it seems that it was caused by one of the transformations, but I cannot dig deeper to debug it effectively for now, as I am not familiar with the source yet…

This is another thread discussing this issue.
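One generic way to narrow down which transformation introduces NaN is to apply the transforms one at a time and check the output after each step. The sketch below only illustrates the idea in plain Python: the three transforms are hypothetical stand-ins operating on single pixel values, not fastai's actual transforms, and `first_nan_transform` is an invented helper name.

```python
import math


def first_nan_transform(sample, transforms):
    """Apply (name, function) pairs in order and return the name of the
    first transform whose output contains a NaN, or None if none does."""
    values = list(sample)
    for name, fn in transforms:
        values = [fn(v) for v in values]
        if any(math.isnan(v) for v in values):
            return name
    return None


# Hypothetical stand-ins for image transforms, applied per pixel value.
def brighten(v):
    return v * 1.2


def log_contrast(v):
    # Deliberately yields NaN for non-positive inputs to mimic a bad transform.
    return math.log(v) if v > 0 else float("nan")


def normalize(v):
    return (v - 0.5) / 0.25


pipeline = [
    ("brighten", brighten),
    ("log_contrast", log_contrast),
    ("normalize", normalize),
]

print(first_nan_transform([0.2, 0.0, 0.9], pipeline))  # prints "log_contrast"
```

With real data, the same loop could be run over a batch of images and the library's transform list, assuming each transform can be called on its own.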

jamesrequa · October 16, 2018, 6:14pm

@nok ah ok, in that case sorry I wasn’t aware of that issue. Are you using GCP? Or what platform are you using? Can you provide all of your hardware & OS specs?

jeremy · October 16, 2018, 10:15pm

@nok try now - update from master first. Hopefully it’s fixed.

nok · October 17, 2018, 7:28am

@jamesrequa @jeremy It’s fixed now. Thx!

cedric · October 19, 2018, 1:13pm

Heads-up. The upcoming fastai version 1.0.7 (currently Work In Progress) will break this API:

All vision models are now in the models module, including torchvision models (where tested and supported). So use models instead of tvm now.

Example:

v1.0.6: ConvLearner(data, tvm.resnet34, metrics=accuracy)
v1.0.7: ConvLearner(data, models.resnet34, metrics=accuracy)

See the full changelog.

Update 1:

The conda and pip packages for fastai version 1.0.7 have been released.

champs.jaideep · February 20, 2019, 2:44pm

Hi Jeremy, I am getting a NaN loss.
I use a Siamese network (dense121) for the humpback whale competition, on version 1.39.
FYI, I use GCP.
What could be the reason for the NaN loss?

In one forum I read that max light has that issue, so I zeroed it and am now checking whether I still get the issue.