A walk with fastai2 - Vision - Study Group and Online Lectures Megathread

There is quite literally nothing different with our environments… I am very perplexed. I hate saying this, but could you leave it alone for an hour, factory reset, and then see if it’s still there?

I guess that is the next thing to try, thanks. Will post here in an hour with updates :slight_smile:
Thanks for the help @muellerzr. Will shut down the million Colab notebooks :smiley:

1 Like

@muellerzr hi Zach, so I ran it again, ran it from a different account, and had a friend run it too. It all failed with the same error. Is it possible that you have a local change that is overriding fastcore, or that you’re not picking it up at all? Or are you using an editable install?
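
For example, something like this should show whether each library is coming from a normal pip install or from a local/editable checkout (assuming the imports work on your end):

# Print version and import location for each package.
# An editable install usually points into a local git checkout
# rather than site-packages / dist-packages.
import fastai, fastcore, torch

for mod in (fastai, fastcore, torch):
    print(mod.__name__, mod.__version__, "->", mod.__file__)

!pip show fastai fastcore also lists the installed version and location.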

This is the closest I can find, i.e. downgrade fastai to 2.0.19 with torch 1.6. I’m not sure if it will help, but I’m hoping to narrow down the versions where the same code works for you.

This version was released after 2.1.0 and adds fastcore 1.3 compatibility whilst maintaining PyTorch 1.6 compatibility. It has no new features or bug fixes.

2 Likes

@barnacl today I’m going to be adding more tests to fastai to see if we can train; I’ll see if anything breaks.

2 Likes

@msivanes thanks for the suggestion. It works with
pytorch=1.6
fastai=2.0.19
fastcore=1.3.1
On Colab I did

# uninstall torch twice in case more than one copy is installed
!pip uninstall torch -y
!pip uninstall torch -y
# CUDA 10.1
!pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
!pip install fastai==2.0.19
!pip install fastcore==1.3.1
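
Note that after replacing the preinstalled torch on Colab you will probably need to restart the runtime before the new versions are picked up. A quick sanity check afterwards:

# verify the downgraded versions are the ones actually imported
import torch, torchvision, fastai, fastcore
print(torch.__version__, torchvision.__version__)   # expect 1.6.0+cu101 / 0.7.0+cu101
print(fastai.__version__, fastcore.__version__)     # expect 2.0.19 / 1.3.1
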
6 Likes

You guys are gold,
I literally had to unwrap my metric function and check shapes and contents at each step to see that everything was fine; it was indeed the comparison sign that was messing things up.
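
Something along these lines, just to show the kind of check (the metric here is a made-up example, not the actual one from the notebook):

import torch

def debug_accuracy(preds, targs):
    # print shapes/dtypes at each step so a silent mismatch shows up
    print("preds:", preds.shape, preds.dtype, "targs:", targs.shape, targs.dtype)
    preds = preds.argmax(dim=1)        # hard class predictions
    correct = (preds == targs)         # the comparison sign: == vs != flips everything
    print("correct fraction:", correct.float().mean().item())
    return correct.float().mean()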

This is now working with fastai 2.0.19 (but training slower)

I’ll try looking again at this today…this is seriously strange

1 Like

I saw it was training slower for me too with CUDA 10.2, but it was fast with CUDA 10.1. I didn’t look into it more, so it could just be chance. CamVid was taking around 1 min from Zach’s notebook. Not sure if this helps you :slight_smile:
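
If it helps, something like this should show which CUDA build torch is actually using:

import torch
print(torch.__version__)        # e.g. 1.6.0+cu101
print(torch.version.cuda)       # CUDA version the wheel was built against
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))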

@muellerzr
Everything seems to be working fine today with vanilla Colab Pro + fastai updated to 2.1.7

1 Like

By the way people, I plan to start watching and working through the Walk with fastai series (videos and notebooks) this new year. I’m guessing the time will be mornings, 8 to 9 Central Time (possibly evenings instead, depending on whether everyone who wants to join prefers mornings or nights). On Tuesdays we’ll watch the video and on Thursdays we’ll run the notebooks and chat. It’s about 12 hours of video, so roughly 12 weeks, i.e. from January until March (the timing may change). Starting Jan 5.

This will be on discord I guess.

2 Likes

The entirety of this course is now available on my website:

The notebooks are a bit more fleshed out now, and they will be copied over to the Practical Deep Learning for Coders repo later today as well.

3 Likes

While testing out notebook 7 for Super-Resolution, I found out that unet_config is deprecated.
So I changed the unet_learner call to
unet_learner(dls_gen, arch=resnet34, loss_func=MSELossFlat, blur=True, norm_type=NormType.Weight, self_attention=True, y_range=(-3.,3.), n_out=3)
replacing the config and passing all the params in directly.

But I can’t get it to work. When I run learn_gen.fit_one_cycle(2, pct_start=0.8, wd=WeightDecay)

I get the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-76-c661610c2951> in <module>()
----> 1 learn_gen.fit_one_cycle(2, pct_start=0.8, wd=WeightDecay)
      2 # learn_gen.fit_one_cycle()

8 frames
/usr/local/lib/python3.6/dist-packages/fastai/callback/schedule.py in fit_one_cycle(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt)
    110     scheds = {'lr': combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),
    111               'mom': combined_cos(pct_start, *(self.moms if moms is None else moms))}
--> 112     self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd)
    113 
    114 # Cell

/usr/local/lib/python3.6/dist-packages/fastai/learner.py in fit(self, n_epoch, lr, wd, cbs, reset_opt)
    198 
    199     def fit(self, n_epoch, lr=None, wd=None, cbs=None, reset_opt=False):
--> 200         with self.added_cbs(cbs):
    201             if reset_opt or not self.opt: self.create_opt()
    202             if wd is None: wd = self.wd

/usr/lib/python3.6/contextlib.py in __enter__(self)
     79     def __enter__(self):
     80         try:
---> 81             return next(self.gen)
     82         except StopIteration:
     83             raise RuntimeError("generator didn't yield") from None

/usr/local/lib/python3.6/dist-packages/fastai/learner.py in added_cbs(self, cbs)
    119     @contextmanager
    120     def added_cbs(self, cbs):
--> 121         self.add_cbs(cbs)
    122         try: yield
    123         finally: self.remove_cbs(cbs)

/usr/local/lib/python3.6/dist-packages/fastai/learner.py in add_cbs(self, cbs)
    100 
    101     def _grab_cbs(self, cb_cls): return L(cb for cb in self.cbs if isinstance(cb, cb_cls))
--> 102     def add_cbs(self, cbs): L(cbs).map(self.add_cb)
    103     def remove_cbs(self, cbs): L(cbs).map(self.remove_cb)
    104     def add_cb(self, cb):

/usr/local/lib/python3.6/dist-packages/fastcore/foundation.py in map(self, f, gen, *args, **kwargs)
    152     def range(cls, a, b=None, step=None): return cls(range_of(a, b=b, step=step))
    153 
--> 154     def map(self, f, *args, gen=False, **kwargs): return self._new(map_ex(self, f, *args, gen=gen, **kwargs))
    155     def argwhere(self, f, negate=False, **kwargs): return self._new(argwhere(self, f, negate, **kwargs))
    156     def filter(self, f=noop, negate=False, gen=False, **kwargs):

/usr/local/lib/python3.6/dist-packages/fastcore/basics.py in map_ex(iterable, f, gen, *args, **kwargs)
    654     res = map(g, iterable)
    655     if gen: return res
--> 656     return list(res)
    657 
    658 # Cell

/usr/local/lib/python3.6/dist-packages/fastcore/basics.py in __call__(self, *args, **kwargs)
    644             if isinstance(v,_Arg): kwargs[k] = args.pop(v.i)
    645         fargs = [args[x.i] if isinstance(x, _Arg) else x for x in self.pargs] + args[self.maxi+1:]
--> 646         return self.func(*fargs, **kwargs)
    647 
    648 # Cell

/usr/local/lib/python3.6/dist-packages/fastai/learner.py in add_cb(self, cb)
    107         cb.learn = self
    108         setattr(self, cb.name, cb)
--> 109         self.cbs.append(cb)
    110         return self
    111 

AttributeError: 'NoneType' object has no attribute 'append' 

Do you have any solution for this? I don’t understand what I did wrong.

This is running on:

Name: fastai
Version: 2.1.10

Name: fastcore
Version: 1.3.16

I downgraded fastcore and still get the same error.

Name: fastcore
Version: 1.3.13

EDIT:
So based on others in the thread, I have downgraded the rest to this:

!pip uninstall torch -y
!pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
!pip install fastai==2.0.19
!pip install fastcore==1.3.1 

Now I don’t get the same NoneType error; instead I get this:

epoch	train_loss	valid_loss	time
0	0.000000	00:00
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-21-c661610c2951> in <module>()
----> 1 learn_gen.fit_one_cycle(2, pct_start=0.8, wd=WeightDecay)
      2 # learn_gen.fit_one_cycle()

20 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/_reduction.py in legacy_get_string(size_average, reduce, emit_warning)
     35         reduce = True
     36 
---> 37     if size_average and reduce:
     38         ret = 'mean'
     39     elif reduce:

RuntimeError: Boolean value of Tensor with more than one value is ambiguous

For now, use the notebooks in the walkwithfastai repo: https://github.com/walkwithfastai/walkwithfastai.github.io/tree/master/nbs/course2020

I’ll update the original repo notebooks soon. However, keep an eye on the installed pip versions; some parts are using the dev version of the library.

I’ll make sure to keep an eye on that. At the moment, the training time for each epoch of the Generator is close to 90 mins for me. I’ll mess around with the libraries and see if anything helps out. Thank you so much!

Did you enable GPU?

Yes, GPU is enabled.
This is the generator, epochs 3 to 5. The previous two also took 90 minutes each.

This is the Critic training right now, somewhat faster than I expected

What does GenL.dls.device give you? That definitely looks like it’s running CPU-bound.
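
Roughly, with whatever your learner variable is (learn_gen in your snippets above), something like this should show the device and force it onto the GPU if needed:

import torch
print(learn_gen.dls.device)      # should say cuda, not cpu
if torch.cuda.is_available():
    learn_gen.dls.cuda()         # move the DataLoaders onto the GPU
    learn_gen.model.cuda()       # and the model itself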

I’ll re-run it and check. But I’m 90% sure it was running on the GPU. I think some other APIs straight-up fail on the CPU, I’m not sure. I trained this yesterday and saved the model.
Running the reloaded model, I get this:
[image]

I think you’re missing the parentheses: it should be loss_func=MSELossFlat().
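
That would likely explain the RuntimeError above too: with loss_func=MSELossFlat (no parentheses) the class itself gets passed, so when the loss is called it ends up constructing MSELossFlat(pred, target), which pushes the tensors into nn.MSELoss’s size_average/reduce arguments and trips that “Boolean value of Tensor” check. Assuming the rest of your original call is right, the fixed learner would look something like:

# same arguments as before, but with an instantiated loss function
learn_gen = unet_learner(dls_gen, arch=resnet34, loss_func=MSELossFlat(),
                         blur=True, norm_type=NormType.Weight,
                         self_attention=True, y_range=(-3.,3.), n_out=3)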

2 Likes