There is quite literally nothing different with our environments… I am very perplexed. I hate to say this, but could you leave it alone for an hour, factory reset, and then see if it's still there?
I guess that is the next thing to try, thanks. Will post here in an hour with updates
Thanks for the help @muellerzr . Will shut down the million colab notebooks
@muellerzr hi Zach, so I ran it again, ran it from a different account, and had a friend run it too. It all failed with the same error. Is it possible that you have a local change that is overriding fastcore, or that isn't using fastcore at all? Or are you using an editable install?
This is the closest I can find, i.e. downgrading fastai to 2.0.19 with torch 1.6. I'm not sure if it will help, but I'm hoping to narrow down the versions where the same code works for you.
This version was released after 2.1.0, and adds fastcore 1.3 compatibility, whilst maintaining PyTorch 1.6 compatibility. It has no new features or bug fixes.
@barnacl today I’m going to be adding more tests to fastai to see if we can train, I’ll see if anything breaks
@msivanes thanks for the suggestion. It works with:
pytorch=1.6
fastai=2.0.19
fastcore=1.3.1
On Colab I did:
!pip uninstall torch -y
# CUDA 10.1
!pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
!pip install fastai==2.0.19
!pip install fastcore==1.3.1
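Not from the thread, but after pinning versions like this it's easy to end up with a half-applied environment on Colab. A small stdlib-only helper (the function name and `PINS` dict are mine) to confirm the pins above actually took effect:

```python
from importlib import metadata

# Known-good pins from this thread (fastai 2.0.19 era)
PINS = {"fastai": "2.0.19", "fastcore": "1.3.1", "torch": "1.6.0+cu101"}

def check_pin(pkg, expected):
    """Return a short status string comparing the installed version to a pin."""
    try:
        got = metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return f"{pkg}: not installed"
    return f"{pkg}: OK" if got == expected else f"{pkg}: expected {expected}, got {got}"

for pkg, ver in PINS.items():
    print(check_pin(pkg, ver))
```

Run this in a fresh cell after restarting the runtime, since an already-imported package keeps its old version until the kernel restarts.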
You guys are gold. I literally had to unwrap my metric function and check shapes and contents at each step to see that everything was fine, and it was indeed the comparison sign that was messing things up.
This is now working with fastai 2.0.19 (but training slower)
I’ll try looking again at this today…this is seriously strange
I saw it training slower for me too with CUDA 10.2, but it was fast with CUDA 10.1. I didn't look into it more, though, so it could just be chance. CamVid was taking around 1 min with Zach's notebook. Not sure if this helps you.
@muellerzr
Everything seems to be working fine today with vanilla Colab Pro + fastai updated to 2.1.7
By the way everyone, I plan to start watching the Walk with fastai series (videos and notebooks) this new year. I'm guessing the time will be mornings, 8 to 9 Central Time (maybe evenings instead, depending on whether everyone who wants to join prefers mornings or nights). On Tuesdays we'll watch the video, and on Thursdays we'll run the notebooks and chat. It's about 12 hours of video, so roughly 12 weeks, from January until March (the timing may change). Starting Jan 5.
This will be on discord I guess.
The entirety of this course is now available on my website:
The notebooks are a bit more fleshed out now, and they will also be copied over to the Practical Deep Learning for Coders repo later today.
While testing out notebook 7 for Super-Resolution, I found out that unet_config
is deprecated.
So I changed the unet_learner
call to
unet_learner(dls_gen, arch=resnet34, loss_func=MSELossFlat, blur=True, norm_type=NormType.Weight, self_attention=True, y_range=(-3.,3.), n_out=3)
replacing the config object and passing all its params in directly.
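The migration described here is essentially "unpack the config into keyword arguments". A plain-Python sketch of that pattern (the function names and dict are illustrative stand-ins, not fastai's real signatures):

```python
def unet_learner_old(dls, arch, config=None, **kwargs):
    # Deprecated style: options bundled into a separate config object/dict
    config = config or {}
    return {"dls": dls, "arch": arch, **config, **kwargs}

def unet_learner_new(dls, arch, **kwargs):
    # New style: the same options passed directly as keyword arguments
    return {"dls": dls, "arch": arch, **kwargs}

cfg = {"blur": True, "self_attention": True, "y_range": (-3.0, 3.0)}
old = unet_learner_old("dls_gen", "resnet34", config=cfg)
new = unet_learner_new("dls_gen", "resnet34", **cfg)  # config unpacked inline
print(old == new)  # both carry identical options
```

So mechanically the move is just `f(dls, arch, config=cfg)` → `f(dls, arch, **cfg)`, which is why copying each config entry into the call, as done above, is the right translation.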
But I can’t get it to work. When I run learn_gen.fit_one_cycle(2, pct_start=0.8, wd=WeightDecay)
I get the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-76-c661610c2951> in <module>()
----> 1 learn_gen.fit_one_cycle(2, pct_start=0.8, wd=WeightDecay)
2 # learn_gen.fit_one_cycle()
8 frames
/usr/local/lib/python3.6/dist-packages/fastai/callback/schedule.py in fit_one_cycle(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt)
110 scheds = {'lr': combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),
111 'mom': combined_cos(pct_start, *(self.moms if moms is None else moms))}
--> 112 self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd)
113
114 # Cell
/usr/local/lib/python3.6/dist-packages/fastai/learner.py in fit(self, n_epoch, lr, wd, cbs, reset_opt)
198
199 def fit(self, n_epoch, lr=None, wd=None, cbs=None, reset_opt=False):
--> 200 with self.added_cbs(cbs):
201 if reset_opt or not self.opt: self.create_opt()
202 if wd is None: wd = self.wd
/usr/lib/python3.6/contextlib.py in __enter__(self)
79 def __enter__(self):
80 try:
---> 81 return next(self.gen)
82 except StopIteration:
83 raise RuntimeError("generator didn't yield") from None
/usr/local/lib/python3.6/dist-packages/fastai/learner.py in added_cbs(self, cbs)
119 @contextmanager
120 def added_cbs(self, cbs):
--> 121 self.add_cbs(cbs)
122 try: yield
123 finally: self.remove_cbs(cbs)
/usr/local/lib/python3.6/dist-packages/fastai/learner.py in add_cbs(self, cbs)
100
101 def _grab_cbs(self, cb_cls): return L(cb for cb in self.cbs if isinstance(cb, cb_cls))
--> 102 def add_cbs(self, cbs): L(cbs).map(self.add_cb)
103 def remove_cbs(self, cbs): L(cbs).map(self.remove_cb)
104 def add_cb(self, cb):
/usr/local/lib/python3.6/dist-packages/fastcore/foundation.py in map(self, f, gen, *args, **kwargs)
152 def range(cls, a, b=None, step=None): return cls(range_of(a, b=b, step=step))
153
--> 154 def map(self, f, *args, gen=False, **kwargs): return self._new(map_ex(self, f, *args, gen=gen, **kwargs))
155 def argwhere(self, f, negate=False, **kwargs): return self._new(argwhere(self, f, negate, **kwargs))
156 def filter(self, f=noop, negate=False, gen=False, **kwargs):
/usr/local/lib/python3.6/dist-packages/fastcore/basics.py in map_ex(iterable, f, gen, *args, **kwargs)
654 res = map(g, iterable)
655 if gen: return res
--> 656 return list(res)
657
658 # Cell
/usr/local/lib/python3.6/dist-packages/fastcore/basics.py in __call__(self, *args, **kwargs)
644 if isinstance(v,_Arg): kwargs[k] = args.pop(v.i)
645 fargs = [args[x.i] if isinstance(x, _Arg) else x for x in self.pargs] + args[self.maxi+1:]
--> 646 return self.func(*fargs, **kwargs)
647
648 # Cell
/usr/local/lib/python3.6/dist-packages/fastai/learner.py in add_cb(self, cb)
107 cb.learn = self
108 setattr(self, cb.name, cb)
--> 109 self.cbs.append(cb)
110 return self
111
AttributeError: 'NoneType' object has no attribute 'append'
Do you have any solution for this? I don’t understand what I did wrong.
This is running on:
Name: fastai
Version: 2.1.10
Name: fastcore
Version: 1.3.16
I downgraded fastcore and still get the same error:
Name: fastcore
Version: 1.3.13
EDIT:
So based on others in the thread, I have downgraded the rest to this:
!pip uninstall torch -y
!pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
!pip install fastai==2.0.19
!pip install fastcore==1.3.1
Now I don't get the NoneType
error; instead I get this:
epoch train_loss valid_loss time
0 0.000000 00:00
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-21-c661610c2951> in <module>()
----> 1 learn_gen.fit_one_cycle(2, pct_start=0.8, wd=WeightDecay)
2 # learn_gen.fit_one_cycle()
20 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/_reduction.py in legacy_get_string(size_average, reduce, emit_warning)
35 reduce = True
36
---> 37 if size_average and reduce:
38 ret = 'mean'
39 elif reduce:
RuntimeError: Boolean value of Tensor with more than one value is ambiguous
For now use the notebooks in the walkwithfastai repo: https://github.com/walkwithfastai/walkwithfastai.github.io/tree/master/nbs/course2020
I'll update the original repo notebooks soon. However, keep an eye on the installed pip versions: some parts use the dev version of the library.
I’ll make sure to keep an eye on that. At the moment, the train time for each epoch in the Generator is taking me close to 90mins. I’ll mess around with the libraries and see if anything helps out. Thank you so much!
Did you enable GPU?
Yes, GPU is enabled.
This is the generator, epochs 3 to 5. The previous two epochs also took 90 minutes each.
This is the Critic training right now, somewhat faster than I expected
What does GenL.dls.device
give you? That definitely looks like it's running CPU-bound.
I’ll re-run it and check. But I’m 90% sure it was running on the GPU. I think some other APIs straight-up fail on the CPU, I’m not sure. I trained this yesterday and saved the model.
Running the reloaded model, I get this:
I think you're missing the parentheses: it should be loss_func=MSELossFlat().
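To make the parentheses point concrete: with loss_func=MSELossFlat (no parentheses), the Learner ends up calling the class itself with (pred, targ), so the prediction tensor lands in the constructor and gets truth-tested by torch's legacy reduction handling, which is exactly the "Boolean value of Tensor with more than one value is ambiguous" error seen earlier. A minimal plain-Python sketch of that failure mode (stand-in classes, not fastai's or torch's real code):

```python
class FakeTensor:
    """Stand-in for a multi-element torch.Tensor: truth-testing it is an error."""
    def __bool__(self):
        raise RuntimeError(
            "Boolean value of Tensor with more than one value is ambiguous")

class LossFlat:
    """Stand-in for fastai's MSELossFlat: configured first, then called."""
    def __init__(self, size_average=None, reduce=None):
        # mirrors torch's legacy reduction check: fine for None/bools,
        # fatal when a tensor is passed in by mistake
        if size_average and reduce:
            self.reduction = "mean"

    def __call__(self, pred, targ):
        return 0.0  # real loss computation elided

pred, targ = FakeTensor(), FakeTensor()

loss_func = LossFlat()    # correct: instantiate with ()
loss_func(pred, targ)     # ...then the Learner calls it with (pred, targ)

try:
    LossFlat(pred, targ)  # wrong: the class itself was passed as loss_func,
except RuntimeError as e: # so pred/targ land in __init__ and get bool-tested
    print(e)
```

The symptom only shows up at fit time rather than at Learner construction, which is why it looked like a fit_one_cycle bug in the traceback above.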