@zlapp what are the callbacks in cbs?
cbs=[SaveModelCallback(add_save=Path(MODEL_OUTPUT_PATH)),WandbCallback(log_preds=False)]
Also tried learn.cbs=None and learn.metrics=None before exporting, but was still getting the same error.
Nothing else is missing, I don’t think… Try doing a git pull on your fastbook repo to get the latest, check in your fastbook folder to make sure there is a utils.py, run the cell you pasted (which has from utils import *), and then gv should work.
Thanks. I did a git pull on fastbook, fastai2, and fastcore, then ran !pip install utils before running the cell. Now it is working. Thank you.
However, when I tried to run “from utils import *” in 01_intro.ipynb, it showed “No module named azure”. I tried to install azure, but it has compatibility issues. I guess it must be a Windows issue: I am running Jupyter Lab locally on Win10, not on the Azure platform, and Windows is not a high priority for the development team. For now I use the local setup to look up information and run the training etc. on GCP.
Thanks @boris. Just ran it, and the error is occurring without any callbacks passed in during initialization of the learner. Double-checked and I do not have lambdas. Will try to create a reproducible Colab notebook and send it. From checking the learner summary, only defaults are present:
Callbacks:
- TrainEvalCallback
- Recorder
- ProgressCallback
Error:
AttributeError Traceback (most recent call last)
<ipython-input-55-fa5b61306ef3> in <module>()
----> 1 learn.export()
/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in export(self, fname, pickle_protocol)
497 #To avoid the warning that come from PyTorch about model not being checked
498 warnings.simplefilter("ignore")
--> 499 torch.save(self, self.path/fname, pickle_protocol=pickle_protocol)
500 self.create_opt()
501 if state is not None: self.opt.load_state_dict(state)
/usr/local/lib/python3.6/dist-packages/torch/serialization.py in save(obj, f, pickle_module, pickle_protocol, _use_new_zipfile_serialization)
326
327 with _open_file_like(f, 'wb') as opened_file:
--> 328 _legacy_save(obj, opened_file, pickle_module, pickle_protocol)
329
330
/usr/local/lib/python3.6/dist-packages/torch/serialization.py in _legacy_save(obj, f, pickle_module, pickle_protocol)
399 pickler = pickle_module.Pickler(f, protocol=pickle_protocol)
400 pickler.persistent_id = persistent_id
--> 401 pickler.dump(obj)
402
403 serialized_storage_keys = sorted(serialized_storages.keys())
AttributeError: Can't pickle local object 'combine_scheds.<locals>._inner'
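For reference, this is a standard pickle limitation rather than anything fastai-specific: pickle serializes functions by importable name, so a function defined inside another function (like the _inner that combine_scheds returns) cannot be pickled. A minimal standalone sketch, with a hypothetical make_scheduler factory standing in for combine_scheds:

```python
import pickle

def make_scheduler():
    # Factory returning a nested function, analogous to
    # fastai's combine_scheds returning a local `_inner`
    def _inner(pos):
        return pos * 2
    return _inner

sched = make_scheduler()
try:
    pickle.dumps(sched)
except AttributeError as e:
    # Can't pickle local object 'make_scheduler.<locals>._inner'
    print(e)
```

Anything holding a reference to such a closure (here, a lingering ParamScheduler callback on the Learner) makes the whole object graph unpicklable.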
I was able to reproduce the error in a standalone colab notebook (based on notebook 6 from course-v4). Export fails after an interrupted fine_tune. Pretty sure this is a bug.
Thanks for the help @muellerzr and @boris
The error is in the last cell (not the KeyboardInterrupt)
It’s exactly that interrupt that’s causing the issues @zlapp. Due to how fine_tune works, the actual fit function is really two fits, and we’re interrupting it in the middle. Letting fine_tune run all the way, I can successfully export the model.
Interrupting it, I can recreate the error you hit. We can tell by looking in learn.cbs. Notice the difference? (First one is before fine_tune, second is with the interrupt):
(#3) [TrainEvalCallback,Recorder,ProgressCallback]
(#4) [TrainEvalCallback,Recorder,ProgressCallback,ParamScheduler]
During fit, any callback can be added, and it is then removed at the end. We’re not letting fit run to the end, so the fourth callback is never removed here.
Very interesting. It’s nice to get a better look at the inner workings of fine_tune from this error I was getting. Would you consider this a bug, or just not supported? Wondering if ProgressCallback should be cleaned up during/prior to export to avoid the error.
Could you see if it happens when installing fastai2 and fastcore from git?
The context manager added_cbs should have removed it even with a KeyboardInterrupt.
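A toy sketch of that pattern (not fastai's actual implementation) shows why a KeyboardInterrupt should not leave the callback behind: the contextmanager's finally clause runs on any exception, including KeyboardInterrupt:

```python
from contextlib import contextmanager

class ToyLearner:
    def __init__(self):
        self.cbs = ['TrainEvalCallback']

    @contextmanager
    def added_cbs(self, cbs):
        # Add the callbacks, and remove them again even if fit
        # is interrupted, because `finally` always runs
        self.cbs += cbs
        try:
            yield
        finally:
            for cb in cbs:
                self.cbs.remove(cb)

learn = ToyLearner()
try:
    with learn.added_cbs(['ParamScheduler']):
        raise KeyboardInterrupt  # simulate interrupting fit
except KeyboardInterrupt:
    pass
print(learn.cbs)  # ['TrainEvalCallback'] -- the scheduler was removed
```

If an older version leaked the callback, the likely culprit is cleanup logic outside such a finally block, which the git version apparently fixed.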
Just tried installing from git, and the error no longer reproduces. Also checked cbs and saw ProgressCallback wasn’t present.
Is there any way of preventing train from computing the losses? This could be useful with models that return losses at training time and not at inference.
Not sure what you mean here. At inference (e.g. predict) there will be no ‘loss’, since there is no target?
Yijin
I have a model that returns the losses already computed, so I don’t want one_batch to execute
self.loss = self.loss_func(self.pred, *self.yb); self('after_loss')
because the model returns losses in training mode and predictions in evaluation mode.
I am trying to redefine the learner as follows:
class Mask_RCNN_Learner(Learner):
    def __init__(self, dls, model, loss_func=None, opt_func=Adam, lr=defaults.lr, splitter=trainable_params, cbs=None,
                 metrics=None, path=None, model_dir='models', wd=None, wd_bn_bias=False, train_bn=True,
                 moms=(0.95,0.85,0.95)):
        super().__init__(dls, model, loss_func, opt_func, lr, splitter, cbs,
                         metrics, path, model_dir, wd, wd_bn_bias, train_bn,
                         moms)

    def _split(self, b):
        i = getattr(self.dls, 'n_inp', 1 if len(b)==1 else len(b)-1)
        self.xb,self.yb = b[:i],b[i:]

    def _do_epoch_train(self):
        try:
            self.dl = self.dls.train
            # Modification
            self.n_iter = len(self.dl)
            for o in enumerate(self.dl):
                i, b = *o
                self.iter = i
                try:
                    self._split(b)
                    loss_dict = self.model(*self.xb,*self.yb)
                    if len(self.yb) == 0: return
                    self.loss = sum(loss for loss in loss_dict.values())
                    if not self.training: return
                    self.loss.backward()
                    self.opt.step()
                    self.opt.zero_grad()
                except CancelBatchException as e:
                    raise e
        except CancelTrainException as e:
            raise e

    def _do_epoch_validate(self, ds_idx=1, dl=None):
        if dl is None: dl = self.dls[ds_idx]
        try:
            self.dl = dl
            with torch.no_grad():
                # Modification
                self.n_iter = len(self.dl)
                for o in enumerate(self.dl):
                    i, b = *o
                    self.iter = i
                    try:
                        self._split(b)
                        detection = self.model(*self.xb)
                        self.loss = self.loss_func(detection, *self.yb)
                        # COMPUTING METRICS
                        if not self.training: return
                    except CancelBatchException as e:
                        raise e
        except CancelValidException as e:
            raise e

    @log_args(but='cbs')
    def fit(self, n_epoch, lr=None, wd=None, cbs=None, reset_opt=False):
        with self.added_cbs(cbs):
            if reset_opt or not self.opt: self.create_opt()
            if wd is None: wd = self.wd
            if wd is not None: self.opt.set_hypers(wd=wd)
            self.opt.set_hypers(lr=self.lr if lr is None else lr)
            try:
                self._do_begin_fit(n_epoch)
                for epoch in range(n_epoch):
                    try:
                        self.epoch=epoch
                        self._do_epoch_train()
                        self._do_epoch_validate()
                    except CancelEpochException as e:
                        raise e
            except CancelFitException as e:
                raise e
However, I am getting:
File "<ipython-input-39-0a0657f193c1>", line 23
self._split(b)
^
SyntaxError: can't use starred expression here
The error message points to your use of *, in the line i, b = *o. This SO page explains it, I think? You should try changing that line to i, b = o, or delete that line and just change the line above to for i, b in enumerate(self.dl):
Not sure how all these relate to your question about not computing losses – I did not read through your code, and don’t know what’s happening in it…!
Good luck.
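For reference, the valid unpacking forms side by side; a starred expression is only legal on the right of an assignment inside a display like a list or call, never bare:

```python
batches = ['b0', 'b1', 'b2']

# enumerate yields (index, item) tuples; unpack them directly
# in the for statement rather than with a starred assignment
for i, b in enumerate(batches):
    print(i, b)

# Equivalently, keep the tuple and unpack it without a star:
o = (0, 'b0')
i, b = o        # valid tuple unpacking
# i, b = *o     # SyntaxError: can't use starred expression here
```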
Yijin
I have solved it, though I don’t fully understand how.
class Mask_RCNN_Learner(Learner):
    def __init__(self, dls, model, loss_func=None, opt_func=Adam, lr=defaults.lr, splitter=trainable_params, cbs=None,
                 metrics=None, path=None, model_dir='models', wd=None, wd_bn_bias=False, train_bn=True,
                 moms=(0.95,0.85,0.95)):
        super().__init__(dls, model, loss_func, opt_func, lr, splitter, cbs,
                         metrics, path, model_dir, wd, wd_bn_bias, train_bn,
                         moms)

    def all_batches(self):
        self.n_iter = len(self.dl)
        for o in enumerate(self.dl): self.one_batch(*o)

    def one_batch(self, i, b):
        self.iter = i
        try:
            self._split(b); self('begin_batch')
            loss_dict = self.model(*self.xb,self.yb); self('after_pred')
            if len(self.yb) == 0: return
            loss = sum(loss for loss in loss_dict.values())
            self.loss = loss; self('after_loss')
            if not self.training: return
            self.loss.backward(); self('after_backward')
            self.opt.step(); self('after_step')
            self.opt.zero_grad()
        except CancelBatchException: self('after_cancel_batch')
        finally: self('after_batch')

    def _do_begin_fit(self, n_epoch):
        self.n_epoch,self.loss = n_epoch,tensor(0.); self('begin_fit')

    def _do_epoch_train(self):
        try:
            self.dl = self.dls.train; self('begin_train')
            self.all_batches()
        except CancelTrainException: self('after_cancel_train')
        finally: self('after_train')

    def _do_epoch_validate(self, ds_idx=1, dl=None):
        if dl is None: dl = self.dls[ds_idx]
        try:
            self.dl = dl; self('begin_validate')
            with torch.no_grad(): self.all_batches()
        except CancelValidException: self('after_cancel_validate')
        finally: self('after_validate')

    @log_args(but='cbs')
    def fit(self, n_epoch, lr=None, wd=None, cbs=None, reset_opt=False):
        with self.added_cbs(cbs):
            if reset_opt or not self.opt: self.create_opt()
            if wd is None: wd = self.wd
            if wd is not None: self.opt.set_hypers(wd=wd)
            self.opt.set_hypers(lr=self.lr if lr is None else lr)
            try:
                self._do_begin_fit(n_epoch)
                for epoch in range(n_epoch):
                    try:
                        self.epoch=epoch; self('begin_epoch')
                        self._do_epoch_train()
                        self._do_epoch_validate()
                    except CancelEpochException: self('after_cancel_epoch')
                    finally: self('after_epoch')
            except CancelFitException: self('after_cancel_fit')
            finally: self('after_fit')
If you look, I am just adjusting these lines of code:
loss_dict = self.model(*self.xb,self.yb); self('after_pred')
if len(self.yb) == 0: return
loss = sum(loss for loss in loss_dict.values())
self.loss = loss; self('after_loss')
This is done because I am working with torchvision.models.detection.maskrcnn_resnet50_fpn. This model expects an image and a dict with the target as input. The thing is that in evaluation mode it returns a dict with masks, boxes, and labels, and I would like the accuracy metric to be calculated just on the masks. That’s why I was asking where to modify the data passed into the metrics; the output of this model is not a usual one.
Should an image look visually similar before and after normalization?
I’ve created dataloaders with no augmentations, so the images that come out of dls.one_batch() have just been converted to float tensors.
xb,_ = dls.one_batch()
norm = Normalize.from_stats(*imagenet_stats)
xb_n = norm(xb)
show_image(xb_n[0])
Then I applied the Normalize transform and viewed the image. It looks totally distorted: some portion of the image has been masked out, while the visible portion seems to have gone through a major brightness/contrast change. I’ve also calculated my own statistics for the dataset and tried to Normalize using those, but the images look equally distorted.
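That distortion is expected: normalization shifts pixel values outside the displayable [0, 1] range, so showing the normalized tensor directly clips or rescales them. A small sketch of the arithmetic, assuming ImageNet statistics; to view a normalized batch, undo the normalization first:

```python
import torch

# ImageNet per-channel statistics, shaped for broadcasting
imagenet_mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
imagenet_std  = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

# Stand-in for one image tensor in [0, 1] (channels, height, width)
x = torch.linspace(0, 1, 48).reshape(3, 4, 4)

x_n = (x - imagenet_mean) / imagenet_std

# Normalized pixels fall outside [0, 1], so naive display distorts them
print(x_n.min().item() < 0)       # True

# To visualize, reverse the normalization first
x_back = x_n * imagenet_std + imagenet_mean
print(torch.allclose(x_back, x))  # True
```

In fastai, dls.decode_batch (or the Normalize transform's decodes) does this reversal for you, which is why show_batch on the dataloaders looks fine while show_image on the raw normalized tensor does not.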
I have seen many annotations like @patch and @typedispatch in the code. What do these annotations do?
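Both decorators come from fastcore: roughly, @patch attaches a new method to an existing class (using the type annotation of its first argument to pick the class), and @typedispatch selects an implementation based on the argument types. A rough stdlib analogy, not fastcore's actual code:

```python
from functools import singledispatch

# Rough analogue of @typedispatch: dispatch on the type
# of the first argument
@singledispatch
def show(x):
    return f"object: {x}"

@show.register
def _(x: int):
    return f"int: {x}"

@show.register
def _(x: str):
    return f"str: {x}"

print(show(3))     # int: 3
print(show("hi"))  # str: hi

# Rough analogue of @patch: attach a method to an
# already-defined class
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

def norm2(self: Point):
    return self.x**2 + self.y**2

Point.norm2 = norm2  # what @patch does, driven by the annotation
print(Point(3, 4).norm2())  # 25
```

fastcore's versions are more capable (e.g. @typedispatch dispatches on two arguments), but this is the core idea.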