How can I use torchvision models with fastai2? I need mobilenet_v2, which is available in fastai1.
I’m getting an error (shown below) when trying to export the model.
Running on colab with versions:
fastai version 0.0.16, pytorch version 1.4.0
I initialized the learn object as follows:
learn = cnn_learner(dls,
                    partial(arch,pretrained=pretrained),
                    metrics=metrics,
                    cbs=cbs)
and dls as follows:
def splitter(df):
    train = df.index[df['is_valid']==False].tolist()
    valid = df.index[df['is_valid']==True].tolist()
    # print("train",train[:10],"valid",valid[:10])
    return train,valid

def get_x(r): return r['name']

def get_y(r):
    rv = r['label'].split(" ")
    if "" in rv:
        while "" in rv:
            rv.remove("")
    return rv

dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   splitter=splitter,
                   get_x = get_x,
                   get_y = get_y,
                   item_tfms = RandomResizedCrop(256, min_scale=0.08),
                   batch_tfms=augs)
bs=64
dls = dblock.dataloaders(df,bs=bs)
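As a side note on get_y above, here is a small sketch of what the empty-string cleanup does. The dict `row` is a hypothetical stand-in for a DataFrame row, and `get_y_simple` is an illustrative name, not part of the original post. Calling `split()` with no argument appears to give the same labels for space-separated input, although it also splits on tabs and newlines, so it is not a strict drop-in replacement.

```python
def get_y(r):
    # Same logic as in the post: split on single spaces, then drop empty strings
    rv = r['label'].split(" ")
    while "" in rv:
        rv.remove("")
    return rv

def get_y_simple(r):
    # Hypothetical simplification: split() with no argument collapses runs of whitespace
    return r['label'].split()

# Hypothetical row with a doubled space and a trailing space
row = {'name': 'img_001.jpg', 'label': 'cat  dog '}

labels = get_y(row)
labels_simple = get_y_simple(row)
print(labels, labels_simple)
```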
Error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-66-fa5b61306ef3> in <module>()
----> 1 learn.export()
2 frames
/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in export(self, fname, pickle_protocol)
497 #To avoid the warning that come from PyTorch about model not being checked
498 warnings.simplefilter("ignore")
--> 499 torch.save(self, self.path/fname, pickle_protocol=pickle_protocol)
500 self.create_opt()
501 if state is not None: self.opt.load_state_dict(state)
/usr/local/lib/python3.6/dist-packages/torch/serialization.py in save(obj, f, pickle_module, pickle_protocol, _use_new_zipfile_serialization)
326
327 with _open_file_like(f, 'wb') as opened_file:
--> 328 _legacy_save(obj, opened_file, pickle_module, pickle_protocol)
329
330
/usr/local/lib/python3.6/dist-packages/torch/serialization.py in _legacy_save(obj, f, pickle_module, pickle_protocol)
399 pickler = pickle_module.Pickler(f, protocol=pickle_protocol)
400 pickler.persistent_id = persistent_id
--> 401 pickler.dump(obj)
402
403 serialized_storage_keys = sorted(serialized_storages.keys())
AttributeError: Can't pickle local object 'combine_scheds.<locals>._inner'
Appreciate any help, Thanks!
Thanks @vferrer, I think you are right that this is the issue, but I'm not sure where it stems from. Is combine_scheds.<locals>._inner related to callbacks?
cbs=[SaveModelCallback(add_save=Path(MODEL_OUTPUT_PATH)),WandbCallback(log_preds=False)]
I also tried setting learn.cbs=None and learn.metrics=None before exporting, but was still getting the same error.
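For context, the message "Can't pickle local object" is what Python's pickle reports when an object being saved holds a function defined inside another function. The sketch below reproduces that situation with the standard library only. `make_sched`, `HoldsSchedule` and `try_pickle` are hypothetical names chosen for illustration; this is not fastai2's actual combine_scheds code.

```python
import pickle

def make_sched(pcts):
    # Hypothetical factory: returns a nested function, i.e. a "local object"
    def _inner(pos):
        return pos * sum(pcts)
    return _inner

class HoldsSchedule:
    """Stand-in for any object (for example a callback) that stores such a function."""
    def __init__(self):
        self.sched = make_sched([0.25, 0.75])

def try_pickle(obj):
    # Return "ok" if pickling works, otherwise the error type and message
    try:
        pickle.dumps(obj)
        return "ok"
    except (AttributeError, pickle.PicklingError) as e:
        return f"{type(e).__name__}: {e}"

plain_result = try_pickle([1, 2, 3])          # ordinary data pickles fine
closure_result = try_pickle(HoldsSchedule())  # fails because of the nested function
print(plain_result)
print(closure_result)
```

So if anything still attached to the learner at export time holds a function like this, saving the whole learner with pickle would be expected to fail in the same way.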
Nothing else is missing, I don’t think… Try doing a git pull on your fastbook repo to get the latest, check in your fastbook folder to make sure there is utils.py, and run the cell you pasted (which has from utils import *), and then gv should work.
Thanks. I ran git pull on fastbook, fastai2 and fastcore, then ran !pip install utils before running the cell. Now it is working. Thank you.
However, when I tried to run “from utils import *” in 01_intro.ipynb, it showed “No module of azure”. I tried to install azure, but it has a compatibility issue. I guess this is a Windows issue: I am running Jupyter Lab locally on Win10, not on the Azure platform. Since Windows is not a high priority for the development team, for now I use it to look up information and run the training etc. on GCP.
Thanks @boris, I just ran it and the error occurs even without any callback passed in during initialization of the learner. I double-checked and do not have any lambdas. I will try to create a reproducible colab notebook and send it. From checking the learner summary, only the defaults are present:
Callbacks:
- TrainEvalCallback
- Recorder
- ProgressCallback
Error:
AttributeError Traceback (most recent call last)
<ipython-input-55-fa5b61306ef3> in <module>()
----> 1 learn.export()
2 frames
/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in export(self, fname, pickle_protocol)
497 #To avoid the warning that come from PyTorch about model not being checked
498 warnings.simplefilter("ignore")
--> 499 torch.save(self, self.path/fname, pickle_protocol=pickle_protocol)
500 self.create_opt()
501 if state is not None: self.opt.load_state_dict(state)
/usr/local/lib/python3.6/dist-packages/torch/serialization.py in save(obj, f, pickle_module, pickle_protocol, _use_new_zipfile_serialization)
326
327 with _open_file_like(f, 'wb') as opened_file:
--> 328 _legacy_save(obj, opened_file, pickle_module, pickle_protocol)
329
330
/usr/local/lib/python3.6/dist-packages/torch/serialization.py in _legacy_save(obj, f, pickle_module, pickle_protocol)
399 pickler = pickle_module.Pickler(f, protocol=pickle_protocol)
400 pickler.persistent_id = persistent_id
--> 401 pickler.dump(obj)
402
403 serialized_storage_keys = sorted(serialized_storages.keys())
AttributeError: Can't pickle local object 'combine_scheds.<locals>._inner'
I was able to reproduce the error in a standalone colab notebook (based on notebook 6 from course-v4). Export fails after an interrupted fine_tune. Pretty sure this is a bug.
Thanks for the help @muellerzr and @boris
The error is in the last cell (not the KeyboardInterrupt)
It’s exactly that interrupt that’s causing the issues @zlapp. Because of how fine_tune works, there are really two fit calls, and we’re interrupting in the middle of them. Letting fine_tune run all the way through, I can successfully export the model.
If I interrupt training, I can recreate the error you saw. We can tell by looking in learn.cbs. Notice the difference? (The first is before fine_tune, the second is after the interrupt):
(#3) [TrainEvalCallback,Recorder,ProgressCallback]
(#4) [TrainEvalCallback,Recorder,ProgressCallback,ParamScheduler]
During fit, any callback can be added, and it is then removed at the end of the fit. Here we’re not letting fit run to completion, so the fourth callback is never removed.
Very interesting. It’s nice to get a better look at the inner workings of fine_tune from this error I was getting. Would you consider this a bug or just not supported? Wondering if ParamScheduler should be cleaned up during/prior to export to avoid the error.
Could you see if it happens when installing fastai2 and fastcore from git?
The context manager added_cbs should have removed it even with a KeyboardInterrupt.
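To illustrate the point about the context manager, here is a minimal sketch of the general pattern, using only the standard library. `ToyLearner` and its string "callbacks" are hypothetical stand-ins, and this is not fastai2's actual added_cbs implementation. The cleanup sits in a `finally` clause, so it runs even when a KeyboardInterrupt is raised inside the `with` block.

```python
from contextlib import contextmanager

class ToyLearner:
    """Hypothetical stand-in for a learner; it only tracks callback names."""
    def __init__(self):
        self.cbs = ["TrainEvalCallback", "Recorder", "ProgressCallback"]

    @contextmanager
    def added_cbs(self, cbs):
        self.cbs.extend(cbs)
        try:
            yield
        finally:
            # Runs on normal exit and on any exception, including KeyboardInterrupt
            for cb in cbs:
                self.cbs.remove(cb)

    def fit(self, cbs, interrupt=False):
        with self.added_cbs(cbs):
            if interrupt:
                raise KeyboardInterrupt  # simulate stopping the cell mid-training

learn = ToyLearner()
try:
    learn.fit(["ParamScheduler"], interrupt=True)
except KeyboardInterrupt:
    pass

print(learn.cbs)  # the temporary callback is gone again
```

Without the `try`/`finally` around the `yield`, the removal step would be skipped on an interrupt and the extra callback would stay in the list.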
Just tried installing from git, and the error no longer reproduces. I also checked cbs and saw that ParamScheduler wasn’t present.
Is there any way to prevent training from computing the losses?
This could be useful with models that return losses during training and not at inference.
Not sure what you mean here. At inference (e.g. predict) there will be no ‘loss’ since there is no target?
Yijin
I have a model that returns the computed losses itself, so I don’t want one_batch to execute
self.loss = self.loss_func(self.pred, *self.yb); self('after_loss')
because in training mode the model returns losses, and in validation mode it returns predictions.
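To make that behaviour concrete, here is a toy sketch in plain Python. `ToyDetector`, `train_step` and the loss names and values are hypothetical, chosen only to mimic a detection model that returns a dict of losses in training mode and a list of predictions in evaluation mode. In that setup the training step just sums the dict values instead of calling a separate loss function.

```python
class ToyDetector:
    """Hypothetical model: dict of losses in training mode, predictions otherwise."""
    def __init__(self):
        self.training = True

    def train(self):
        self.training = True
        return self

    def eval(self):
        self.training = False
        return self

    def __call__(self, images, targets=None):
        if self.training:
            if targets is None:
                raise ValueError("targets are required in training mode")
            # Made-up loss values, for illustration only
            return {"loss_classifier": 0.5, "loss_box_reg": 0.25}
        # One (empty) prediction per image in evaluation mode
        return [{"boxes": [], "labels": []} for _ in images]

def train_step(model, images, targets):
    # The model already computed its losses, so no external loss function is called
    loss_dict = model(images, targets)
    return sum(loss_dict.values())

model = ToyDetector()
total_loss = train_step(model.train(), ["img0", "img1"], ["t0", "t1"])
preds = model.eval()(["img0", "img1"])
print(total_loss, len(preds))
```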
I am trying to redefine the learner as follows:
class Mask_RCNN_Learner(Learner):
    def __init__(self, dls, model, loss_func=None, opt_func=Adam, lr=defaults.lr, splitter=trainable_params, cbs=None,
                 metrics=None, path=None, model_dir='models', wd=None, wd_bn_bias=False, train_bn=True,
                 moms=(0.95,0.85,0.95)):
        super().__init__(dls, model, loss_func, opt_func, lr, splitter, cbs,
                         metrics, path, model_dir, wd, wd_bn_bias, train_bn,
                         moms)

    def _split(self, b):
        i = getattr(self.dls, 'n_inp', 1 if len(b)==1 else len(b)-1)
        self.xb,self.yb = b[:i],b[i:]

    def _do_epoch_train(self):
        try:
            self.dl = self.dls.train;
            # Modification
            self.n_iter = len(self.dl)
            for o in enumerate(self.dl):
                i, b = *o
                self.iter = i
                try:
                    self._split(b)
                    loss_dict = self.model(*self.xb,*self.yb)
                    if len(self.yb) == 0: return
                    self.loss = sum(loss for loss in loss_dict.values())
                    if not self.training: return
                    self.loss.backward()
                    self.opt.step()
                    self.opt.zero_grad()
                except CancelBatchException as e:
                    raise e
        except CancelTrainException as e:
            raise e

    def _do_epoch_validate(self, ds_idx=1, dl=None):
        if dl is None: dl = self.dls[ds_idx]
        try:
            self.dl = dl;
            with torch.no_grad():
                # Modification
                self.n_iter = len(self.dl)
                for o in enumerate(self.dl):
                    i, b = *o
                    self.iter = i
                    try:
                        self._split(b)
                        detection = self.model(*self.xb);
                        self.loss = self.loss_func(detection, *self.yb)
                        # COMPUTING METRICS
                        if not self.training: return
                    except CancelBatchException as e:
                        raise e
        except CancelValidException as e:
            raise e

    @log_args(but='cbs')
    def fit(self, n_epoch, lr=None, wd=None, cbs=None, reset_opt=False):
        with self.added_cbs(cbs):
            if reset_opt or not self.opt: self.create_opt()
            if wd is None: wd = self.wd
            if wd is not None: self.opt.set_hypers(wd=wd)
            self.opt.set_hypers(lr=self.lr if lr is None else lr)
            try:
                self._do_begin_fit(n_epoch)
                for epoch in range(n_epoch):
                    try:
                        self.epoch=epoch
                        self._do_epoch_train()
                        self._do_epoch_validate()
                    except CancelEpochException as e:
                        raise e
            except CancelFitException as e:
                raise e
However, I am getting:
File "<ipython-input-39-0a0657f193c1>", line 23
self._split(b)
^
SyntaxError: can't use starred expression here
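The starred expression in question is most likely `i, b = *o` in the two loops: a bare `*o` is not allowed as the right-hand side of an assignment. Since `enumerate` already yields `(index, item)` pairs, the pair can be unpacked directly. A minimal sketch, where `batches` is a hypothetical stand-in for `self.dl`:

```python
# Hypothetical stand-in for self.dl: a list of (x, y) batches
batches = [("x0", "y0"), ("x1", "y1")]

# Option 1: unpack in the loop header
seen = []
for i, b in enumerate(batches):
    seen.append((i, b))

# Option 2: keep the intermediate name and unpack it without the star
seen_again = []
for o in enumerate(batches):
    i, b = o
    seen_again.append((i, b))

print(seen)
```

Both loops produce the same index and batch pairs, so replacing `i, b = *o` with either form should get past the SyntaxError in the learner code.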