Fastai v2 chat

How do I use torchvision models with fastai2? I need to use mobilenet_v2, which is available in fastai1.
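
A minimal sketch of one approach (untested; the cut value and metric are just illustrative, and it assumes you already have a dls). torchvision archs that aren't in fastai2's model_meta need an explicit cut so that create_body keeps only the convolutional trunk. For mobilenet_v2 the classifier is the last top-level child, so cut=-1 should work:

from fastai2.vision.all import *
from torchvision.models import mobilenet_v2

# mobilenet_v2 has two top-level children: `features` (the conv trunk) and
# `classifier` (the head). cut=-1 drops the head, and cnn_learner attaches a
# fastai head sized for the number of classes in dls.
learn = cnn_learner(dls, mobilenet_v2, cut=-1, metrics=accuracy)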

I’m getting the following error when trying to export the model.
Running on Colab with fastai2 0.0.16 and PyTorch 1.4.0.

I initialized the learn object as follows:

learn = cnn_learner(dls,
                    partial(arch, pretrained=pretrained),
                    metrics=metrics,
                    cbs=cbs)

and dls as follows:

def splitter(df):
    # use the is_valid column to return (train_idxs, valid_idxs)
    train = df.index[df['is_valid']==False].tolist()
    valid = df.index[df['is_valid']==True].tolist()
    return train, valid


def get_x(r): return r['name']
def get_y(r):
    # split the space-delimited label string, dropping empty tokens
    return [t for t in r['label'].split(" ") if t]

dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   splitter=splitter,
                   get_x = get_x, 
                   get_y = get_y,
                   item_tfms = RandomResizedCrop(256, min_scale=0.08),
                   batch_tfms=augs)
bs=64

dls = dblock.dataloaders(df, bs=bs)

Error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-66-fa5b61306ef3> in <module>()
----> 1 learn.export()

2 frames
/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in export(self, fname, pickle_protocol)
    497         #To avoid the warning that come from PyTorch about model not being checked
    498         warnings.simplefilter("ignore")
--> 499         torch.save(self, self.path/fname, pickle_protocol=pickle_protocol)
    500     self.create_opt()
    501     if state is not None: self.opt.load_state_dict(state)

/usr/local/lib/python3.6/dist-packages/torch/serialization.py in save(obj, f, pickle_module, pickle_protocol, _use_new_zipfile_serialization)
    326 
    327     with _open_file_like(f, 'wb') as opened_file:
--> 328         _legacy_save(obj, opened_file, pickle_module, pickle_protocol)
    329 
    330 

/usr/local/lib/python3.6/dist-packages/torch/serialization.py in _legacy_save(obj, f, pickle_module, pickle_protocol)
    399     pickler = pickle_module.Pickler(f, protocol=pickle_protocol)
    400     pickler.persistent_id = persistent_id
--> 401     pickler.dump(obj)
    402 
    403     serialized_storage_keys = sorted(serialized_storages.keys())

AttributeError: Can't pickle local object 'combine_scheds.<locals>._inner'

I’d appreciate any help, thanks!

Hi. It seems that somewhere you have used a lambda or inner function. Those can’t be pickled.
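
A quick illustration of that general limitation (hypothetical names, just to show the failure mode):

import pickle

def outer():
    def _inner(x): return x  # local function, like combine_scheds' _inner
    return _inner

pickle.dumps(outer())
# AttributeError: Can't pickle local object 'outer.<locals>._inner'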

Thanks @vferrer, I think you’re right that this is the issue, but I’m not sure where it’s stemming from. Is combine_scheds.<locals>._inner related to callbacks?

@zlapp what are the callbacks in cbs?

cbs=[SaveModelCallback(add_save=Path(MODEL_OUTPUT_PATH)),WandbCallback(log_preds=False)]

I also tried setting learn.cbs=None and learn.metrics=None before exporting, but still got the same error.

Nothing else is missing, I don’t think… Try doing a git pull on your fastbook repo to get the latest, check that utils.py is in your fastbook folder, run the cell you pasted (which has from utils import *), and then gv should work.

@zlapp see what happens without any callback

Thanks. I did a git pull on fastbook, fastai2, and fastcore, then ran !pip install utils before running the cell. It’s working now. Thank you.

However, when I tried to run “from utils import *” in 01_intro.ipynb, it showed “No module named azure”. I tried to install azure, but it has a compatibility issue. I guess it’s a Windows issue: I’m running JupyterLab locally on Win10, not on the Azure platform, and Windows isn’t a high priority for the development team. For now I use my local setup to look things up and run training etc. on GCP.

Thanks @boris, I just ran it and the error occurs without any callbacks passed during initialization of the learner. I double-checked and don’t have any lambdas. I’ll try to create a reproducible Colab notebook and send it. From checking the learner summary, only the defaults are present:

Callbacks:
  - TrainEvalCallback
  - Recorder
  - ProgressCallback

Error (same traceback as above):

AttributeError: Can't pickle local object 'combine_scheds.<locals>._inner'

I was able to reproduce the error in a standalone Colab notebook (based on notebook 6 from course-v4). Export fails after an interrupted fine_tune. I’m pretty sure this is a bug.

Thanks for the help @muellerzr and @boris. The error is in the last cell (not the KeyboardInterrupt).

It’s exactly that interrupt that’s causing the issues @zlapp. Due to how fine_tune works, the actual fit is really two fits, and we’re interrupting in the middle of them. Letting fine_tune run all the way through, I can export the model just fine.
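
Roughly, fine_tune is just two fit_one_cycle calls back to back (a simplified sketch paraphrasing the fastai2 source of the time; exact defaults may differ):

def fine_tune(learn, epochs, base_lr=2e-3, freeze_epochs=1, lr_mult=100, **kwargs):
    learn.freeze()
    # fit #1: train only the head with the body frozen
    learn.fit_one_cycle(freeze_epochs, slice(base_lr), pct_start=0.99, **kwargs)
    base_lr /= 2
    learn.unfreeze()
    # fit #2: train the whole model with discriminative learning rates
    learn.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr), **kwargs)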

Interrupting it, I can recreate the error you got. We can tell by looking at learn.cbs. Notice the difference? (The first is before fine_tune, the second is after the interrupt):

(#3) [TrainEvalCallback,Recorder,ProgressCallback]

(#4) [TrainEvalCallback,Recorder,ProgressCallback,ParamScheduler]

During fit, callbacks can be added temporarily and are removed at the end of it. By interrupting, we never let it remove that fourth callback (ParamScheduler), whose schedule functions are the combine_scheds closures that can’t be pickled.
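
A possible workaround if you do need to export after an interrupt (an untested sketch; it assumes ParamScheduler is importable from the schedule callbacks module):

from fastai2.callback.schedule import ParamScheduler

# drop the leftover scheduler before exporting, since its schedule
# functions are combine_scheds closures that can't be pickled
learn.remove_cbs([cb for cb in learn.cbs if isinstance(cb, ParamScheduler)])
learn.export()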

Very interesting. It’s nice to get a better look at the inner workings of fine_tune from this error. Would you consider this a bug, or just not supported? I’m wondering if the leftover ParamScheduler should be cleaned up during/prior to export to avoid the error.

Could you see if it happens when installing fastai2 and fastcore from git?
The added_cbs context manager should have removed it even with a KeyboardInterrupt.
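
i.e. something along these lines (a simplified sketch, not the exact fastai2 source):

from contextlib import contextmanager

@contextmanager
def added_cbs(learn, cbs):
    learn.add_cbs(cbs)
    try:
        yield learn
    finally:
        # runs even if fit is interrupted, so temporary callbacks
        # like ParamScheduler always get removed
        learn.remove_cbs(cbs)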

Just tried installing from git, and the error no longer reproduces. I also checked learn.cbs and saw the ParamScheduler wasn’t present. :+1:

Is there any way to prevent the training loop from computing the loss?

This could be useful for models that return losses during training but not at inference.

Not sure what you mean here. At inference (e.g. predict) there will be no ‘loss’ since there is no target?

Yijin

I have a model that returns the losses already computed, so I don’t want one_batch to execute

self.loss = self.loss_func(self.pred, *self.yb); self('after_loss')

because the model returns losses in training mode and predictions in validation mode.
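
For context, that’s how torchvision’s detection models behave; a minimal illustration (a sketch with dummy data):

import torch, torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=False)
images = [torch.rand(3, 256, 256)]
targets = [{'boxes': torch.tensor([[10., 10., 100., 100.]]),
            'labels': torch.tensor([1]),
            'masks': torch.zeros(1, 256, 256, dtype=torch.uint8)}]

model.train()
loss_dict = model(images, targets)  # dict of losses in training mode
model.eval()
detections = model(images)          # list of per-image predictions in eval mode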

I am trying to redefine the learner as follows:

class Mask_RCNN_Learner(Learner):
    def __init__(self, dls, model, loss_func=None, opt_func=Adam, lr=defaults.lr, splitter=trainable_params, cbs=None,
                 metrics=None, path=None, model_dir='models', wd=None, wd_bn_bias=False, train_bn=True,
                 moms=(0.95,0.85,0.95)):
        super().__init__(dls, model, loss_func, opt_func, lr, splitter, cbs,
                 metrics, path, model_dir, wd, wd_bn_bias, train_bn,
                 moms)
      
    def _split(self, b):
        i = getattr(self.dls, 'n_inp', 1 if len(b)==1 else len(b)-1)
        self.xb,self.yb = b[:i],b[i:]
    
    def _do_epoch_train(self):
        try:
            self.dl = self.dls.train;                                     
            
            # Modification
            self.n_iter = len(self.dl)
            for o in enumerate(self.dl):
                i, b = *o
                self.iter = i
                try:
                    self._split(b)                      
                    loss_dict = self.model(*self.xb,*self.yb)           
                    if len(self.yb) == 0: return
                    self.loss = sum(loss for loss in loss_dict.values())
                    if not self.training: return
                    self.loss.backward()  
                    self.opt.step()                                     
                    self.opt.zero_grad()
                except CancelBatchException as e:
                    raise e   
        except CancelTrainException as e:
            raise e

    def _do_epoch_validate(self, ds_idx=1, dl=None):
        if dl is None: dl = self.dls[ds_idx]
        try:
            self.dl = dl;                                                 
            with torch.no_grad():
                # Modification
                self.n_iter = len(self.dl)
                for o in enumerate(self.dl):
                    i, b = *o
                    self.iter = i
                    try:
                        self._split(b)                   
                        detection = self.model(*self.xb);               
                        self.loss =  self.loss_func(detection, *self.yb)
                        
                        # COMPUTING METRICS
                        
                        if not self.training: return
                    except CancelBatchException as e:
                        raise e
        except CancelValidException as e:
            raise e                                                  
    
    @log_args(but='cbs')
    def fit(self, n_epoch, lr=None, wd=None, cbs=None, reset_opt=False):
        with self.added_cbs(cbs):
            if reset_opt or not self.opt: self.create_opt()
            if wd is None: wd = self.wd
            if wd is not None: self.opt.set_hypers(wd=wd)
            self.opt.set_hypers(lr=self.lr if lr is None else lr)

            try:
                self._do_begin_fit(n_epoch)
                for epoch in range(n_epoch):
                    try:
                        self.epoch=epoch         
                        self._do_epoch_train()
                        self._do_epoch_validate()
                    except CancelEpochException as e:
                        raise e                       

            except CancelFitException as e:
                raise e 

However, I am getting:

  File "<ipython-input-39-0a0657f193c1>", line 23
    self._split(b)
              ^
SyntaxError: can't use starred expression here
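
For reference, the culprit is the i, b = *o lines (even though the caret in the traceback lands on self._split(b)): a starred expression can’t stand alone on the right-hand side of an assignment. Unpacking directly in the for statement avoids it:

# instead of:
#     for o in enumerate(self.dl):
#         i, b = *o
# unpack in the loop header:
for i, b in enumerate(self.dl):
    self.iter = i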