Fastai v2 chat

Empty images
I’m setting up to run a vision model on some images that have been annotated with COCO-JSON-style annotations. Many of the training images have no objects of interest in them, and therefore no bounding boxes.

I’m not sure how to properly annotate ‘empty’ images. I’ve tried adding a tuple to the annotation list for each of the empty photos, in two styles:

  1. ([],[]) #bbox, class
  2. ([[0.,0.,0.,0.]],[]) #bbox, class

It almost works: I’ve gotten the data and annotations into a fastai2 dataloader, and all seems fine until I try show_batch; then it throws an error. I’ve found some conflicting advice in the forums. Does anyone know for sure how to do it right?

Or is it preferable to train the model exclusively on images that do contain objects of interest?
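Not an authoritative answer, but if you go the route of training only on images that contain objects, you can drop the empty ones before building the DataBlock. A minimal plain-Python sketch, assuming a standard COCO dict with `images` and `annotations` keys (the field names follow the COCO spec; the sample data is made up):

```python
def drop_empty_images(coco):
    """Return a COCO-style dict keeping only images that have at least one annotation."""
    annotated_ids = {a["image_id"] for a in coco["annotations"]}
    kept = [img for img in coco["images"] if img["id"] in annotated_ids]
    return {**coco, "images": kept}

coco = {
    "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
    "annotations": [{"image_id": 1, "bbox": [10.0, 10.0, 50.0, 50.0], "category_id": 3}],
    "categories": [{"id": 3, "name": "cat"}],
}
filtered = drop_empty_images(coco)
print([img["file_name"] for img in filtered["images"]])  # ['a.jpg']
```

Whether dropping empty images hurts (the model never sees hard negatives) depends on your data, so treat this as one option rather than the recommended fix.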

Hey all. Here are the times I’m running into while building a language model; I want to make sure they seem right. Training is still going and all of the metrics are spot on, I just want to make sure I’m doing it right.
Dataframe records - 17.1 million
Average length of text = 13 words
Time to build Dataloader = 2 hours 15 min
batch size = 256
seq_len = 128
Time per Epoch = 2 hours 35 min
GPU utilized = 20%

Again, everything seems to be doing very well, just checking if this is normal behavior or not :slight_smile:
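For a rough sanity check, you can estimate batches per epoch from those numbers. A back-of-envelope sketch, assuming roughly one token per word (which undercounts subword tokens, so the real batch count will be somewhat higher):

```python
records = 17_100_000
words_per_record = 13
bs, seq_len = 256, 128

total_tokens = records * words_per_record      # ~222M tokens
tokens_per_batch = bs * seq_len                # 32,768 tokens per batch
batches_per_epoch = total_tokens // tokens_per_batch

epoch_seconds = 2 * 3600 + 35 * 60             # 2 h 35 min
print(batches_per_epoch)                       # 6784
print(round(batches_per_epoch / epoch_seconds, 2))  # 0.73 batches/sec
```

Under one batch per second, combined with only 20% GPU utilization, would hint that the GPU is spending most of its time waiting on the dataloader rather than computing, but that’s an inference from the numbers, not a diagnosis.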

When I try to load the saved encoder from the LM into the classification model, I get

RuntimeError: Error(s) in loading state_dict for AWD_LSTM:
	size mismatch for encoder.weight: copying a param with shape torch.Size([14280, 400]) from checkpoint, the shape in current model is torch.Size([14304, 400]).
	size mismatch for encoder_dp.emb.weight: copying a param with shape torch.Size([14280, 400]) from checkpoint, the shape in current model is torch.Size([14304, 400]).

I’m following the fastbook draft and did exactly what it said.
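Those two sizes (14280 vs 14304) are vocab sizes: the classifier’s DataLoaders built their own vocab instead of reusing the LM’s, so the embedding matrices don’t line up. The usual fastai2 fix is to build the classifier’s text block with the LM’s vocab (check the fastbook text for the exact call; treat the parameter name as an assumption here). The underlying idea, sketched in plain Python (not fastai’s actual code, and with toy data): copy pretrained rows for tokens shared between the vocabs and initialize the rest.

```python
def remap_embeddings(old_vocab, old_emb, new_vocab, init_row):
    """Map pretrained embedding rows onto a new vocab, copying rows for shared tokens."""
    old_idx = {tok: i for i, tok in enumerate(old_vocab)}
    return [old_emb[old_idx[tok]] if tok in old_idx else list(init_row)
            for tok in new_vocab]

old_vocab = ["xxunk", "the", "cat"]
old_emb = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]   # toy 2-dim embeddings
new_vocab = ["xxunk", "the", "dog"]               # "dog" is new, "cat" is gone
new_emb = remap_embeddings(old_vocab, old_emb, new_vocab, init_row=[0.0, 0.0])
print(new_emb)  # [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
```

fastai does this remapping for you when the vocabs are wired up correctly, which is why mismatched vocab sizes usually mean the LM vocab wasn’t passed through.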

How do I use torchvision models with fastai2? I need mobilenet_v2, which was available in fastai v1.

I’m getting the following error when trying to export the model.
Running on Colab with versions:
fastai2 0.0.16, PyTorch 1.4.0

I initialized the learn object as follows:

learn = cnn_learner(dls, 
                    partial(arch,pretrained=pretrained), 
                    metrics=metrics,
                    cbs=cbs)

and dls as follows:

def splitter(df):
    train = df.index[df['is_valid']==False].tolist()
    valid = df.index[df['is_valid']==True].tolist()
    # print("train",train[:10],"valid",valid[:10])
    return train,valid


def get_x(r): return r['name']
def get_y(r):
    # split the space-separated labels and drop any empty strings
    return [label for label in r['label'].split(" ") if label]

dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   splitter=splitter,
                   get_x = get_x, 
                   get_y = get_y,
                   item_tfms = RandomResizedCrop(256, min_scale=0.08),
                   batch_tfms=augs)
bs=64

dls = dblock.dataloaders(df,bs=bs)

Error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-66-fa5b61306ef3> in <module>()
----> 1 learn.export()

2 frames
/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in export(self, fname, pickle_protocol)
    497         #To avoid the warning that come from PyTorch about model not being checked
    498         warnings.simplefilter("ignore")
--> 499         torch.save(self, self.path/fname, pickle_protocol=pickle_protocol)
    500     self.create_opt()
    501     if state is not None: self.opt.load_state_dict(state)

/usr/local/lib/python3.6/dist-packages/torch/serialization.py in save(obj, f, pickle_module, pickle_protocol, _use_new_zipfile_serialization)
    326 
    327     with _open_file_like(f, 'wb') as opened_file:
--> 328         _legacy_save(obj, opened_file, pickle_module, pickle_protocol)
    329 
    330 

/usr/local/lib/python3.6/dist-packages/torch/serialization.py in _legacy_save(obj, f, pickle_module, pickle_protocol)
    399     pickler = pickle_module.Pickler(f, protocol=pickle_protocol)
    400     pickler.persistent_id = persistent_id
--> 401     pickler.dump(obj)
    402 
    403     serialized_storage_keys = sorted(serialized_storages.keys())

AttributeError: Can't pickle local object 'combine_scheds.<locals>._inner'

Appreciate any help, Thanks!

Hi. It seems that somewhere you have used a lambda or inner function; those can’t be pickled.
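For reference, a minimal demonstration of why: pickle serializes functions by their qualified name, so a function defined inside another function can’t be looked up again at load time (the function names here are made up for illustration):

```python
import pickle

def top_level(x):
    return x + 1

def make_inner():
    def _inner(x):  # a local object, like combine_scheds.<locals>._inner
        return x + 1
    return _inner

# a module-level function pickles fine: it is stored by qualified name
data = pickle.dumps(top_level)

# a nested function fails with the same error seen in the traceback above
try:
    pickle.dumps(make_inner())
except AttributeError as e:
    print(type(e).__name__)  # AttributeError: Can't pickle local object ...
```

This is why `learn.export()` fails as soon as any object reachable from the Learner holds a reference to a locally defined function.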

Thanks @vferrer, I think you’re right that this is the issue, but I’m not sure where it is stemming from. Is combine_scheds.<locals>._inner related to callbacks?

@zlapp what are the callbacks in cbs?

cbs=[SaveModelCallback(add_save=Path(MODEL_OUTPUT_PATH)),WandbCallback(log_preds=False)]

I also tried learn.cbs=None and learn.metrics=None before exporting, but was still getting the same error.

Nothing else is missing, I don’t think… Try doing a git pull on your fastbook repo to get the latest, check your fastbook folder to make sure utils.py is there, run the cell you pasted (which has from utils import *), and then gv should work.

@zlapp see what happens without any callback

Thanks. I did a git pull on fastbook, fastai2 and fastcore, then ran !pip install utils before running the cell. Now it is working. Thank you.

However, when I tried to run “from utils import *” in 01_intro.ipynb, it showed “No module named azure”. I tried to install azure, but it has a compatibility issue. I guess it’s a Windows problem: I’m running JupyterLab locally on Win10, not on the Azure platform, and Windows isn’t a high priority for the development team. For now I use the local setup to look things up and run the training etc. on GCP.

Thanks @boris, just ran it and the error occurs even without any callback passed in during initialization of the learner. I double-checked and do not have any lambdas. I will try to create a reproducible Colab notebook and send it. From checking the learner summary, only defaults are present:

Callbacks:
  - TrainEvalCallback
  - Recorder
  - ProgressCallback

Error:

AttributeError                            Traceback (most recent call last)
<ipython-input-55-fa5b61306ef3> in <module>()
----> 1 learn.export()

2 frames
/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in export(self, fname, pickle_protocol)
    497         #To avoid the warning that come from PyTorch about model not being checked
    498         warnings.simplefilter("ignore")
--> 499         torch.save(self, self.path/fname, pickle_protocol=pickle_protocol)
    500     self.create_opt()
    501     if state is not None: self.opt.load_state_dict(state)

/usr/local/lib/python3.6/dist-packages/torch/serialization.py in save(obj, f, pickle_module, pickle_protocol, _use_new_zipfile_serialization)
    326 
    327     with _open_file_like(f, 'wb') as opened_file:
--> 328         _legacy_save(obj, opened_file, pickle_module, pickle_protocol)
    329 
    330 

/usr/local/lib/python3.6/dist-packages/torch/serialization.py in _legacy_save(obj, f, pickle_module, pickle_protocol)
    399     pickler = pickle_module.Pickler(f, protocol=pickle_protocol)
    400     pickler.persistent_id = persistent_id
--> 401     pickler.dump(obj)
    402 
    403     serialized_storage_keys = sorted(serialized_storages.keys())

AttributeError: Can't pickle local object 'combine_scheds.<locals>._inner'

I was able to reproduce the error in a standalone Colab notebook (based on notebook 6 from course-v4). Export fails after an interrupted fine_tune. I’m pretty sure this is a bug.

Thanks for the help @muellerzr and @boris
The error is in the last cell (not the KeyboardInterrupt)

It’s exactly that interrupt that’s causing the issues @zlapp. Due to how fine_tune works, it is really two fit calls under the hood, and you’re interrupting in the middle of them. Letting fine_tune run all the way through, I can successfully export the model.

By interrupting the model I can recreate the error you hit. We can tell by looking at learn.cbs. Notice the difference? (The first is before fine_tune, the second is after the interrupt):

(#3) [TrainEvalCallback,Recorder,ProgressCallback]

(#4) [TrainEvalCallback,Recorder,ProgressCallback,ParamScheduler]

During fit, callbacks can be added that are then removed when fit finishes. By interrupting, we never let it run to completion and remove that fourth callback (ParamScheduler).

Very interesting. It’s nice to get a better look at the inner workings of fine_tune from this error. Would you consider this a bug or just not supported? Wondering if the leftover ParamScheduler should be cleaned up during/prior to export to avoid the error.

Could you see if it happens when installing fastai2 and fastcore from git?
The context manager added_cbs should have removed it even with a KeyboardInterrupt.
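For anyone curious, the idea behind a context manager like added_cbs can be sketched in plain Python (this is an illustration, not fastai’s actual implementation; the toy Learner class and callback names are stand-ins): the cleanup sits in a finally block, which runs even on KeyboardInterrupt, so the temporary callbacks should be removed either way.

```python
from contextlib import contextmanager

class Learner:
    """Toy stand-in for fastai's Learner, just enough to show the pattern."""
    def __init__(self):
        self.cbs = ["TrainEvalCallback", "Recorder", "ProgressCallback"]

    @contextmanager
    def added_cbs(self, cbs):
        # temporarily register extra callbacks for the duration of the block
        self.cbs += cbs
        try:
            yield self
        finally:
            # runs even if the body raises, e.g. on a KeyboardInterrupt
            for cb in cbs:
                self.cbs.remove(cb)

learn = Learner()
try:
    with learn.added_cbs(["ParamScheduler"]):
        raise KeyboardInterrupt  # simulate interrupting fine_tune mid-fit
except KeyboardInterrupt:
    pass
print(learn.cbs)  # ParamScheduler is gone despite the interrupt
```

If the leftover ParamScheduler survived the interrupt in the released version, that suggests the cleanup path wasn’t reached there, which would fit with the fix already being in git.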

Just tried installing from git, and the error is no longer reproducing. Also checked learn.cbs and saw ParamScheduler was no longer present. :+1: