Lesson 3 In-Class Discussion ✅

Has anyone applied this to the Nuclei dataset (https://www.kaggle.com/c/data-science-bowl-2018) using Mask-RCNN?

I’m trying to train hand segmentation on a new dataset (Egohands). However, I always get CUDA out of memory, even on a P100 (16 GB GPU). The error appears even with a very small image size and batch size:

size = src_size//32

The original image shape is (1280,720)

Can someone suggest how to deal with it? Thank you in advance.
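For reference, here is what the integer division in `size = src_size//32` works out to for the shape quoted above. This is only the arithmetic, with a plain tuple standing in for whatever array `src_size` actually is in the poster's code (an assumption).

```python
# Stand-in for src_size; the post gives the original shape as (1280, 720).
src_size = (1280, 720)

# Floor-divide each dimension by 32, mirroring `size = src_size//32`.
size = tuple(dim // 32 for dim in src_size)

print(size)  # (40, 22)
```

So the out-of-memory error is reported even with inputs of roughly 40 by 22 pixels, which suggests the image size alone may not be the cause.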

This might be helpful


Hey! The idea behind load is that you don’t have to write that whole chunk of code and re-create the databunch object every time you want to use it. It saves both computation and code. You don’t want to do the same thing twice if you can avoid it.

To your question: it does not change data_lm, in the same way that loading a model from saved weights does not change the Learner object. However, say you ran it yesterday, created the databunch object, and want to run it again today. In that case there is nothing to overwrite; in other words, there is no data_lm to change. You use .load, and that loads the same databunch you created yesterday with less computation time and fewer lines of code.

Please let me know if this solves your problem.
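The save-once, load-later idea described above can be sketched in plain Python. This is only an analogy using `pickle`, not how fastai implements `.load`; `build_dataset`, `load_or_build`, and the file name `data_lm.pkl` are made-up names for illustration.

```python
import pickle
import tempfile
from pathlib import Path

def build_dataset():
    """Hypothetical stand-in for the expensive preprocessing step."""
    return {"tokens": [w.lower() for w in "The Quick Brown Fox".split()]}

def load_or_build(cache_file):
    """Return the cached object if it exists; otherwise build it and save it."""
    if cache_file.exists():
        with cache_file.open("rb") as f:
            return pickle.load(f), "loaded"
    data = build_dataset()
    with cache_file.open("wb") as f:
        pickle.dump(data, f)
    return data, "built"

cache_file = Path(tempfile.mkdtemp()) / "data_lm.pkl"

first, first_source = load_or_build(cache_file)    # nothing saved yet, so it builds
second, second_source = load_or_build(cache_file)  # cache file exists, so it loads

print(first_source, second_source, first == second)
```

The second call skips the build step entirely and returns an equal object, which is the point of saving the databunch once and loading it on later runs.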

Hey guys! Can anyone advise whether I can use the lesson3-head-pose.ipynb notebook flow to find several faces in a picture? Each picture would be labeled with a multidimensional tensor. Would it still work if there is a different number of faces in each picture?

Solution in this post.


@sgugger, am I wrong, or in the Jupyter notebook lesson3-camvid.ipynb does data.show_batch() not show the real input to our U-Net model (real input = image transformed by tfms)?

(Same question about learn.show_results(), which does not show what the model actually predicts (i.e., a mask) but an overlay of the input image and its predicted mask.)

Indeed, data.show_batch() shows this:

But the true input is just a normal image transformed by tfms, no? (And its target label is its mask; see the following screenshot.)

Last question: what code does fastai use to overlay two images (an image and its mask, as displayed by data.show_batch() and learn.show_results())?



Hey, put this before your code:
get_ipython().config.get('IPKernelApp', {})['parent_appname'] = ""

such that it reads:
get_ipython().config.get('IPKernelApp', {})['parent_appname'] = ""
df = pd.read_csv(path/'train_v2.csv')

Also, I am running python 3.7.2 in ubuntu 18.04 on p2.xlarge instance on AWS


I have a question about multi-label classification for satellite image data.

How is the loss function computed here? Since there are multiple labels, each with a 0 or 1 output, how does the loss take each label into account?
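One common answer is binary cross-entropy: each label's raw output goes through its own sigmoid, a log loss is computed per label against the 0/1 target, and the per-label losses are averaged. I believe the planet notebook ends up with a loss of this kind, but treat that as an assumption. The sketch below is plain Python with made-up function names and example numbers, not fastai's code.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce_with_logits(logits, targets):
    """Return (mean loss, per-label losses) for one item with several 0/1 labels."""
    per_label = []
    for z, t in zip(logits, targets):
        p = sigmoid(z)  # independent probability for this label
        per_label.append(-(t * math.log(p) + (1 - t) * math.log(1 - p)))
    return sum(per_label) / len(per_label), per_label

# Three labels for one image: raw model outputs and the true 0/1 tags.
logits = [2.0, -1.0, 0.0]
targets = [1, 0, 1]

loss, per_label = bce_with_logits(logits, targets)
print(loss, per_label)
```

Every label contributes its own term, so a confident wrong answer on any single label raises the loss regardless of how the other labels are doing.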


I ran the multi-label prediction example with the Planet Amazon dataset (lesson3-planet.ipynb) on medical imaging data (MRI) and got (too) good results.
To be more confident in the results, I tried to adapt the interpretation tools from lesson 2, such as plotting the confusion matrix and the top_losses images with prediction / actual / loss / probability:

preds,y,losses = learn.get_preds(with_loss=True)
interp = ClassificationInterpretation(data, preds, y, losses)
interp = ClassificationInterpretation.from_learner(learn)
losses,idxs = interp.top_losses()

However, plotting my results the same way as for single-label classification does not work:

interp.plot_top_losses(9, figsize=(15,11))

and the same for

What should I do differently?
Thanks a lot

Multi-label is when an object can have several tags; multi-class is classification with multiple classes where each object can have only one label. Isn’t that right?
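The distinction above shows up directly in how the targets are encoded. A minimal sketch in plain Python; the class names and the helper names `one_hot` and `multi_hot` are made up for illustration.

```python
classes = ["clear", "cloudy", "road", "water"]  # illustrative tag names

def one_hot(label, classes):
    """Multi-class: exactly one class per item."""
    return [1 if c == label else 0 for c in classes]

def multi_hot(tags, classes):
    """Multi-label: any number of tags per item, including none."""
    return [1 if c in tags else 0 for c in classes]

single = one_hot("cloudy", classes)
several = multi_hot({"clear", "road", "water"}, classes)

print(single)   # [0, 1, 0, 0]
print(several)  # [1, 0, 1, 1]
```

A multi-class target always sums to 1, while a multi-label target can sum to anything from 0 to the number of classes.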

path_lbl = 'data/camvid/labels'
path_img = 'data/camvid/images'

I have an error message in get_y_fn

get_y_fn = lambda x: path_lbl/f'{x.stem}_P{x.suffix}'
mask = open_mask(get_y_fn(img_f))
mask.show(figsize=(5,5), alpha=1)


What’s the problem?
Thanks a lot

That’s happening because path_lbl and f"{x.stem}_P{x.suffix}" are both strings, and you’re trying to do a division.

Two alternatives that could make this work:

lambda x: f"{path_lbl}/{x.stem}_P{x.suffix}"


from pathlib import Path
path_lbl = Path(path_lbl)
lambda x: path_lbl / f"{x.stem}_P{x.suffix}"
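Here is the pathlib version as a self-contained snippet that runs without the dataset on disk. The image file name is just an example I picked for illustration, not necessarily one from your copy of the data.

```python
from pathlib import Path

# Wrapping the string in Path is what makes the `/` operator work.
path_lbl = Path("data/camvid/labels")

# Label file = same stem as the image plus "_P", with the same suffix.
get_y_fn = lambda x: path_lbl / f"{x.stem}_P{x.suffix}"

img_f = Path("data/camvid/images/0016E5_08370.png")  # example name (assumed)
label_f = get_y_fn(img_f)

print(label_f)
```

Note that `x` must also be a Path (or something else with `.stem` and `.suffix`); a plain string for the image path would fail on `x.stem`.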

Hi. Has anyone tried the head pose notebook? I ran it and found that the training and validation errors would not go down after 5 epochs. They would go down for the first two or three epochs and then shoot up. I also tried different learning rates, and unfreezing the model to fine-tune it for more epochs, but that did not seem to solve the problem. My training error was way higher than the validation error, and both were far higher than the error in the video.

BTW, the notebook seems to have changed in several places compared to the one in the video. It uses PointsItemList rather than imagefilelist, and the error function is not changed in the notebook. I am assuming there were API updates, so there is now a specific class for mapping images to points.

I’m running into the same problem.

Hi all, can anyone share the training times they’re getting and some platform information?

I’m using GCP with the basics from the getting started:
export IMAGE_FAMILY="pytorch-latest-gpu"
export INSTANCE_TYPE="n1-highmem-8"

gcloud compute instances create …



I’m a newbie to GPU coding, but nvidia-smi is showing 100% usage while training. But…it seems ‘slow’ (or my expectations need to be reconfigured).

For example, on the lesson 3 camvid problem, running this line:
learn = unet_learner(data, models.resnet34, metrics=metrics, wd=wd).to_fp16()

takes about 15mins.
Each epoch takes about 1m40sec.

Are these expected times? If anyone could suggest how I could lower these times, that would be awesome.



I am working through the planet example from the lesson3-planet notebook. I am just training the model with resnet50 and get the error below. I’m not sure whether something changed internally in the API that breaks it.

learn.fit_one_cycle(5, slice(lr))

0.00% [0/5 00:00<00:00]
epoch train_loss valid_loss accuracy_thresh fbeta


RuntimeError Traceback (most recent call last)
in ()
----> 1 learn.fit_one_cycle(5, slice(lr))

~/.anaconda3/lib/python3.7/site-packages/fastai/train.py in fit_one_cycle(learn, cyc_len, max_lr, moms, div_factor, pct_start, wd, callbacks, **kwargs)
19 callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor,
20 pct_start=pct_start, **kwargs))
---> 21 learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
23 def lr_find(learn:Learner, start_lr:Floats=1e-7, end_lr:Floats=10, num_it:int=100, stop_div:bool=True, **kwargs:Any):

~/.anaconda3/lib/python3.7/site-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
164 callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
165 fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
--> 166 callbacks=self.callbacks+callbacks)
168 def create_opt(self, lr:Floats, wd:Floats=0.)->None:

~/.anaconda3/lib/python3.7/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
92 except Exception as e:
93 exception = e
---> 94 raise e
95 finally: cb_handler.on_train_end(exception)

~/.anaconda3/lib/python3.7/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
87 if hasattr(data,'valid_dl') and data.valid_dl is not None and data.valid_ds is not None:
88 val_loss = validate(model, data.valid_dl, loss_func=loss_func,
---> 89 cb_handler=cb_handler, pbar=pbar)
90 else: val_loss=None
91 if cb_handler.on_epoch_end(val_loss): break

~/.anaconda3/lib/python3.7/site-packages/fastai/basic_train.py in validate(model, dl, loss_func, cb_handler, pbar, average, n_batch)
52 if not is_listy(yb): yb = [yb]
53 nums.append(yb[0].shape[0])
---> 54 if cb_handler and cb_handler.on_batch_end(val_losses[-1]): break
55 if n_batch and (len(nums)>=n_batch): break
56 nums = np.array(nums, dtype=np.float32)

~/.anaconda3/lib/python3.7/site-packages/fastai/callback.py in on_batch_end(self, loss)
237 "Handle end of processing one batch with loss."
238 self.state_dict['last_loss'] = loss
--> 239 stop = np.any(self('batch_end', not self.state_dict['train']))
240 if self.state_dict['train']:
241 self.state_dict['iteration'] += 1

~/.anaconda3/lib/python3.7/site-packages/fastai/callback.py in __call__(self, cb_name, call_mets, **kwargs)
185 def __call__(self, cb_name, call_mets=True, **kwargs)->None:
186 "Call through to all of the CallbakHandler functions."
--> 187 if call_mets: [getattr(met, f'on_{cb_name}')(**self.state_dict, **kwargs) for met in self.metrics]
188 return [getattr(cb, f'on_{cb_name}')(**self.state_dict, **kwargs) for cb in self.callbacks]

~/.anaconda3/lib/python3.7/site-packages/fastai/callback.py in <listcomp>(.0)
185 def __call__(self, cb_name, call_mets=True, **kwargs)->None:
186 "Call through to all of the CallbakHandler functions."
--> 187 if call_mets: [getattr(met, f'on_{cb_name}')(**self.state_dict, **kwargs) for met in self.metrics]
188 return [getattr(cb, f'on_{cb_name}')(**self.state_dict, **kwargs) for cb in self.callbacks]

~/.anaconda3/lib/python3.7/site-packages/fastai/callback.py in on_batch_end(self, last_output, last_target, **kwargs)
272 if not is_listy(last_target): last_target=[last_target]
273 self.count += last_target[0].size(0)
--> 274 self.val += last_target[0].size(0) * self.func(last_output, *last_target).detach().cpu()
276 def on_epoch_end(self, **kwargs):

~/.anaconda3/lib/python3.7/site-packages/fastai/metrics.py in accuracy_thresh(y_pred, y_true, thresh, sigmoid)
20 "Compute accuracy when y_pred and y_true are the same size."
21 if sigmoid: y_pred = y_pred.sigmoid()
---> 22 return ((y_pred>thresh)==y_true.byte()).float().mean()
24 def dice(input:FloatTensor, targs:LongTensor, iou:bool=False)->Rank0Tensor:

RuntimeError: The size of tensor a (418) must match the size of tensor b (64) at non-singleton dimension 1



I am still having trouble understanding why we initially train on top of a frozen network with pretrained weights and then unfreeze the network after some initial training. I am well into the third lesson and still struggling with this. See the code snippet below. Thanks in advance.

  1. learn = create_cnn(data, models.resnet34, metrics=error_rate)
  2. learn.fit_one_cycle(4)
  3. learn.save('stage-1')
  4. learn.unfreeze()

I’m having exactly the same issue. I’ve tried different transform and learner parameters with no luck.

At 43:21 in Lesson 3, Jeremy described how to update a model that misclassified some instances. I have several questions about it:

  1. He suggested using fit_one_cycle() with a higher learning rate or for more epochs. So are we saying that fixing prediction errors means forcing the model to overfit on the errors?
  2. I’m assuming that when we have a set of misclassified examples, we also split them into train and validation set and finetune on the training set until we get 100% accuracy on the validation set. Is this correct?
  3. How can we be sure that fine-tuning only on the misclassified examples will not mess up the model’s predictions on the examples it already does well on?