Lesson 3 In-Class Discussion ✅

I’m not clear on something in the IMDB notebook. @lesscomfortable I’m hoping you can explain it for me!

In the following part, is the second line of code (data_lm = TextLMDataBunch.load(path, 'tmp_lm', bs=bs)) changing data_lm in any way? I don’t believe it does, but I’m not sure.

data_lm = (TextList.from_folder(path)
           # Inputs: all the text files in path
           .filter_by_folder(include=['train', 'test', 'unsup'])
           # We may have other temp folders that contain text files, so we only keep what's in train, test and unsup
           .random_split_by_pct(0.1)
           # We randomly split and keep 10% (10,000 reviews) for validation
           .label_for_lm()
           # We want to do a language model so we label accordingly
           .databunch(bs=bs))
data_lm.save('tmp_lm')

We have to use a special kind of TextDataBunch for the language model, that ignores the labels (that’s why we put 0 everywhere), will shuffle the texts at each epoch before concatenating them all together (only for training, we don’t shuffle for the validation set) and will send batches that read that text in order with targets that are the next word in the sentence.

The line before being a bit long, we want to load quickly the final ids by using the following cell.

data_lm = TextLMDataBunch.load(path, 'tmp_lm', bs=bs)

In the comment about “The line before being a bit long”, I don’t know if “long” refers to execution time or just the number of lines of code. The first line that creates data_lm runs fairly quickly, so I don’t really see what’s being gained by the second line that creates data_lm using TextLMDataBunch.load.

Hope this question makes sense! :slight_smile:

There was a question around 44:01 about the particular coding style for the DataBlocks API that uses that kind of method chaining.

In software engineering that’s called a “fluent interface”; you can find out more about it here: https://en.wikipedia.org/wiki/Fluent_interface

Another example that uses this kind of style is the Django QuerySet API.
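As a toy illustration (a hypothetical Query class, not from fastai or Django), a fluent interface simply has each method return self (or a new object) so that calls can be chained:

class Query:
    def __init__(self):
        self.conds, self.max_n = [], None

    def filter(self, cond):
        self.conds.append(cond)   # record the condition
        return self               # returning self is what enables chaining

    def limit(self, n):
        self.max_n = n
        return self

    def run(self, rows):
        kept = [r for r in rows if all(c(r) for c in self.conds)]
        return kept[:self.max_n]

# fluent-style chained call
print(Query().filter(lambda r: r > 2).limit(2).run([1, 2, 3, 4, 5]))  # [3, 4]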

Just thought that would be interesting to some of you.

Getting a bunch of errors with Lesson 3; I’m working on the Planet Kaggle dataset.

Displaying df results in the following error:

AttributeError                            Traceback (most recent call last)
D:\Anaconda3\envs\fastai_v3\lib\site-packages\IPython\core\formatters.py in __call__(self, obj)
    343             method = get_real_method(obj, self.print_method)
    344             if method is not None:
--> 345                 return method()
    346             return None
    347         else:

D:\Anaconda3\envs\fastai_v3\lib\site-packages\pandas\core\frame.py in _repr_html_(self)
    647         # display HTML, so this check can be removed when support for
    648         # IPython 2.x is no longer needed.
--> 649         if console.in_qtconsole():
    650             # 'HTML output is disabled in QtConsole'
    651             return None

D:\Anaconda3\envs\fastai_v3\lib\site-packages\pandas\io\formats\console.py in in_qtconsole()
    121             ip.config.get('KernelApp', {}).get('parent_appname', "") or
    122             ip.config.get('IPKernelApp', {}).get('parent_appname', ""))
--> 123         if 'qtconsole' in front_end.lower():
    124             return True
    125     except NameError:

AttributeError: 'LazyConfigValue' object has no attribute 'lower'

Then, in the next cell where src is defined, after np.random.seed(42), you get the following error, which ends with:

Exception: Your validation data contains a label that isn't present in the training set, please fix your data.

Can anyone help?

I’ve also noticed many differences between the code in the 2019 MOOC and the current code in the GitHub repo. Is that intentional? I tried running the exact commands from the MOOC video, but ImageFileList does not appear to be a valid class anymore.

Has anyone applied this to the Nuclei dataset (https://www.kaggle.com/c/data-science-bowl-2018) using Mask R-CNN?

I’m trying to train hand segmentation on a new dataset (EgoHands). However, I always get CUDA out of memory, even on a P100 (16GB GPU). The error appears even with a very small image size and batch size:

size = src_size//32
bs=2

The original image shape is (1280, 720).

Can someone suggest how to deal with this? Thank you in advance.

This might be helpful
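Beyond that, a few knobs usually help with U-Net memory in fastai v1. A sketch, assuming you build your data roughly as in lesson3-camvid (src, size, bs, metrics and wd here are taken from your snippet and the lesson, not known values):

data = (src.transform(get_transforms(), size=size, tfm_y=True)  # smaller size shrinks activations a lot
        .databunch(bs=bs)                                       # smaller batch size also helps
        .normalize(imagenet_stats))

learn = unet_learner(data, models.resnet34, metrics=metrics, wd=wd).to_fp16()  # mixed precision roughly halves activation memory

If it still runs out of memory, freeing cached memory with torch.cuda.empty_cache() and restarting the kernel before re-creating the learner can also help.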


Hey! The idea behind load is that you don’t have to write all that chunk of code and re-create the databunch object every time you want to use it. It’s both a computation and a code thing: you don’t want to do the same thing twice if you can avoid it.

To your question: it is not changing data_lm, in the same way that loading a model from saved weights does not change the Learner object. However, say you ran it yesterday, created the databunch object, and want to run it again today. In that case there will be nothing to overwrite, in other words no data_lm to change. You will use .load, and that will load the same databunch you created yesterday with less computation time and fewer lines of code.
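Concretely, the pattern from the notebook is:

# first run: build the DataBunch (the slow part: tokenization + numericalization), then save it
data_lm = (TextList.from_folder(path)
           .filter_by_folder(include=['train', 'test', 'unsup'])
           .random_split_by_pct(0.1)
           .label_for_lm()
           .databunch(bs=bs))
data_lm.save('tmp_lm')

# later runs: restore the identical DataBunch from disk, skipping all the processing
data_lm = TextLMDataBunch.load(path, 'tmp_lm', bs=bs)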

Please let me know if this solves your problem.

Hey guys! Can anyone advise whether I can use the lesson3-head-pose.ipynb notebook flow to find several faces in a picture? Each picture would then be labeled with a multidimensional tensor. Will it work if there is a different number of faces in each picture?

Solution in this post.


@sgugger, am I wrong, or in the jupyter notebook lesson3-camvid.ipynb, does data.show_batch() not show the real input to our U-Net model (the real input being the image transformed by tfms)?

(Same question about learn.show_results(), which does not show what the model will predict (i.e., a mask) but an overlay of the input image and its mask prediction.)

Indeed, data.show_batch() shows this:

But the true input is just a normal image transformed by tfms, no? (And its target label is its mask, see the following screenshot.)

Last question: what code does fastai use to overlay 2 images (an image and its mask, as displayed by data.show_batch() and learn.show_results())?

Thanks.
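For what it’s worth, that kind of overlay can be reproduced in plain matplotlib by drawing the mask over the image with transparency. A sketch of the idea, not necessarily fastai’s exact implementation:

import numpy as np
import matplotlib.pyplot as plt

# toy image and integer mask, just for illustration
img  = np.random.rand(96, 128, 3)
mask = (np.random.rand(96, 128) > 0.5).astype(int)

fig, ax = plt.subplots(figsize=(5, 5))
ax.imshow(img)                            # base image
ax.imshow(mask, alpha=0.4, cmap='tab20')  # mask drawn on top, semi-transparent
ax.axis('off')
plt.show()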


Hey, put this before your code:
get_ipython().config.get('IPKernelApp', {})['parent_appname'] = ""

such that it reads:
get_ipython().config.get('IPKernelApp', {})['parent_appname'] = ""
df = pd.read_csv(path/'train_v2.csv')
df.head()

Also, I am running Python 3.7.2 on Ubuntu 18.04 on a p2.xlarge instance on AWS.

Hi,

I have a question about multi-label classification for the satellite image data.

How is the loss function computed here? Since there are multiple labels, each with a 0 or 1 output, how does the loss take each label into account?
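From what I can tell, the typical approach is a sigmoid per label followed by binary cross-entropy averaged over the labels; a minimal PyTorch sketch of that idea:

import torch
import torch.nn as nn

# one image, four possible tags; each target is an independent 0/1
logits  = torch.tensor([[2.0, -1.0, 0.5, -3.0]])   # raw model outputs
targets = torch.tensor([[1.0,  0.0, 1.0,  0.0]])   # ground-truth tags

# sigmoid per label + binary cross-entropy, averaged over all labels
loss = nn.BCEWithLogitsLoss()(logits, targets)
print(loss.item())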

Thanks

I ran the Multi-label prediction with Planet Amazon dataset example (lesson3-planet.ipynb) on medical imaging data (MRI) and got (too?) good results.
To get more confident in the results, I tried to apply the interpretation tools from lesson 2, such as plotting the confusion matrix and the top_losses images (+ prediction / actual / loss / probability).

running:
preds,y,losses = learn.get_preds(with_loss=True)
interp = ClassificationInterpretation(data, preds, y, losses)
print(losses)
interp = ClassificationInterpretation.from_learner(learn)
losses,idxs = interp.top_losses()

However, plotting my results in a similar manner to single-class labeling does not work:

len(data.valid_ds)==len(losses)==len(idxs)
interp.plot_top_losses(9, figsize=(15,11))

and the same for
interp.plot_confusion_matrix()

What should I do differently?
Thanks a lot
Moran

Multi-label is when an object can have several tags; multi-class is classification with multiple classes, where each object can only have one label. Isn’t that right?
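A tiny sketch of the difference in target encoding (tag names borrowed from the planet dataset, just for illustration):

classes = ['agriculture', 'clear', 'haze', 'water']

# multi-class: exactly one label per example, stored as a single class index
y_multiclass = 1                # this image is 'clear' and nothing else

# multi-label: any number of tags per example, stored as a binary vector
y_multilabel = [1, 0, 0, 1]     # this image is both 'agriculture' and 'water'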

Given
path = 'data/camvid/'
path_lbl = 'data/camvid/labels'
path_img = 'data/camvid/images'

I get an error message in get_y_fn:

get_y_fn = lambda x: path_lbl/f'{x.stem}_P{x.suffix}'
mask = open_mask(get_y_fn(img_f))
mask.show(figsize=(5,5), alpha=1)

(screenshot of the error)

What’s the problem?
Thanks a lot
Moran

That’s happening because path_lbl and f"{x.stem}_P{x.suffix}" are both strings, and you’re trying to divide one by the other.

Two alternatives that could make this work:

lambda x: f"{path_lbl}/{x.stem}_P{x.suffix}"  # note the added slash, since path_lbl has no trailing separator

or

from pathlib import Path
path_lbl = Path(path_lbl)
lambda x: path_lbl / f"{x.stem}_P{x.suffix}"
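To sanity-check the pathlib version (the file name here is just an example):

from pathlib import Path

path_lbl = Path('data/camvid/labels')
get_y_fn = lambda x: path_lbl / f"{x.stem}_P{x.suffix}"

print(get_y_fn(Path('data/camvid/images/0001TP_006690.png')))
# data/camvid/labels/0001TP_006690_P.png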

Hi. Did anyone try the head pose notebook? I tried to run it and found that the training and validation errors would not go down after 5 epochs. They would actually go down for the first two or three epochs, then shoot up. I also tried different learning rates, unfroze the model to fine-tune it, and trained more epochs, but that did not seem to solve the problem. My training error was way higher than the validation error, and both errors were far higher than the error in the video.

BTW, the notebook seems to have several changes compared to the notebook in the video. There is PointsItemList rather than ImageFileList, and the error function is not changed in the notebook. I am assuming there were API updates, so we now have a specific class for image-to-points tasks.

I also ran into this problem.

Hi all, can anyone share the training times they’re getting and some platform information?

I’m using GCP with the basics from the getting-started guide:
#!/bin/bash
export IMAGE_FAMILY="pytorch-latest-gpu"
export INSTANCE_TYPE="n1-highmem-8"

gcloud compute instances create … \
    --image-project=deeplearning-platform-release \
    --accelerator="type=nvidia-tesla-p4,count=1" \
    --boot-disk-size=200GB \
    --preemptible

I’m a newbie to GPU computing, but nvidia-smi shows 100% usage while training. But… it seems ‘slow’ (or my expectations need to be recalibrated).

For example, on the lesson 3 camvid problem, running this line:
learn = unet_learner(data, models.resnet34, metrics=metrics, wd=wd).to_fp16()

takes about 15 minutes.
Each epoch takes about 1 min 40 sec.

Are these expected times? If anyone could suggest how I could lower these times, that would be awesome.

cheers
Greg


I am doing the planet example from the lesson3-planet notebook, just training the model with resnet50, and getting the error below. Not sure if something changed internally in the API that is breaking it.

learn.fit_one_cycle(5, slice(lr))

0.00% [0/5 00:00<00:00]
epoch train_loss valid_loss accuracy_thresh fbeta

Interrupted

RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 learn.fit_one_cycle(5, slice(lr))

~/.anaconda3/lib/python3.7/site-packages/fastai/train.py in fit_one_cycle(learn, cyc_len, max_lr, moms, div_factor, pct_start, wd, callbacks, **kwargs)
     19     callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor,
     20                                        pct_start=pct_start, **kwargs))
---> 21     learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
     22
     23 def lr_find(learn:Learner, start_lr:Floats=1e-7, end_lr:Floats=10, num_it:int=100, stop_div:bool=True, **kwargs:Any):

~/.anaconda3/lib/python3.7/site-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    164         callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
    165         fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
---> 166             callbacks=self.callbacks+callbacks)
    167
    168     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

~/.anaconda3/lib/python3.7/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     92     except Exception as e:
     93         exception = e
---> 94         raise e
     95     finally: cb_handler.on_train_end(exception)
     96

~/.anaconda3/lib/python3.7/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     87             if hasattr(data,'valid_dl') and data.valid_dl is not None and data.valid_ds is not None:
     88                 val_loss = validate(model, data.valid_dl, loss_func=loss_func,
---> 89                                     cb_handler=cb_handler, pbar=pbar)
     90             else: val_loss=None
     91             if cb_handler.on_epoch_end(val_loss): break

~/.anaconda3/lib/python3.7/site-packages/fastai/basic_train.py in validate(model, dl, loss_func, cb_handler, pbar, average, n_batch)
     52         if not is_listy(yb): yb = [yb]
     53         nums.append(yb[0].shape[0])
---> 54         if cb_handler and cb_handler.on_batch_end(val_losses[-1]): break
     55         if n_batch and (len(nums)>=n_batch): break
     56     nums = np.array(nums, dtype=np.float32)

~/.anaconda3/lib/python3.7/site-packages/fastai/callback.py in on_batch_end(self, loss)
    237         "Handle end of processing one batch with loss."
    238         self.state_dict['last_loss'] = loss
---> 239         stop = np.any(self('batch_end', not self.state_dict['train']))
    240         if self.state_dict['train']:
    241             self.state_dict['iteration'] += 1

~/.anaconda3/lib/python3.7/site-packages/fastai/callback.py in __call__(self, cb_name, call_mets, **kwargs)
    185     def __call__(self, cb_name, call_mets=True, **kwargs)->None:
    186         "Call through to all of the CallbakHandler functions."
---> 187         if call_mets: [getattr(met, f'on_{cb_name}')(**self.state_dict, **kwargs) for met in self.metrics]
    188         return [getattr(cb, f'on_{cb_name}')(**self.state_dict, **kwargs) for cb in self.callbacks]
    189

~/.anaconda3/lib/python3.7/site-packages/fastai/callback.py in <listcomp>(.0)
    185     def __call__(self, cb_name, call_mets=True, **kwargs)->None:
    186         "Call through to all of the CallbakHandler functions."
---> 187         if call_mets: [getattr(met, f'on_{cb_name}')(**self.state_dict, **kwargs) for met in self.metrics]
    188         return [getattr(cb, f'on_{cb_name}')(**self.state_dict, **kwargs) for cb in self.callbacks]
    189

~/.anaconda3/lib/python3.7/site-packages/fastai/callback.py in on_batch_end(self, last_output, last_target, **kwargs)
    272         if not is_listy(last_target): last_target=[last_target]
    273         self.count += last_target[0].size(0)
---> 274         self.val += last_target[0].size(0) * self.func(last_output, *last_target).detach().cpu()
    275
    276     def on_epoch_end(self, **kwargs):

~/.anaconda3/lib/python3.7/site-packages/fastai/metrics.py in accuracy_thresh(y_pred, y_true, thresh, sigmoid)
     20     "Compute accuracy when y_pred and y_true are the same size."
     21     if sigmoid: y_pred = y_pred.sigmoid()
---> 22     return ((y_pred>thresh)==y_true.byte()).float().mean()
     23
     24 def dice(input:FloatTensor, targs:LongTensor, iou:bool=False)->Rank0Tensor:

RuntimeError: The size of tensor a (418) must match the size of tensor b (64) at non-singleton dimension 1
