It seems the DataLoader worker processes die unless the code that starts them is wrapped in an if __name__ == '__main__': guard. The error below suggests I should be using fork, but that isn't available on Windows as far as I am aware. I'm wondering whether the 'IS_WINDOWS' part of https://github.com/pytorch/pytorch/blob/master/torch/utils/data/_utils/worker.py isn't getting used - or has a bug. I'll ask around at work to see if anyone knows a way around this.
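For reference, a quick sanity check (just a sketch, assuming a stock CPython install) that shows which start method is actually in play:

import multiprocessing as mp

# Windows only supports 'spawn'; 'fork' is the default on Linux,
# which is why the __main__ guard isn't needed there.
print(mp.get_start_method())        # expect 'spawn' on Windows
print(mp.get_all_start_methods())   # expect ['spawn'] on Windows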
I could repro it, see the more complete error, and fix it in a standalone .py program, but I'm not sure how to get this running in Jupyter.
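In case it clarifies the setup, here's a rough sketch of how I'd imagine splitting things so the spawned workers import a real module instead of the notebook itself (the module name my_data.py is invented, and I haven't actually got this working in Jupyter):

# my_data.py - hypothetical module holding everything the worker processes need
import torch

class Dataset():
    def __init__(self, x, y): self.x,self.y = x,y
    def __len__(self): return len(self.x)
    def __getitem__(self, i): return self.x[i],self.y[i]

def collate(b):
    xs,ys = zip(*b)
    return torch.stack(xs),torch.stack(ys)

# then in the notebook:
# from my_data import Dataset, collate
# train_dl = DataLoader(train_ds, bs, sampler=RandomSampler(train_ds),
#                       collate_fn=collate, num_workers=4)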
My program that runs OK is based on notebook 3 - I could repro the problem in the PyTorch DataLoader section whenever I used num_workers > 0:
from nb_03_bs import *
import torch.nn.functional as F
from torch.utils.data import DataLoader, SequentialSampler, RandomSampler
import numpy as np

class Dataset():
    def __init__(self, x, y): self.x,self.y = x,y
    def __len__(self): return len(self.x)
    def __getitem__(self, i): return self.x[i],self.y[i]

def fit():
    for epoch in range(epochs):
        for xb,yb in train_dl:
            pred = model(xb)
            loss = loss_func(pred, yb)
            loss.backward()
            opt.step()
            opt.zero_grad()

def get_model():
    model = nn.Sequential(nn.Linear(m,nh), nn.ReLU(), nn.Linear(nh,10))
    return model, optim.SGD(model.parameters(), lr=lr)

def collate(b):
    xs,ys = zip(*b)
    return torch.stack(xs),torch.stack(ys)

bs=64
mpl.rcParams['image.cmap'] = 'gray'
x_train,y_train,x_valid,y_valid = get_data()
n,m = x_train.shape
c = y_train.max()+1
nh = 50
lr = 0.5
epochs = 1
loss_func = F.cross_entropy

train_ds,valid_ds = Dataset(x_train, y_train),Dataset(x_valid, y_valid)
assert len(train_ds)==len(x_train)
assert len(valid_ds)==len(x_valid)

if __name__ == '__main__':
    train_dl = DataLoader(train_ds, bs, sampler=RandomSampler(train_ds), collate_fn=collate, num_workers=4)
    valid_dl = DataLoader(valid_ds, bs, sampler=SequentialSampler(valid_ds), collate_fn=collate, num_workers=4)
    xb,yb = next(iter(train_dl))
    model,opt = get_model()
    fit()
    loss,acc = loss_func(model(xb), yb), accuracy(model(xb), yb)
    assert acc>0.7
    print(loss)
    print(acc)
If I exclude the if __name__ == '__main__': guard, the error I get is:
(fastai-partdeux) D:\OneDrive\AI\fastai_docs\dev_course\dl2\exp>cd d:\OneDrive\AI\fastai_docs\dev_course\dl2\exp && cmd /C "set "PYTHONIOENCODING=UTF-8" && set "PYTHONUNBUFFERED=1" && C:\Users\bsmi0\Anaconda3\envs\fastai-partdeux\python.exe c:\Users\bsmi0\.vscode\extensions\ms-python.python-2019.3.6558\pythonFiles\ptvsd_launcher.py --default --client --host localhost --port 50869 d:\OneDrive\AI\fastai_docs\dev_course\dl2\exp\standalone.py "
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\bsmi0\Anaconda3\envs\fastai-partdeux\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\bsmi0\Anaconda3\envs\fastai-partdeux\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Users\bsmi0\Anaconda3\envs\fastai-partdeux\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\bsmi0\Anaconda3\envs\fastai-partdeux\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Users\bsmi0\Anaconda3\envs\fastai-partdeux\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\bsmi0\Anaconda3\envs\fastai-partdeux\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\bsmi0\Anaconda3\envs\fastai-partdeux\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "d:\OneDrive\AI\fastai_docs\dev_course\dl2\exp\standalone.py", line 46, in <module>
    xb,yb = next(iter(train_dl))
  File "C:\Users\bsmi0\Anaconda3\envs\fastai-partdeux\lib\site-packages\torch\utils\data\dataloader.py", line 162, in __iter__
    return _DataLoaderIter(self)
  File "C:\Users\bsmi0\Anaconda3\envs\fastai-partdeux\lib\site-packages\torch\utils\data\dataloader.py", line 438, in __init__
    w.start()
  File "C:\Users\bsmi0\Anaconda3\envs\fastai-partdeux\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\bsmi0\Anaconda3\envs\fastai-partdeux\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\bsmi0\Anaconda3\envs\fastai-partdeux\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\bsmi0\Anaconda3\envs\fastai-partdeux\lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Users\bsmi0\Anaconda3\envs\fastai-partdeux\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\bsmi0\Anaconda3\envs\fastai-partdeux\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
Traceback (most recent call last):
  File "C:\Users\bsmi0\Anaconda3\envs\fastai-partdeux\lib\site-packages\torch\utils\data\dataloader.py", line 480, in _try_get_batch
    data = self.data_queue.get(timeout=timeout)
  File "C:\Users\bsmi0\Anaconda3\envs\fastai-partdeux\lib\multiprocessing\queues.py", line 105, in get
raise Empty
_queue.Empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "c:\Users\bsmi0\.vscode\extensions\ms-python.python-2019.3.6558\pythonFiles\ptvsd_launcher.py", line 45, in <module>
    main(ptvsdArgs)
  File "c:\Users\bsmi0\.vscode\extensions\ms-python.python-2019.3.6558\pythonFiles\lib\python\ptvsd\__main__.py", line 391, in main
    run()
  File "c:\Users\bsmi0\.vscode\extensions\ms-python.python-2019.3.6558\pythonFiles\lib\python\ptvsd\__main__.py", line 272, in run_file
    runpy.run_path(target, run_name='__main__')
  File "C:\Users\bsmi0\Anaconda3\envs\fastai-partdeux\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\bsmi0\Anaconda3\envs\fastai-partdeux\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\bsmi0\Anaconda3\envs\fastai-partdeux\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "d:\OneDrive\AI\fastai_docs\dev_course\dl2\exp\standalone.py", line 46, in <module>
    xb,yb = next(iter(train_dl))
  File "C:\Users\bsmi0\Anaconda3\envs\fastai-partdeux\lib\site-packages\torch\utils\data\dataloader.py", line 545, in __next__
    idx, batch = self._get_batch()
  File "C:\Users\bsmi0\Anaconda3\envs\fastai-partdeux\lib\site-packages\torch\utils\data\dataloader.py", line 522, in _get_batch
    success, data = self._try_get_batch()
  File "C:\Users\bsmi0\Anaconda3\envs\fastai-partdeux\lib\site-packages\torch\utils\data\dataloader.py", line 488, in _try_get_batch
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
RuntimeError: DataLoader worker (pid(s) 30832) exited unexpectedly