Lesson 3 In-Class Discussion ✅

I am doing the planet example from the lesson3-planet notebook. I am just training the model with resnet50 and getting the error below. Not sure if something changed internally in the API that is breaking it.

learn.fit_one_cycle(5, slice(lr))

0.00% [0/5 00:00<00:00]
epoch train_loss valid_loss accuracy_thresh fbeta

Interrupted

RuntimeError                              Traceback (most recent call last)
<ipython-input-...> in <module>()
----> 1 learn.fit_one_cycle(5, slice(lr))

~/.anaconda3/lib/python3.7/site-packages/fastai/train.py in fit_one_cycle(learn, cyc_len, max_lr, moms, div_factor, pct_start, wd, callbacks, **kwargs)
     19     callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor,
     20                                        pct_start=pct_start, **kwargs))
---> 21     learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
     22
     23 def lr_find(learn:Learner, start_lr:Floats=1e-7, end_lr:Floats=10, num_it:int=100, stop_div:bool=True, **kwargs:Any):

~/.anaconda3/lib/python3.7/site-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    164         callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
    165         fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
--> 166             callbacks=self.callbacks+callbacks)
    167
    168     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

~/.anaconda3/lib/python3.7/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     92     except Exception as e:
     93         exception = e
---> 94         raise e
     95     finally: cb_handler.on_train_end(exception)
     96

~/.anaconda3/lib/python3.7/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     87             if hasattr(data,'valid_dl') and data.valid_dl is not None and data.valid_ds is not None:
     88                 val_loss = validate(model, data.valid_dl, loss_func=loss_func,
---> 89                                     cb_handler=cb_handler, pbar=pbar)
     90             else: val_loss=None
     91             if cb_handler.on_epoch_end(val_loss): break

~/.anaconda3/lib/python3.7/site-packages/fastai/basic_train.py in validate(model, dl, loss_func, cb_handler, pbar, average, n_batch)
     52         if not is_listy(yb): yb = [yb]
     53         nums.append(yb[0].shape[0])
---> 54         if cb_handler and cb_handler.on_batch_end(val_losses[-1]): break
     55         if n_batch and (len(nums)>=n_batch): break
     56     nums = np.array(nums, dtype=np.float32)

~/.anaconda3/lib/python3.7/site-packages/fastai/callback.py in on_batch_end(self, loss)
    237         "Handle end of processing one batch with `loss`."
    238         self.state_dict['last_loss'] = loss
--> 239         stop = np.any(self('batch_end', not self.state_dict['train']))
    240         if self.state_dict['train']:
    241             self.state_dict['iteration'] += 1

~/.anaconda3/lib/python3.7/site-packages/fastai/callback.py in __call__(self, cb_name, call_mets, **kwargs)
    185     def __call__(self, cb_name, call_mets=True, **kwargs)->None:
    186         "Call through to all of the `CallbackHandler` functions."
--> 187         if call_mets: [getattr(met, f'on_{cb_name}')(**self.state_dict, **kwargs) for met in self.metrics]
    188         return [getattr(cb, f'on_{cb_name}')(**self.state_dict, **kwargs) for cb in self.callbacks]
    189

~/.anaconda3/lib/python3.7/site-packages/fastai/callback.py in <listcomp>(.0)
    185     def __call__(self, cb_name, call_mets=True, **kwargs)->None:
    186         "Call through to all of the `CallbackHandler` functions."
--> 187         if call_mets: [getattr(met, f'on_{cb_name}')(**self.state_dict, **kwargs) for met in self.metrics]
    188         return [getattr(cb, f'on_{cb_name}')(**self.state_dict, **kwargs) for cb in self.callbacks]
    189

~/.anaconda3/lib/python3.7/site-packages/fastai/callback.py in on_batch_end(self, last_output, last_target, **kwargs)
    272         if not is_listy(last_target): last_target=[last_target]
    273         self.count += last_target[0].size(0)
--> 274         self.val += last_target[0].size(0) * self.func(last_output, *last_target).detach().cpu()
    275
    276     def on_epoch_end(self, **kwargs):

~/.anaconda3/lib/python3.7/site-packages/fastai/metrics.py in accuracy_thresh(y_pred, y_true, thresh, sigmoid)
     20     "Compute accuracy when `y_pred` and `y_true` are the same size."
     21     if sigmoid: y_pred = y_pred.sigmoid()
---> 22     return ((y_pred>thresh)==y_true.byte()).float().mean()
     23
     24 def dice(input:FloatTensor, targs:LongTensor, iou:bool=False)->Rank0Tensor:

RuntimeError: The size of tensor a (418) must match the size of tensor b (64) at non-singleton dimension 1


Hi,

I am still having trouble understanding why we initially train on top of a frozen network with pretrained weights, and then unfreeze the network after some initial training. I am well into the third lesson and still struggling with this. See the code snippet below. Thanks in advance.

learn = create_cnn(data, models.resnet34, metrics=error_rate)
learn.fit_one_cycle(4)
learn.save('stage-1')
learn.unfreeze()
learn.fit_one_cycle(4)
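
In case it helps to see what the freezing actually changes, here is a minimal sketch (assuming fastai v1 defaults, where create_cnn with a pretrained model freezes the body): while frozen, only the newly added head (plus batchnorm) is trainable, so the first fit_one_cycle adapts the head to your data without disturbing the pretrained features; unfreeze() then makes every layer trainable for further fine-tuning.

learn = create_cnn(data, models.resnet34, metrics=error_rate)
# Count trainable parameters while the pretrained body is frozen
print(sum(p.numel() for p in learn.model.parameters() if p.requires_grad))
learn.unfreeze()
# After unfreezing, every parameter in the network is trainable
print(sum(p.numel() for p in learn.model.parameters() if p.requires_grad))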

I’m having exactly the same issue. I’ve tried different transform and learner parameters with no luck.

At 43:21 in Lesson 3, Jeremy described a way to update a model that misclassified some instances. I have several questions about it:

  1. He suggested using fit_one_cycle() with a higher learning rate or for more epochs. So are we saying that fixing prediction errors means forcing the model to overfit on the errors?
  2. I'm assuming that when we have a set of misclassified examples, we also split them into a train and validation set and fine-tune on the training set until we get 100% accuracy on the validation set. Is this correct?
  3. How can we be sure that fine-tuning only on the misclassified examples will not mess up the model's predictions on the examples it already does well on?

I would check the data and make sure it all downloaded properly. There should be a folder train-jpg and a csv train_v2.csv both with 40,478 entries.

I got the same issue and just solved it by updating fastai to the latest version.
You can check how label_from_df from the data block API [source] splits the labels of the images. It seems like your data.c is 418, whereas it should be 17.
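
For anyone hitting the same mismatch, here is a minimal sketch of the planet data block pipeline (names follow the lesson3-planet notebook; assuming a recent fastai v1). The key part is label_delim=' ': without it, every full tag string is treated as one class, which is how data.c ends up in the hundreds instead of 17.

np.random.seed(42)
src = (ImageList.from_csv(path, 'train_v2.csv', folder='train-jpg', suffix='.jpg')
       .split_by_rand_pct(0.2)
       .label_from_df(label_delim=' '))  # split space-separated tags into individual labels
data = (src.transform(get_transforms(flip_vert=True), size=128)
        .databunch().normalize(imagenet_stats))
print(data.c)  # should be 17 for the planet dataset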

Did you come up with a novel idea for the “homework”? If so, how did you create your training set?

Has anyone tapped into wiki data yet?

I’m currently working through the IMDB example and have decided to skip creating a language model for text prediction, since it is so time-intensive, and move straight to classification. I notice that for this we load the encoder that was saved previously with learn.load_encoder('fine_tuned_enc'). I was able to run learn.save_encoder('fine_tuned_enc') without actually fitting and/or saving the model. I see that the encoder is “the part that’s responsible for creating and updating the hidden state”, but at what point was this computed, and how is this done separately from computing the model?
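
For reference, a hedged sketch of where the encoder fits in the ULMFiT flow (names follow the lesson3-imdb notebook; exact learner signatures depend on your fastai v1 version). The encoder is just the RNN body of the language model, without the task-specific head, which is why it can be saved from one learner and loaded into another. Note that calling save_encoder before any fitting simply writes out the encoder's current (pretrained, not yet fine-tuned) weights.

learn_lm = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
learn_lm.fit_one_cycle(1, 1e-2)            # fine-tune the language model
learn_lm.save_encoder('fine_tuned_enc')    # saves only the encoder (RNN body) weights

learn_clas = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
learn_clas.load_encoder('fine_tuned_enc')  # the classifier reuses the fine-tuned encoder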

In the latest version (as of today, commit id: 8671cd7a) of the notebook “lesson3-camvid.ipynb”, after training and saving “stage-1”, it unfreezes and immediately calls fit_one_cycle with the learning rate slice(lr/400, lr/4). I noticed this is different from what was shown in the course, which did lr_find(learn) and learn.recorder.plot(), and then chose the lr from the plot to pass to fit_one_cycle(). I’m curious, what is the reason for this change?
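
For comparison, a minimal sketch of the approach from the lecture (the new_lr value below is hypothetical; in practice you read it off the recorder plot):

learn.unfreeze()
learn.lr_find()
learn.recorder.plot()          # inspect the loss-vs-lr curve
new_lr = 1e-5                  # hypothetical value chosen from the plot
learn.fit_one_cycle(10, slice(new_lr, new_lr*10))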


These are the most recent changes in the code: https://github.com/fastai/fastai/blob/33aae8f7b4b7d323d943c178d9ba58afcf8f19b8/CHANGES.md#fixed

Also, to update to the most recent notebooks and fastai version, make sure to run these in the terminal before you restart your work:

cd courses/fast-ai/course-v3/
git pull

and, if you are using Anaconda:

conda install -c pytorch -c fastai fastai pytorch

Anyone else finding the time for one epoch in lesson3-planet to be insanely SLOW??

How is it possible to initialize a U-Net with a ResNet when their architectures are completely different?
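
In case it helps: in fastai v1 the ResNet is not converted into a U-Net directly; it is used as the downsampling (encoder) half, and a matching upsampling path with cross connections is generated on top of it (the DynamicUnet idea). A minimal sketch, assuming a camvid-style data object:

learn = unet_learner(data, models.resnet34)
# The pretrained ResNet supplies the encoder; fastai builds the decoder and
# skip connections automatically to mirror the encoder's feature-map sizes.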


What is causing these sudden drops in the training loss at each epoch if the learning rate is varying smoothly the whole time? The image is from training the unfrozen IMDB language model using
learn.fit_one_cycle(10, 1e-3, moms=(0.8,0.7))
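
If it helps to rule out the schedule itself, the recorder can plot exactly what the learning rate and momentum did over the run (standard fastai v1 Recorder calls):

learn.recorder.plot_lr(show_moms=True)  # lr and momentum schedules per iteration
learn.recorder.plot_losses()            # train/valid losses over the run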

EDIT: The error mysteriously disappeared today after I opened the notebook again; not sure why.

Hi,

I am working on the lesson3-planet notebook. I am trying to load the dataset from Kaggle using:

df = pd.read_csv(path/'train_v2.csv')
df.head()

I got the following error:


TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/IPython/core/formatters.py in __call__(self, obj)
    697                 type_pprinters=self.type_printers,
    698                 deferred_pprinters=self.deferred_printers)
--> 699             printer.pretty(obj)
    700             printer.flush()
    701             return stream.getvalue()

/usr/local/lib/python3.6/dist-packages/IPython/lib/pretty.py in pretty(self, obj)
    396                             if callable(meth):
    397                                 return meth(obj, self, cycle)
--> 398             return _default_pprint(obj, self, cycle)
    399         finally:
    400             self.end_group()

/usr/local/lib/python3.6/dist-packages/IPython/lib/pretty.py in _default_pprint(obj, p, cycle)
    516     if _safe_getattr(klass, '__repr__', None) not in _baseclass_reprs:
    517         # A user-provided repr. Find newlines and replace them with p.break_()
--> 518         _repr_pprint(obj, p, cycle)
    519         return
    520     p.begin_group(1, '<')

/usr/local/lib/python3.6/dist-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
    707     """A pprint that just redirects to the normal repr function."""
    708     # Find newlines and replace them with p.break_()
--> 709     output = repr(obj)
    710     for idx,output_line in enumerate(output.splitlines()):
    711         if idx:

/usr/local/lib/python3.6/dist-packages/pandas/core/base.py in __repr__(self)
     78         Yields Bytestring in Py2, Unicode String in py3.
     79         """
---> 80         return str(self)
     81 
     82 

/usr/local/lib/python3.6/dist-packages/pandas/core/base.py in __str__(self)
     57 
     58         if compat.PY3:
---> 59             return self.__unicode__()
     60         return self.__bytes__()
     61 

/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py in __unicode__(self)
    634             width = None
    635         self.to_string(buf=buf, max_rows=max_rows, max_cols=max_cols,
--> 636                        line_width=width, show_dimensions=show_dimensions)
    637 
    638         return buf.getvalue()

/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py in to_string(self, buf, columns, col_space, header, index, na_rep, formatters, float_format, sparsify, index_names, justify, line_width, max_rows, max_cols, show_dimensions)
   1673                                            max_cols=max_cols,
   1674                                            show_dimensions=show_dimensions)
-> 1675         formatter.to_string()
   1676 
   1677         if buf is None:

/usr/local/lib/python3.6/dist-packages/pandas/io/formats/format.py in to_string(self)
    601             elif (not isinstance(self.max_cols, int) or
    602                     self.max_cols > 0):  # need to wrap around
--> 603                 text = self._join_multiline(*strcols)
    604             else:  # max_cols == 0. Try to fit frame to terminal
    605                 text = self.adj.adjoin(1, *strcols).split('\n')

/usr/local/lib/python3.6/dist-packages/pandas/io/formats/format.py in _join_multiline(self, *strcols)
    648             idx = strcols.pop(0)
    649             lwidth -= np.array([self.adj.len(x)
--> 650                                 for x in idx]).max() + adjoin_width
    651 
    652         col_widths = [np.array([self.adj.len(x) for x in col]).max() if

/usr/local/lib/python3.6/dist-packages/numpy/core/_methods.py in _amax(a, axis, out, keepdims, initial)
     26 def _amax(a, axis=None, out=None, keepdims=False,
     27           initial=_NoValue):
---> 28     return umr_maximum(a, axis, None, out, keepdims, initial)
     29 
     30 def _amin(a, axis=None, out=None, keepdims=False,

TypeError: reduce() takes at most 5 arguments (6 given)

Apparently, it comes from trying to display the data frame. Any suggestions?


For some reason, it’s saying ImageList does not have attribute split_by_rand_pct.

Anyone get a similar error?

You are using an older version of fastai. split_by_rand_pct is only available in 1.0.48.
Either update or use random_split_by_pct instead.

Cheers
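
A quick sketch of the two spellings side by side (same behavior, different fastai versions):

src = ImageList.from_csv(path, 'train_v2.csv', folder='train-jpg', suffix='.jpg')
split = src.split_by_rand_pct(0.2)       # fastai >= 1.0.48
# split = src.random_split_by_pct(0.2)   # equivalent call in older releases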


So somehow I am running on the CPU and not the GPU. I don’t know what happened, but everything gets updated when I start working. Any ideas how to get back to the GPU? I checked the terminal for activity and there are ‘No running processes found’.
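
A quick sanity check (plain PyTorch, nothing fastai-specific) to confirm whether the notebook can see the GPU at all:

import torch
print(torch.cuda.is_available())          # False means you are running on the CPU
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # which GPU PyTorch will use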

Yes, same problem. Using Crestle. I don’t see a way to select CPU/GPU. Using the terminal, there’s no CUDA installed in the environment. Is that the problem?

conda install pytorch torchvision cudatoolkit=9.0 -c pytorch

or this if the GPU still seems slow:

conda install pytorch torchvision cudatoolkit=9

I used this command in the Jupyter terminal and now it’s not taking an hour to run one epoch!!


Me!! I keep getting this error. Not sure what’s going on. Were you able to resolve this?

The error disappeared, but I don’t know why. You should probably update the notebook and related libraries, especially fastai.