Lesson 3 In-Class Discussion ✅

These are the most recent changes in code: https://github.com/fastai/fastai/blob/33aae8f7b4b7d323d943c178d9ba58afcf8f19b8/CHANGES.md#fixed

Also, to update to the most recent notebooks and fastai version, make sure you run these in the terminal before you restart your work:
cd courses/fast-ai/course-v3/
git pull
and, if you are using Anaconda:
conda install -c pytorch -c fastai fastai pytorch

Anyone else finding the time for one epoch in lesson3-planet to be insanely SLOW??

How is it possible to initialize a U-Net with a ResNet when their architectures are completely different?


What is causing these sudden drops in the training loss at each epoch if the learning rate is varying smoothly the whole time? The image is from training the unfrozen IMDB language model using
learn.fit_one_cycle(10, 1e-3, moms=(0.8,0.7))

EDIT: The error mysteriously disappeared today after I opened the notebook again; not sure why.

Hi,

I am working on the lesson3-planet notebook. I am trying to load the dataset from Kaggle using:

df = pd.read_csv(path/'train_v2.csv')
df.head()

I got the following error:


TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/IPython/core/formatters.py in __call__(self, obj)
    697                 type_pprinters=self.type_printers,
    698                 deferred_pprinters=self.deferred_printers)
--> 699             printer.pretty(obj)
    700             printer.flush()
    701             return stream.getvalue()

/usr/local/lib/python3.6/dist-packages/IPython/lib/pretty.py in pretty(self, obj)
    396                             if callable(meth):
    397                                 return meth(obj, self, cycle)
--> 398             return _default_pprint(obj, self, cycle)
    399         finally:
    400             self.end_group()

/usr/local/lib/python3.6/dist-packages/IPython/lib/pretty.py in _default_pprint(obj, p, cycle)
    516     if _safe_getattr(klass, '__repr__', None) not in _baseclass_reprs:
    517         # A user-provided repr. Find newlines and replace them with p.break_()
--> 518         _repr_pprint(obj, p, cycle)
    519         return
    520     p.begin_group(1, '<')

/usr/local/lib/python3.6/dist-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
    707     """A pprint that just redirects to the normal repr function."""
    708     # Find newlines and replace them with p.break_()
--> 709     output = repr(obj)
    710     for idx,output_line in enumerate(output.splitlines()):
    711         if idx:

/usr/local/lib/python3.6/dist-packages/pandas/core/base.py in __repr__(self)
     78         Yields Bytestring in Py2, Unicode String in py3.
     79         """
---> 80         return str(self)
     81 
     82 

/usr/local/lib/python3.6/dist-packages/pandas/core/base.py in __str__(self)
     57 
     58         if compat.PY3:
---> 59             return self.__unicode__()
     60         return self.__bytes__()
     61 

/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py in __unicode__(self)
    634             width = None
    635         self.to_string(buf=buf, max_rows=max_rows, max_cols=max_cols,
--> 636                        line_width=width, show_dimensions=show_dimensions)
    637 
    638         return buf.getvalue()

/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py in to_string(self, buf, columns, col_space, header, index, na_rep, formatters, float_format, sparsify, index_names, justify, line_width, max_rows, max_cols, show_dimensions)
   1673                                            max_cols=max_cols,
   1674                                            show_dimensions=show_dimensions)
-> 1675         formatter.to_string()
   1676 
   1677         if buf is None:

/usr/local/lib/python3.6/dist-packages/pandas/io/formats/format.py in to_string(self)
    601             elif (not isinstance(self.max_cols, int) or
    602                     self.max_cols > 0):  # need to wrap around
--> 603                 text = self._join_multiline(*strcols)
    604             else:  # max_cols == 0. Try to fit frame to terminal
    605                 text = self.adj.adjoin(1, *strcols).split('\n')

/usr/local/lib/python3.6/dist-packages/pandas/io/formats/format.py in _join_multiline(self, *strcols)
    648             idx = strcols.pop(0)
    649             lwidth -= np.array([self.adj.len(x)
--> 650                                 for x in idx]).max() + adjoin_width
    651 
    652         col_widths = [np.array([self.adj.len(x) for x in col]).max() if

/usr/local/lib/python3.6/dist-packages/numpy/core/_methods.py in _amax(a, axis, out, keepdims, initial)
     26 def _amax(a, axis=None, out=None, keepdims=False,
     27           initial=_NoValue):
---> 28     return umr_maximum(a, axis, None, out, keepdims, initial)
     29 
     30 def _amin(a, axis=None, out=None, keepdims=False,

TypeError: reduce() takes at most 5 arguments (6 given)

Apparently, it comes from trying to display the DataFrame. Any suggestions here?


For some reason, it's saying ImageList does not have the attribute split_by_rand_pct.

Anyone get a similar error?

You are using an older version of fastai. split_by_rand_pct is only available in 1.0.48.
Either update or use random_split_by_pct instead.
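To check which API you have before editing the notebook, something like this works (a minimal sketch following the lesson3-planet data block; path, train_v2.csv and the train-jpg folder are assumed to be set up as in the notebook):

import fastai
from fastai.vision import *
import numpy as np

print(fastai.__version__)          # split_by_rand_pct needs fastai >= 1.0.48

np.random.seed(42)
src = (ImageList.from_csv(path, 'train_v2.csv', folder='train-jpg', suffix='.jpg')
       .split_by_rand_pct(0.2)
       .label_from_df(label_delim=' '))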

Cheers


So somehow I am running off the CPU and not the GPU. I don't know what happened, but everything gets updated when I start working. Any ideas how to get back to the GPU? I checked the terminal for activity and there are 'No running processes found'.

Yes, same problem. Using Crestle. I don't see a way to select CPU/GPU. Using the terminal, there's no CUDA installed in the environment. Is that the problem?

conda install pytorch torchvision cudatoolkit=9.0 -c pytorch
or this if the GPU still seems slow:
conda install pytorch torchvision cudatoolkit=9 -c pytorch

I used this command in the Jupyter terminal and now it's not taking an hour to run one epoch!!


Me too!! I keep getting this error. Not sure what's going on. Were you able to resolve this?

The error disappeared but I don't know why. You should probably update the notebook and related libraries, especially fastai.

I resolved it with this.

!pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cu92/torch_nightly.html
!pip install fastai

I am new, so I'm not sure this is the right way to share info, but I faced the same issue, searched the forum, and found this comment first.
In my case, the cause of the issue was that the notebook did not import the module.
Put from fastai.datasets import Config before running path = Config.data_path()/'planet', then everything worked fine.
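Put together, the cell looks roughly like this (a minimal sketch; the mkdir call is only there so the folder exists before you download into it):

from fastai.datasets import Config

path = Config.data_path()/'planet'
path.mkdir(parents=True, exist_ok=True)
path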
Hope it helps others.


I upgraded the pandas lib to 0.24.2 (it was 0.22.x) and the error went away.
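If you hit the same thing, it may be worth checking the installed version first (a minimal sketch; the 0.24 pin is just the version mentioned above):

import pandas as pd

print(pd.__version__)
# if it is older than 0.24, upgrade from a notebook cell or the terminal:
# pip install --upgrade "pandas>=0.24"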

I guess this is the effect of the one-cycle policy on each epoch.
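You can verify what schedule was actually used by plotting it from the Recorder after the fit_one_cycle call above (a minimal sketch, assuming the fastai v1 Recorder API and the learn object from the question):

learn.recorder.plot_lr(show_moms=True)   # learning rate and momentum schedule per iteration
learn.recorder.plot_losses()             # smoothed training loss and validation loss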

You can call it a U-ResNet :grin: You won't exactly initialize it with a ResNet; you create a new network whose encoder (downsampling pass) is very similar to a ResNet, but it saves the activations at some steps so that you can concatenate them with the input in the decoding pass. There is a lesson in DL2 2018 about that, on segmentation with the Carvana dataset. Check the video.
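In fastai v1, unet_learner does this for you: the ResNet becomes the encoder and the decoder is built on top with those cross connections. A minimal sketch, assuming a segmentation DataBunch is already in data:

from fastai.vision import *

# the pretrained ResNet becomes the downsampling (encoder) path;
# the upsampling path is built on top and concatenates the saved activations
learn = unet_learner(data, models.resnet34)
learn.fit_one_cycle(4, 1e-3)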

In the head-pose regression problem, why was a size of 120 by 160 selected? Why these specific numbers and why is it a rectangle rather than a square?

Moreover, what exactly is the output of the model?
I tried to do:
pred = learner.predict(data.valid_ds.x[0])
The output was:
(ImagePoints (120, 160), tensor([[0.0895, 0.0582]]), tensor([0.0895, 0.0582]))
How do I interpret this? And why is the last item in this tuple equal to the second one?

I also tried to visualize this for a single image without using the learner.show_results() method; however, I couldn't do so.

I tried data.valid_ds.x[0].show(y=pred[0], figsize=(9, 9)), but the red point showed up in some random spot of the image.
Also tried this and the result didn’t make sense either:
data.valid_ds.x[0].show(y=get_ip(data.valid_ds.x[0], pred[1]), figsize=(9, 9))
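One way to sanity-check the prediction is to map the raw output back to pixel coordinates yourself and plot it with matplotlib. This is only a sketch under the assumption that fastai v1 scales point targets to [-1, 1] in (row, col) order; if that assumption does not hold for your version, the dot will land in the wrong place:

import matplotlib.pyplot as plt

img = data.valid_ds.x[0]
pred = learner.predict(img)
h, w = img.size                              # (rows, cols) of this image
row = (pred[1][0, 0].item() + 1) / 2 * h     # map [-1, 1] back to a pixel row
col = (pred[1][0, 1].item() + 1) / 2 * w     # map [-1, 1] back to a pixel column

fig, ax = plt.subplots(figsize=(9, 9))
img.show(ax=ax)
ax.scatter([col], [row], c='red')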

I have been running out of memory for the planet lesson, but changed it to

data = (src.transform(tfms, size=256)
        .databunch(bs=32).normalize(imagenet_stats))

I have like 6GB of mem

|    0      4951      C   ...tyoc213/anaconda3/envs/swift/bin/python  5057MiB |

This is with bs=32. The questions are: should the batch size be a power of 2? Is it possible to know beforehand that I will run out of memory, or only once I run and hit the problem? And how much memory do I need to run this at the default size of 64?
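I don't think there is an exact way to know beforehand, and as far as I know powers of 2 are just a convention rather than a requirement. Activation memory grows roughly linearly with batch size, so one rough approach is to run a fit at a small bs and extrapolate from the peak. A minimal sketch with plain PyTorch (device index 0 assumed):

import torch

total = torch.cuda.get_device_properties(0).total_memory
peak  = torch.cuda.max_memory_allocated(0)
print(f'{total/1e9:.1f} GB on the card, {peak/1e9:.1f} GB peak allocated so far')
# e.g. if bs=32 peaks near 5 GB, bs=64 will need roughly twice that,
# so a 6 GB card will almost certainly run out at the default size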

I have a problem with to_fp16(); in the class Jeremy Howard said this trick would help in case of running out of memory. What happens for me is the opposite: whenever I use fp16 I have to restart the notebook every time I run fit_one_cycle, or it shows an error message about not enough memory (collecting the garbage and setting the learner to False was not enough). Whereas with regular precision I can do unfreeze and fit_one_cycle indefinitely without having to restart the notebook.
I am using CUDA 10.1.105 with the 418.56 graphics driver; the OS is Ubuntu 16.04.
I have a Titan V with 12GB.
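A cleanup that usually helps between runs instead of restarting the kernel looks like this (a minimal sketch; the resnet50 / accuracy_thresh choice is only an example, not taken from the post above):

import gc
import torch
from fastai.vision import *

del learn                     # drop the old learner and its CUDA tensors
gc.collect()                  # make Python actually release them
torch.cuda.empty_cache()      # hand cached blocks back to the driver

learn = cnn_learner(data, models.resnet50, metrics=[accuracy_thresh]).to_fp16()
learn.fit_one_cycle(5, slice(1e-2))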


Prediction on a multiclass label.

It took me about an hour and a half, but I managed to finally figure out how to do a prediction on a single image with a multi-class label. I'm not sure if it'll work this way for all the classes, but it definitely worked for me on the planet dataset.

def pred_img(img_path, learn=learn, data=data, thresh=.2):
    img = open_image(img_path)             # load the image from disk
    c = data.c                             # number of classes in the databunch
    pred = learn.predict(img)[2] > thresh  # threshold the per-class probabilities
    for i in range(c):
        if pred[i] == 1:
            print(data.classes[i])         # print every label above the threshold
    img.show()

Here’s how it looks when I run it.
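For example (the filename here is just a hypothetical sample from the planet train-jpg folder):

pred_img(path/'train-jpg'/'train_10.jpg')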
