Lesson 2: further discussion ✅

Quote from https://forums.fast.ai/t/lesson-2-further-discussion/28706/62:

Hello @zachcaceres. Any plans to adapt ImageCleaner() to all ImageDataBunch factory methods, not only from_folder()?

I’m using the from_name_re() method and the following code gives back the training images, not the validation ones (even with ds_type=DatasetType.Valid):

ds, idxs = DatasetFormatter().from_toplosses(learn, ds_type=DatasetType.Valid)
ImageCleaner(ds, idxs)

Hello @pierreguillou!

It's unlikely that I'll be able to adapt the widget for at least a month. Apologies, my plate is just too full until then.

I'm sure that Jeremy and Sylvain would welcome PRs that extend the widget, and I also know that @lesscomfortable is familiar with the inner workings and might be able to help.

I am getting the following error from DatasetFormatter().from_similars.
Any idea how to get the weight file?


tanismar [2:59 PM]
Hi there! This should be super simple, but I can't seem to find a way to do it. I want to implement a single-image classifier (based on https://github.com/fastai/fastai_docs/blob/master/dev_nb/104c_single_image_pred.ipynb) to discriminate among the ImageNet classes. Ideally, I would just load the trained model (say, ResNet34), with no transfer learning and no fine-tuning. However, ImageNet itself is not provided as a fast.ai dataset (https://course.fast.ai/datasets), so I can't figure out how to set the 'data' argument to pass to create_cnn() so that it keeps all the original classes. Has anyone tried this, or has an idea of how to do it? Thanks in advance!
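
One possible workaround, since the goal is plain ImageNet inference with no fine-tuning, is to skip create_cnn() and the DataBunch entirely and run the pretrained torchvision model directly. A minimal sketch (the image path is a placeholder, and you'd still need a list of the 1,000 ImageNet class names in index order to turn the predicted index into a label):

import torch
from PIL import Image
from torchvision import models, transforms

model = models.resnet34(pretrained=True).eval()  # ImageNet weights, inference mode

# standard ImageNet preprocessing
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open('my_image.jpg')).unsqueeze(0)  # add a batch dimension
with torch.no_grad():
    probs = torch.softmax(model(img), dim=1)
top_prob, top_idx = probs.max(dim=1)
print(top_idx.item(), top_prob.item())  # map top_idx to a label via a class-names list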

Hello guys, I am having trouble running the new ImageCleaner widget. The error it produces is:

TypeError: slice indices must be integers or None or have an index method

I have attached the code I am running plus the full error trace. Does anyone have any idea?


ds, idxs = DatasetFormatter().from_toplosses(learn)

ImageCleaner(ds, idxs, path)


TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>()
      1 ds, idxs = DatasetFormatter().from_toplosses(learn)
----> 2 ImageCleaner(ds, idxs, path)

~/.anaconda3/lib/python3.7/site-packages/fastai/widgets/image_cleaner.py in __init__(self, dataset, fns_idxs, batch_size, duplicates, start, end)
     92         self._deleted_fns = []
     93         self._skipped = 0
---> 94         self.render()
     95
     96     @classmethod

~/.anaconda3/lib/python3.7/site-packages/fastai/widgets/image_cleaner.py in render(self)
    220             self._skipped += 1
    221         else:
--> 222             display(self.make_horizontal_box(self.get_widgets(self._duplicates)))
    223             display(self.make_button_widget('Next Batch', handler=self.next_batch, style="primary"))

~/.anaconda3/lib/python3.7/site-packages/fastai/widgets/image_cleaner.py in get_widgets(self, duplicates)
    180         "Create and format widget set."
    181         widgets = []
--> 182         for (img,fp,human_readable_label) in self._all_images[:self._batch_size]:
    183             img_widget = self.make_img_widget(img, layout=Layout(height='250px', width='300px'))
    184             dropdown = self.make_dropdown_widget(description='', options=self._labels, value=human_readable_label,

TypeError: slice indices must be integers or None or have an index method
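
For what it's worth, the __init__ signature in the traceback is (self, dataset, fns_idxs, batch_size, duplicates, start, end), with no path parameter, so one guess is that the positional path ends up in batch_size and later breaks the slice self._all_images[:self._batch_size]. Two things worth trying:

ds, idxs = DatasetFormatter().from_toplosses(learn)
ImageCleaner(ds, idxs)  # drop path: this version's ImageCleaner doesn't take it
# ...or upgrade to a fastai release whose ImageCleaner does accept a path:
# ImageCleaner(ds, idxs, path)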


On the ResNet34 question (1:12 - https://youtu.be/ccMHJeQU4Qw?t=4332), @jeremy said you can set pretrained=False in the learner definition. Is this really true? I thought models.resnet34 comes with weights to start from. I actually set the flag to False and got a high error rate (20% vs 3%). Any update on this?
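
For reference, a minimal sketch of the two settings (fastai v1, assuming the data object from the lesson notebook; create_cnn passes pretrained through to the model constructor):

from fastai.vision import *

# pretrained=True (the default): start from ImageNet weights and fine-tune
learn = create_cnn(data, models.resnet34, metrics=error_rate)

# pretrained=False: same architecture but randomly initialised weights,
# so it has to learn from scratch, which would explain the much higher error rate
learn_scratch = create_cnn(data, models.resnet34, pretrained=False, metrics=error_rate)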

In PyTorch, would a loss function like the one below work?

def my_loss_func(y_hat, y):
    cnt = 0
    for idx, val in enumerate(y):
        if val != -1:
            s = s + val - y_hat[idx]
            cnt = cnt + 1
    return s / cnt

Basically, I want to take into account losses only for those values where the real answer is not equal to -1 (the value I fill missing values with).
If not, any ideas on the correct way to approach this?

The only error in your code is that you need to initialize s = 0 as the first line of your function.

But I'd be wary of taking the sum of differences as the loss function, because positive and negative differences tend to cancel each other out, which could give you a low loss even when the predictions don't actually agree with the data.

For this reason, I would use the RMS (root mean square) error (or the mean absolute error) instead. I've implemented the RMS error below:

import numpy as np

# compute the rms error of the model for the selected targets
def my_loss_func(y_hat, y):
    # boolean mask selects values for which the target is not equal to -1
    idx = y != -1
    # compute and return the rms error between target and prediction, for the selected values
    error = y[idx] - y_hat[idx]
    s = np.sqrt(np.mean(error ** 2))   # rms error
    return s
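
One caveat if this is used as a training loss in PyTorch: numpy operations break autograd, so for training you'd want a torch-native version. A minimal sketch, assuming y_hat and y are tensors of the same shape:

import torch

def masked_rmse(y_hat, y, ignore_val=-1):
    # boolean mask selects entries whose target is not the fill value
    mask = y != ignore_val
    error = y[mask] - y_hat[mask]
    # root mean squared error over the selected entries only
    return torch.sqrt(torch.mean(error ** 2))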

Yes, I had planned on using the RMS error, I just wasn't sure if the conditional would work. Thank you for providing the complete loss function.

Hey! I'm working on the lesson2-download notebook and, when running ImageCleaner(ds, idxs, path), I get an error message that says "Runtime disconnected" and my notebook freezes. Do you have any idea what might be going on?

I’m running it on Colab. Could it be that it’s running out of memory or something like that?
Here you can see some info on my original dataset and the one that DatasetFormatter.from_toplosses() generates:

Thank you so much for your help!


At around 49:00 in the Lesson 2 video, @jeremy says that training loss being higher than validation loss means you're either training too slowly or haven't trained for enough epochs. However, directly above this in the notebook, in the output from the default learning rate (which I presume was pretty good, given the low error?), the training loss is indeed about 6x the validation loss.

So this makes me wonder: was the learning rate used for the model output copied and pasted above too low? Or is there some train/validation loss ratio threshold that should alert us to a too-low learning rate?

Also relevant to the question raised by @atlascivan a few months ago in this thread: I had always understood that training loss should pretty much always be lower than validation loss, precisely because the training data are what is used to build the model, so the model should pretty much always do better on them than on novel data. I'm not thinking about deep learning specifically, but about ML in general.

I think what Jeremy meant (please try not to @ him or Rachel unless absolutely necessary) was that at the end of your training the training loss should be lower than the validation loss. With various regularisation techniques we purposely make the training task harder, so that the model generalises better. This explains why you are seeing higher training losses than validation losses (at least at the beginning).

In my (admittedly limited) experience, I am happy when the training loss becomes lower than the validation loss only a few epochs before the network starts overfitting (that is, before the validation loss starts increasing).

Hey, guys. In Lesson 2 (41:00), Jeremy shows how the model can predict, using a picture of a black bear from the dataset. I have some questions, if you don't mind:

  1. If that was a picture from the dataset, there's an 80% chance it was in the training part of the dataset, so the model might have seen it already and just remembers that it's a black bear (overfitting?). Is it important to make sure the model hasn't seen the picture we're trying to predict, in order to check whether it can predict properly?
  2. If it is important, how do I make sure the model didn't see this picture? How can I see whether a picture is in the training set or in the validation set?
  3. (Not related) How can I know the confidence score when predicting a class?

The validation set (even when extracted from the existing set) is not used for training; it is evaluated as if it were a test set. If I recall correctly, the image was picked just to show how prediction works: it could be something you upload, but it was taken from an existing folder for convenience. One risk with datasets created this way (by downloading images from Google) is duplicates, so you need to check for that.
You can see what goes into the train, valid and test sets using data.train_ds … etc.,
and when you call predict, you get a tensor of probabilities, one per class.
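
A minimal sketch of both (fastai v1, assuming the data and learn objects from the lesson notebook):

print(len(data.train_ds), len(data.valid_ds))     # sizes of the splits
print(data.train_ds.x.items[:5])                  # first few training file paths
print(data.classes)                               # class order used by predict

img = data.valid_ds[0][0]                         # an image the model never trained on
pred_class, pred_idx, probs = learn.predict(img)  # probs has one entry per class
print(pred_class, probs[pred_idx])                # predicted label and its probability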

Hi,

In my JupyterLab, the function

ImageCleaner

is not rendering any widget. I'm running JupyterLab on Google Cloud.
Am I missing anything?

Please find the screenshot for more info.

Below are the pip packages that are installed

alabaster==0.7.11
anaconda-client==1.7.2
anaconda-navigator==1.9.2
anaconda-project==0.8.2
appdirs==1.4.3
asn1crypto==0.24.0
astroid==2.0.4
astropy==3.0.4
atomicwrites==1.2.1
attrs==18.2.0
Automat==0.7.0
Babel==2.6.0
backcall==0.1.0
backports.shutil-get-terminal-size==1.0.0
bcolz==1.2.1
beautifulsoup4==4.6.3
bitarray==0.8.3
bkcharts==0.2
blaze==0.11.3
bleach==2.1.4
bokeh==0.13.0
boto==2.49.0
Bottleneck==1.2.1
cachetools==3.0.0
certifi==2018.11.29
cffi==1.11.5
chardet==3.0.4
click==6.7
cloudpickle==0.5.5
clyent==1.2.2
colorama==0.3.9
conda==4.6.1
conda-build==3.15.1
constantly==15.1.0
contextlib2==0.5.5
cryptography==2.4.2
cycler==0.10.0
cymem==2.0.2
Cython==0.28.5
cytoolz==0.9.0.1
dask==0.19.1
dataclasses==0.6
datashape==0.5.4
decorator==4.3.0
defusedxml==0.5.0
dill==0.2.8.2
distributed==1.23.1
docutils==0.14
entrypoints==0.2.3
enum34==1.1.6
et-xmlfile==1.0.1
fastai==1.0.42
fastcache==1.0.2
fastprogress==0.1.18
filelock==3.0.8
Flask==1.0.2
Flask-Cors==3.0.6
gevent==1.3.6
glob2==0.6
gmpy2==2.0.8
google-api-core==1.7.0
google-api-python-client==1.7.7
google-auth==1.6.2
google-auth-httplib2==0.0.3
google-cloud-bigquery==1.8.1
google-cloud-core==0.29.1
google-cloud-dataproc==0.3.0
google-resumable-media==0.3.2
googleapis-common-protos==1.5.6
greenlet==0.4.15
h5py==2.8.0
heapdict==1.0.0
html5lib==1.0.1
httplib2==0.12.0
hyperlink==18.0.0
idna==2.7
imageio==2.4.1
imagesize==1.1.0
incremental==17.5.0
ipykernel==5.1.0
ipython==7.2.0
ipython-genutils==0.2.0
ipython-sql==0.3.9
ipywidgets==7.4.2
isort==4.3.4
itsdangerous==0.24
jdcal==1.4
jedi==0.13.2
jeepney==0.3.1
Jinja2==2.10
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.2.3
jupyter-console==5.2.0
jupyter-contrib-core==0.3.3
jupyter-contrib-nbextensions==0.5.1
jupyter-core==4.4.0
jupyter-highlight-selected-word==0.2.0
jupyter-http-over-ws==0.0.2
jupyter-latex-envs==1.4.6
jupyter-nbextensions-configurator==0.4.1
jupyterlab==0.35.4
jupyterlab-git==0.5.0
jupyterlab-launcher==0.13.1
jupyterlab-server==0.2.0
kaggle==1.5.1.1
keyring==13.2.1
kiwisolver==1.0.1
lazy-object-proxy==1.3.1
llvmlite==0.24.0
locket==0.2.0
lxml==4.2.5
MarkupSafe==1.0
matplotlib==2.2.3
mccabe==0.6.1
mistune==0.8.3
mkl-fft==1.0.10
mkl-random==1.0.2
more-itertools==4.3.0
mpmath==1.0.0
msgpack==0.5.6
msgpack-numpy==0.4.3.2
multipledispatch==0.6.0
murmurhash==1.0.1
navigator-updater==0.2.1
nb-conda==2.2.1
nb-conda-kernels==2.2.0
nbconvert==5.4.0
nbformat==4.4.0
nbpresent==3.0.2
networkx==2.1
nltk==3.3
nose==1.3.7
notebook==5.6.0
numba==0.39.0
numexpr==2.6.9
numpy==1.15.4
numpydoc==0.8.0
odo==0.5.1
olefile==0.46
opencv-python==4.0.0.21
openpyxl==2.5.6
packaging==17.1
pandas==0.23.4
pandocfilters==1.4.2
parso==0.3.1
partd==0.3.8
path.py==11.1.0
pathlib2==2.3.2
patsy==0.5.0
pep8==1.7.1
pexpect==4.6.0
pickleshare==0.7.4
Pillow==5.2.0
Pillow-SIMD==5.3.0.post0
pkginfo==1.4.2
plac==0.9.6
pluggy==0.7.1
ply==3.11
preshed==2.0.1
prettytable==0.7.2
prometheus-client==0.3.1
prompt-toolkit==2.0.7
protobuf==3.6.1
psutil==5.4.7
ptyprocess==0.6.0
py==1.6.0
pyasn1==0.4.5
pyasn1-modules==0.2.3
pycodestyle==2.4.0
pycosat==0.6.3
pycparser==2.18
pycrypto==2.6.1
pycurl==7.43.0.2
pyflakes==2.0.0
Pygments==2.3.1
pylint==2.1.1
pyodbc==4.0.24
pyOpenSSL==18.0.0
pyparsing==2.2.0
PySocks==1.6.8
pytest==3.8.0
pytest-arraydiff==0.2
pytest-astropy==0.4.0
pytest-doctestplus==0.1.3
pytest-openfiles==0.3.0
pytest-remotedata==0.3.0
python-dateutil==2.7.5
python-slugify==2.0.1
pytz==2018.5
PyWavelets==1.0.0
PyYAML==3.13
pyzmq==17.1.2
QtAwesome==0.4.4
qtconsole==4.4.1
QtPy==1.5.0
regex==2018.1.10
requests==2.19.1
rope==0.11.0
rsa==4.0
ruamel-yaml==0.15.46
scikit-image==0.14.0
scikit-learn==0.19.2
scipy==1.1.0
seaborn==0.9.0
SecretStorage==3.1.0
Send2Trash==1.5.0
service-identity==17.0.0
simplegeneric==0.8.1
singledispatch==3.4.0.3
six==1.11.0
snowballstemmer==1.2.1
sortedcollections==1.0.1
sortedcontainers==2.0.5
spacy==2.0.18
Sphinx==1.7.9
sphinxcontrib-websupport==1.1.0
spyder==3.3.1
spyder-kernels==0.2.6
SQLAlchemy==1.2.11
sqlparse==0.2.4
statsmodels==0.9.0
sympy==1.2
tables==3.4.4
tblib==1.3.2
terminado==0.8.1
testpath==0.3.1
thinc==6.12.1
toolz==0.9.0
torch==1.0.0
torchvision==0.2.1
tornado==5.1.1
tqdm==4.26.0
traitlets==4.3.2
Twisted==18.7.0
typing==3.6.4
ujson==1.35
unicodecsv==0.14.1
Unidecode==1.0.23
uritemplate==3.0.0
urllib3==1.22
wcwidth==0.1.7
webencodings==0.5.1
Werkzeug==0.14.1
widgetsnbextension==3.4.1
wrapt==1.10.11
xlrd==1.1.0
XlsxWriter==1.1.0
xlwt==1.3.0
zict==0.1.3
zope.interface==4.5.0

OS Name: Debian GNU/Linux 9 (stretch)

Hi, I was also having the problem of ImageCleaner not rendering the widget.

I’m on Paperspace Gradient.

What solved the problem for me was "File -> Trust Notebook", then reloading the notebook.

I tried the same now. It did not work for me.

Hi! I have the same problem. Did you find a solution?

After looking around a bit, I found a solution: just make sure you are working with the latest version of fastai. This is what you have to do:

Open a terminal and check your version of fastai: pip show fastai
It should be version 1.0.42; if you have an earlier one, type: pip install fastai --upgrade

You have to restart the kernel and re-run the whole thing for the changes to take effect.


I had the same problem reoccur in a notebook I had trusted. Once you run the widget and get the error message, refreshing the notebook (just refresh the browser) may make it work after that.
