Lesson 2: further discussion ✅

Hi all - I am a newbie here and am aware of the limitations of running ImageCleaner on the JupyterLab code base. ImageCleaner appears to be a fantastic tool, and I would love to run it in Jupyter Notebook (proper).

However, I am running the Fastai library on the OnePanel platform to gain access to a GPU, and OnePanel seems to be firmly married to the JupyterLab code base.

I could not find an obvious way to launch a notebook session with the older, widget-compatible Jupyter Notebook code base, which reportedly does support widgets properly and, hence, ImageCleaner.

I imagine mine is likely a rather naive question, but I have been trying to figure this out on my own, both within the fastai forum and on Stack Overflow, and have hit a considerable wall.

Any help would be greatly appreciated.

After the training part I added these lines to my code and got the error below. Can someone please tell me what the issue is here?

learn.save('stage-2')
learn_cln.load('stage-2')
ds, idxs = DatasetFormatter().from_toplosses(learn_cln)
ImageCleaner(ds, idxs, path)


RuntimeError                              Traceback (most recent call last)
<ipython-input-...> in <module>
----> 1 learn_cln.load('stage-2')
      2 ds, idxs = DatasetFormatter().from_toplosses(learn_cln)
      3 ImageCleaner(ds, idxs, path)

~/anaconda3/envs/object_rec/lib/python3.7/site-packages/fastai/basic_train.py in load(self, file, device, strict, with_opt, purge, remove_module)
    271         model_state = state['model']
    272         if remove_module: model_state = remove_module_load(model_state)
--> 273         get_model(self.model).load_state_dict(model_state, strict=strict)
    274         if ifnone(with_opt, True):
    275             if not hasattr(self, 'opt'): self.create_opt(defaults.lr, self.wd)

~/anaconda3/envs/object_rec/lib/python3.7/site-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
    828         if len(error_msgs) > 0:
    829             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
--> 830                                self.__class__.__name__, "\n\t".join(error_msgs)))
    831         return _IncompatibleKeys(missing_keys, unexpected_keys)
    832

RuntimeError: Error(s) in loading state_dict for Sequential:
    size mismatch for 1.8.weight: copying a param with shape torch.Size([10, 512]) from checkpoint, the shape in current model is torch.Size([11, 512]).
    size mismatch for 1.8.bias: copying a param with shape torch.Size([10]) from checkpoint, the shape in current model is torch.Size([11]).
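The size mismatch points at the final linear layer: the 'stage-2' checkpoint was saved from a model with 10 output classes, but learn_cln was built on data with 11 classes (one common cause is a stray extra folder being picked up when labelling from folders). A minimal sketch of the usual fix, following the lesson 2 notebook's cleaning flow (fastai v1; the path and architecture here are assumptions):

# Rebuild the cleaning learner on the SAME classes the checkpoint was trained on.
from fastai.vision import *

db = (ImageList.from_folder(path)   # hypothetical dataset root
      .split_none()                 # no validation split for cleaning
      .label_from_folder()          # one class per folder - check for stray folders!
      .transform(get_transforms(), size=224)
      .databunch())
learn_cln = cnn_learner(db, models.resnet34, metrics=error_rate)
learn_cln.load('stage-2')           # succeeds only if db has 10 classes, matching the checkpoint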

How do you solve the problem when you download the URL files?

I was trying to download “teddy bear” URLs into a file in Chrome, but it turned out that the downloaded file was empty.

This didn’t work for me in Chrome either, so I used Firefox. I believe I had to change something in about:config, but I can’t remember why or what.

They removed FileDeleter. It’s now called ImageCleaner. But this widget doesn’t work in Colab. There is an alternative widget: The File_Deleter Widget

You can just install it with !git clone https://github.com/muellerzr/ClassConfusion.git in Colab.

But it’s also suboptimal because sometimes it works and sometimes it doesn’t show anything. :slight_smile:


Any chance you could provide an example of it not showing anything? I’ll gladly look into it :slight_smile: (or put an issue into the GitHub too :wink: )

Right now I moved on to image segmentation but I will put an issue into the GitHub if I find an example. And thanks for building this! :slight_smile:

In a paper (update: https://arxiv.org/pdf/1803.09820) I saw a picture that explained it very well. I’ll paste a few here that convey the same message.

[Figures: training error vs. validation/cross-validation error as a function of training and model complexity]
https://zahidhasan.github.io/img/bias_variance6.png
https://forums.fast.ai/uploads/default/original/2X/5/57b24adcaf41ec93767e692094f00369a4f2e6fb.jpg
https://www.researchgate.net/profile/Cristiano_Ballabio/publication/222344717/figure/fig3/AS:325003603136513@1454498305120/Training-error-and-cross-validation-error-as-a-function-of-model-complexity.png

Too little training: underfitting
Too much training: overfitting
But that is only the case if the network is “deep” enough to learn from the data; by “deep” I mean an architecture with enough capacity to perform well on the task.

If the network is too small, it won’t be able to predict accurately.
If the network is too big, you will need to apply regularization as it will be very capable of just “memorizing” the training set.

(Just my view)
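To make that last point concrete, here is a minimal sketch (fastai v1; the architecture and values are illustrative, not from this thread) of adding regularization to a large network:

# Dropout (ps) and weight decay (wd) both penalize pure memorization.
from fastai.vision import *

learn = cnn_learner(data, models.resnet50, metrics=error_rate,
                    ps=0.5,    # dropout probability in the head
                    wd=1e-2)   # L2 weight decay applied during training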

lr_find
It spits out the validation error (prediction error against the validation set).

If there is no validation set, then there is no error to calculate.

Most probably you loaded the images without specifying a percentage to be used for validation. We do that (0% validation) at the stage when we want to ‘clean’ the dataset, so that we work with all the images, not just the training set.
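For example (fastai v1; the valid_pct values are illustrative), the two stages might be built like this:

# Cleaning stage: 0% validation, so every image is available to the widget,
# but no validation error can be reported.
data_clean = ImageDataBunch.from_folder(path, train='.', valid_pct=0,
                                        ds_tfms=get_transforms(), size=224)
# Training stage: hold out 20%, so lr_find/fit have a validation set to score.
data_train = ImageDataBunch.from_folder(path, train='.', valid_pct=0.2,
                                        ds_tfms=get_transforms(), size=224)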

How many images are needed to train a robust CNN?

Jeremy mentions “less than you would think” in the video lecture. The snippet of JS code that scrapes Google Images gives me 80 photos. Even with the most careful tagging using -keyword exclusions in my search, anywhere between 25 and 50 percent of the returned results are ‘noise’ that I have to prune out manually.

My dataset per classification ends up being 30-80 pictures per class, after manual cleaning. Is that good enough?

I’ve read elsewhere on the web that you need at least 1000 images per classification. That’s a lot.

And a second question:

What is the reason for running the cleaning process AFTER training the model instead of before?

Wouldn’t it be a better trained model if we started with good data?

In lesson 2, we set learner.precompute to False only for the last layers.
Would it instead always be best to first unfreeze and then precompute all weights with augmentation, for better results?

As Jeremy always says, Try it and see…
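For reference, “trying it” in fastai v1 terms is the lesson’s own two-step flow (learning rates as in the notebook):

learn.fit_one_cycle(4)                            # train only the head first
learn.unfreeze()                                  # then make all layers trainable
learn.fit_one_cycle(2, max_lr=slice(3e-5, 3e-4))  # discriminative learning rates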

I have had good results with 100 images per category.

One reason is so that we can use the trained model to help in the cleanup process.
E.g. “show me all the Cat1 images” - then it’s easier to identify images which are not really Cat1.
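Concretely, that is what the lesson’s cleaning flow does: the trained model sorts images by loss, so the most suspect (likely mislabeled) ones surface first:

ds, idxs = DatasetFormatter().from_toplosses(learn)  # highest-loss images first
ImageCleaner(ds, idxs, path)                         # review/relabel/delete them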

Is an ImageDataBunch a DataLoaders object?

I’m trying to sort out the difference between a DataBunch, a DataBlock, and a DataLoader. Thanks.

Also, what should I do if I see that my training and validation losses are decreasing very nicely, but the error rate is staying the same?

I’m trying to clean the image dataset I created. I’m working in Google Colab, so when I arrive at this part:

# Don't run this in google colab or any other instances running jupyter lab.
# If you do run this on Jupyter Lab, you need to restart your runtime and
# runtime state including all local variables will be lost.
ImageCleaner(ds, idxs, path)

It clearly says “don’t run this in Colab”.
Then how am I supposed to clean my dataset if I can’t remove the images?

Thank you!
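One possible workaround (a sketch, not from the course notebook): let the interpretation object surface the highest-loss images, then delete the bad files yourself by path:

from fastai.vision import *

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_top_losses(9, figsize=(7, 7))   # eyeball the most suspect images
losses, idxs = interp.top_losses()
# Path of the worst image in the dataset interp was built on; unlink() deletes it.
worst = learn.data.valid_ds.x.items[idxs[0]]
# worst.unlink()   # uncomment only after checking it really is noise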

When I try to predict the class using a loaded learner (a classifier for black/grizzly/teddy bears), I can’t seem to get the prediction to show me the actual class; instead I’m getting this:

[image]

I’ve tried both on Gradient and Colab, and can’t figure out why.
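If it helps, learn.predict in fastai v1 returns a triple, and the first element is the human-readable class. A sketch, with a hypothetical image path:

img = open_image(path/'black'/'00000001.jpg')     # hypothetical file
pred_class, pred_idx, probs = learn.predict(img)  # (Category, class index, probabilities)
print(pred_class)                                 # e.g. 'teddys'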

If you haven’t already done so, you will need to learn ‘exception handling’.

I have a problem with the JavaScript console approach on mobile devices.

Basically, I’m stuck abroad for the next 2 months and I’m working from an Android tablet with a Bluetooth keyboard, which is fine except that the mobile browsers (at least Chrome and Firefox) don’t have JS consoles, as they only support remote debugging.

Does anyone know of a workaround that doesn’t involve me going to find access to a PC?