Lesson 2: further discussion ✅

Hello @zachcaceres! I just found that running ImageCleaner(ds, idxs) no longer deletes the selected images, and I found this in the changes on GitHub ():

(screenshot of the changelog entry)

So it doesn’t remove the images anymore. How can this be fixed?

1 Like

hey @jm0077!

yeah, we opted for a non-destructive action in the widget. Even though people want the file removed from their particular model/project, that doesn’t mean that they want to destroy the file itself. It was too easy to wreck your data.

The widget now uses its own CSV to manage the inclusion and labels of files. :slight_smile:

@lesscomfortable could you share the relevant example here?

3 Likes

Please help me figure out how to delete the files with top losses, or how to find their filenames so I can delete them manually. :slight_smile:

1 Like

So basically you run ImageCleaner as before, but your changes are now saved to a csv. To use your cleaned dataset you just need to build a new data bunch with ImageDataBunch.from_csv using your cleaned.csv. We included this line commented out in the lesson2 notebook; you can find it at the beginning of the “View Data” section.
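
Something along these lines (a sketch rather than the exact notebook cell; the transforms, size and valid_pct here are placeholders):

# rebuild the data bunch from the csv the widget wrote
db = ImageDataBunch.from_csv(path, csv_labels='cleaned.csv', ds_tfms=get_transforms(), size=224, valid_pct=0.2).normalize(imagenet_stats)
learn = create_cnn(db, models.resnet34, metrics=error_rate)   # then train as before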

8 Likes

@lesscomfortable In the notebook, you run ImageCleaner on the validation dataset. If you use the csv to generate a new databunch, wouldn’t you just be looking at the cleaned validation dataset, rather than the training data and the cleaned validation?

I just want to make sure I understand :slight_smile:

2 Likes

True. I am working on solving that; see Create a Learner with no Validation Set. In the meantime, you can create an ImageDataBunch with as small a validation set as possible (passing 0 returns an error) and do the same again when loading from the csv. If you figure out how to create an ImageDataBunch with the whole dataset in train_ds and no valid_ds, and can create a Learner with it, please let me know!
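
Something like this is what I mean (a rough sketch; the tiny valid_pct, transforms and size are placeholder values):

# before cleaning: keep the validation split as small as the library allows
data = ImageDataBunch.from_folder(path, train='.', valid_pct=0.01, ds_tfms=get_transforms(), size=224).normalize(imagenet_stats)
# after cleaning: rebuild from the csv with the same tiny split
data_clean = ImageDataBunch.from_csv(path, csv_labels='cleaned.csv', valid_pct=0.01, ds_tfms=get_transforms(), size=224).normalize(imagenet_stats)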

2 Likes

Quote from https://forums.fast.ai/t/lesson-2-further-discussion/28706/62:

Hello @zachcaceres. Any plan to adapt ImageCleaner() to all ImageBunch methods and not only from_folder() ?

I’m using the from_name_re() method and the following code gives back the training images, not the validation ones (even with ds_type=DatasetType.Valid):

ds, idxs = DatasetFormatter().from_toplosses(learn, ds_type=DatasetType.Valid)
ImageCleaner(ds, idxs)

hello @pierreguillou!

it’s unlikely that I’ll be able to adapt the widget for at least a month. Apologies, my plate is just too full until then.

I’m sure that Jeremy and Sylvain would welcome PRs that extend the widget and I also know that @lesscomfortable is familiar with the inner-workings and might be able to help.

I am getting the following error from DatasetFormatter().from_similars.
Any idea how to get the weight file?

1 Like

tanismar [2:59 PM]
Hi there! This should be super-simple, but I can’t seem to find a way to do it. I want to implement a single image classifier (based on https://github.com/fastai/fastai_docs/blob/master/dev_nb/104c_single_image_pred.ipynb) to discriminate among the ImageNet classes. Ideally, I would just have to load the trained model (say, ResNet34), no transfer learning, no fine-tuning. However, ImageNet is not itself provided as a fast.ai Dataset (https://course.fast.ai/datasets), so I can’t figure out how to set my ‘data’ argument to pass to create_cnn() so that it keeps all the original classes. Has anyone tried this, or has some idea how to do it? Thanks in advance!
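
(For concreteness, the torchvision-only route would look roughly like the sketch below, since the pretrained resnet34 already outputs the 1000 ImageNet classes; 'my_image.jpg' is a placeholder path. What I’m after is the equivalent with a fastai Learner.)

import torch
from torchvision import models, transforms
from PIL import Image

model = models.resnet34(pretrained=True).eval()      # ImageNet weights, all 1000 classes
tfms = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
img = tfms(Image.open('my_image.jpg').convert('RGB')).unsqueeze(0)
with torch.no_grad():
    probs = torch.softmax(model(img), dim=1)
print(probs.argmax(dim=1))                           # map this index into an ImageNet class-name list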

Hello guys, I am having trouble running the new ImageCleaner widget. The error it produces is:

TypeError: slice indices must be integers or None or have an index method

I have attached the code I am running plus the full error trace. Does anyone have any idea?

ds, idxs = DatasetFormatter().from_toplosses(learn)
ImageCleaner(ds, idxs, path)

TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>()
      1 ds, idxs = DatasetFormatter().from_toplosses(learn)
----> 2 ImageCleaner(ds, idxs, path)

~/.anaconda3/lib/python3.7/site-packages/fastai/widgets/image_cleaner.py in __init__(self, dataset, fns_idxs, batch_size, duplicates, start, end)
     92         self._deleted_fns = []
     93         self._skipped = 0
---> 94         self.render()
     95
     96     @classmethod

~/.anaconda3/lib/python3.7/site-packages/fastai/widgets/image_cleaner.py in render(self)
    220             self._skipped += 1
    221         else:
--> 222             display(self.make_horizontal_box(self.get_widgets(self._duplicates)))
    223             display(self.make_button_widget('Next Batch', handler=self.next_batch, style="primary"))

~/.anaconda3/lib/python3.7/site-packages/fastai/widgets/image_cleaner.py in get_widgets(self, duplicates)
    180         "Create and format widget set."
    181         widgets = []
--> 182         for (img,fp,human_readable_label) in self._all_images[:self._batch_size]:
    183             img_widget = self.make_img_widget(img, layout=Layout(height='250px', width='300px'))
    184             dropdown = self.make_dropdown_widget(description='', options=self._labels, value=human_readable_label,

TypeError: slice indices must be integers or None or have an index method

7 Likes
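
Judging from the signature shown in the traceback (__init__(self, dataset, fns_idxs, batch_size, duplicates, start, end)), in this version of the widget the third positional argument is batch_size rather than a path, so the Path object ends up being used to slice the image list. A sketch of a call that avoids the TypeError with that version:

ds, idxs = DatasetFormatter().from_toplosses(learn)
ImageCleaner(ds, idxs)   # leave out the third positional argument in this version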

On the Resnet34 question (1:12 - https://youtu.be/ccMHJeQU4Qw?t=4332), @jeremy said you can set pretrained=False in the learner definition. Is this really true? I thought models.resnet34 comes with weights to start from. I actually set the flag to False and got a high error rate (20% vs 3%). Any update on this?
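
What I ran was essentially this (a sketch of the fastai v1 call; the metrics argument is a placeholder):

learn = create_cnn(data, models.resnet34, metrics=error_rate, pretrained=False)   # same architecture, randomly initialised weights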

In PyTorch, would a loss function like the one below work?

def my_loss_func(y_hat, y):
   cnt = 0  
   for idx, val in enumerate(y) :
         if val != -1 : 
           s = s + val  - y_hat[idx]
           cnt  = cnt + 1 
   return s/cnt

Basically I want to take into account losses for only those values where the real answer is not equal to -1 (a value I fill the missing values with).
If not, any ideas on the correct way to approach this?

The only error in your code is you need to initialize s = 0 as the first line in your function.

But I’d be wary of taking the sum of differences as the loss function, because positive and negative differences tend to cancel each other, and could give you a low value of the loss function even when the predictions don’t actually agree with the data.

For this reason, I would use the rms (root mean squared) error (or mean absolute error) instead. I’ve implemented rms error below:

# compute the rms error of the model for the selected targets
import numpy as np

def my_loss_func(y_hat, y):
    # boolean mask selects entries whose target is not equal to -1
    idx = y != -1
    # error between target and prediction for the selected entries
    error = y[idx] - y_hat[idx]
    # root mean squared error over the selected entries
    return np.sqrt(np.mean(error ** 2))
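
If you need this to work inside a PyTorch training loop with autograd, an equivalent masked RMSE in pure torch would be roughly the following (a sketch, not tested against your data):

import torch

def my_torch_loss_func(y_hat, y):
    mask = y != -1                                 # keep only entries with a real target
    error = y_hat[mask] - y[mask]
    return torch.sqrt(torch.mean(error ** 2))      # rms error over the selected entries
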
2 Likes

Yes, I had planned on using the rmse error, just wasn’t sure if the conditional would work. Thank you for providing the complete loss function.

Hey! I’m working on the lesson2-download notebook and, when running ImageCleaner(ds, idxs, path) I get an error message that says “Runtime disconnected” and my notebook freezes. Do you have any idea of what might be going on?

I’m running it on Colab. Could it be that it’s running out of memory or something like that?
Here you can see some info on my original dataset and the one that DatasetFormatter.from_toplosses() generates:

Thank you so much for your help!

3 Likes

At around 49:00 in the Lesson 2 video, @jeremy says that training loss being higher than validation loss means that you’re either training too slowly, or haven’t trained for enough epochs. However, directly above this in the notebook, looking at the output from the default learning rate setting (which I presume was pretty good, given the low error?), the training loss is indeed about 6x the validation loss.

So – this makes me wonder: Was the learning rate in the model output copied and pasted above too low? Or is there some train/validation loss ratio threshold that should alert us to a low learning rate?

Also relevant to the question raised by @atlascivan a few months ago in this thread – I had always understood that training loss should pretty much always be lower than validation loss, precisely because the training data are what are used to build the model, so the model should pretty much always do better on those than novel data. I’m not thinking about deep learning specifically, but ML in general.

I think what Jeremy meant (please try not to @ him or Rachel unless absolutely necessary) was that at the end of your training the training loss should be lower than the validation loss. With various regularisation techniques we are purposely making the training classification harder, so that the model generalises better. This explains why you are seeing higher training losses than validation ones (at least at the beginning).

In my (admittedly limited) experience, I am happy when the training loss becomes less than the validation loss only a few epochs before the network starts overfitting (that is, before the validation loss starts increasing).

Hey, guys. In Lesson 2 (41:00) Jeremy shows how the model can predict, using a picture of a black bear from the dataset. I have some questions, if you don’t mind:

  1. If that was a picture from the dataset, there’s an 80% chance it was in the training part of the dataset, so the model might have seen it already and simply remembers it’s a black bear (overfitting?). Is it important to make sure the model hasn’t seen the picture we’re trying to predict, in order to check whether it can predict properly?
  2. If it is important, how do I make sure the model hasn’t seen this picture? How can I tell whether a picture is in the training set or in the validation set?
  3. (Not related) How can I know the confidence score when predicting a class?

The validation set (even if it is extracted from the existing dataset) is not included in training; it is evaluated as if it were a test set. If I recall correctly, that image was picked just to show how prediction works: it could be something you upload, but it was taken from the existing folder for convenience. One thing to watch for with datasets created this way (downloading images from Google) is duplicates, so do check for that.
You can see what goes into the train, valid and test sets using data.train_ds, data.valid_ds, etc.
And when you call predict, you get a tensor of probabilities for each class.
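
For example (a sketch using the fastai v1 API; img stands for any image opened with open_image):

print(len(data.train_ds), len(data.valid_ds))      # what ended up in each split
pred_class, pred_idx, probs = learn.predict(img)   # predict returns the class, its index and the probabilities
print(pred_class, probs[pred_idx])                 # predicted class and its confidence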