Lesson 2: further discussion ✅

Is there an SGD with restarts available in fastai v1?

Yes, I believe there is, via learn.fit(); however, I have no clue how to use it.



It should be cyc_len
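In fastai v1 the cyclical schedule is exposed through learn.fit_one_cycle(cyc_len, ...) rather than a separate SGDR class. The idea behind the original SGD-with-restarts (cosine annealing that jumps back up at each restart) can be sketched in plain Python; this is an illustrative schedule, not fastai's exact implementation:

```python
import math

def sgdr_lr(step, cycle_len, lr_max=0.01, lr_min=0.0):
    """Cosine-annealed learning rate that restarts every `cycle_len` steps (SGDR)."""
    t = (step % cycle_len) / cycle_len  # position within the current cycle, in [0, 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))

# LR starts at lr_max, decays toward lr_min, then jumps back up at each restart.
schedule = [sgdr_lr(s, cycle_len=10) for s in range(30)]
```

The restarts are what let the optimizer hop out of sharp minima; fit_one_cycle uses a related (one-cycle) policy rather than repeated restarts.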

Hello! I am going through the Lesson 2 training and thinking about the time required to complete each epoch. I would love to see the time per epoch included in the table that shows learning rate, accuracy, and loss by epoch. What could I add to see this additional data?
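Later fastai releases add a time column to that table automatically; if yours doesn't, you can time epochs yourself. A minimal pure-Python sketch of the idea (in fastai proper you would hook the epoch-begin/epoch-end callback events instead; `fit_with_timing` and its `fit_fn` argument are made up for illustration):

```python
import time

def fit_with_timing(fit_fn, n_epochs):
    """Call `fit_fn` once per epoch and record each epoch's wall-clock time.

    `fit_fn` stands in for whatever runs one epoch of training.
    """
    epoch_times = []
    for epoch in range(n_epochs):
        start = time.perf_counter()
        fit_fn(epoch)  # one epoch of training
        epoch_times.append(time.perf_counter() - start)
        print(f"epoch {epoch}: {epoch_times[-1]:.2f}s")
    return epoch_times

# Stand-in "training" that just sleeps briefly.
times = fit_with_timing(lambda e: time.sleep(0.01), n_epochs=3)
```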

How could we incorporate a “none of the above” option for classification? We regularly see misclassified images where the image does not really belong to any of the classification options, but since “none of the above” is not an option, the image gets misclassified. Any ideas on this? It seems like it should be a “one minus” type function…


Hello all,

If you’ve been using the FileDeleter widget to clean your data, some changes came out today that may interest you. source

FileDeleter was removed in favor of two new widgets: ImageDeleter and ImageRelabeler.

ImageDeleter is essentially the same as the old FileDeleter.

ImageRelabeler renders the files you pass in and lets you re-label them. This is to find mis-categorized images in your data directory. NOTE: ImageRelabeler currently only works with files where labels were created from the names of their parent directory (i.e. with .from_folder()). The widget moves mislabeled photos from the incorrect parent directory to the properly-labeled parent directory.

To relabel an image, just click the proper label in the widget dropdown.

Both widgets take a formatted dataset from DatasetFormatter. You can easily specify which dataset you’d like to render by passing in DatasetType.Valid, DatasetType.Train or DatasetType.Test to the ds_type keyword arg in DatasetFormatter.from_toplosses.

All three classes are available in fastai.widgets.

Example use:

ds, idxs = DatasetFormatter().from_toplosses(learn, ds_type=DatasetType.Train)
ImageRelabeler(ds, idxs)
# or
ImageDeleter(ds, idxs)

Real docs for these are coming soon!


I was having a similar problem. I believe your training set has some strange images; after cleaning them up it should be fine.


I have asked the same question in this thread, see above. Still would like to have more insight into this and the intuition behind using an “extra” class or not:


Input images that do not belong to one of the trained classes are a problem for me, too.

Does anyone have a suggestion for what to do in this case? Using vanilla ResNet will result in the network trying anything to output one of the trained classes. In most real-world scenarios, there is no way to make sure that the network is always fed images containing one of the classes.
Using a “not_a_class” class would not allow feeding in many negative examples, since at some point it would result in imbalanced classes.

Any suggestions? I did not find good information on this via Google.

Best regards

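One common alternative to an explicit “not_a_class” class is thresholding: reject a prediction whenever the model's top softmax probability is too low. A minimal sketch of that idea (`predict_with_reject`, the class names, and the 0.8 threshold are all illustrative, not fastai API; the threshold should be tuned on held-out data):

```python
import math

def predict_with_reject(logits, classes, threshold=0.8):
    """Return the predicted class, or 'none_of_the_above' when the model is
    not confident enough (max softmax probability below `threshold`)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]       # subtract max for stability
    probs = [e / sum(exps) for e in exps]          # softmax
    i = max(range(len(probs)), key=probs.__getitem__)
    return classes[i] if probs[i] >= threshold else "none_of_the_above"

classes = ["teddy", "grizzly", "black"]
print(predict_with_reject([6.0, 1.0, 0.5], classes))   # confident prediction
print(predict_with_reject([1.1, 1.0, 0.9], classes))   # ambiguous -> rejected
```

This avoids the class-imbalance issue of training on piles of negatives, though softmax confidence is famously overconfident on truly out-of-distribution inputs, so it is a heuristic rather than a fix.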

I would like to create a mobile app that would classify house plants and tell me if the plant is poisonous to dogs/cats (I have both at home, so). Would it be better to classify the species of plant and then compare against a list of poisonous plants, or to train the model directly on which plants are poisonous for cats and which for dogs?

Does anyone know cheap/free hosting that would work well for a REST API for this kind of app? I would finally like to make a mobile app that talks to the API.


This may help.


I seem to be having a problem with the lr_find() function. Instead of shooting up at some point as shown in the class videos and notebooks, my lr_find() plot always shows loss decreasing as the learning rate increases (up until about 1e-02, though that changes with each run of lr_find()). I’ve tried using the default parameters and also tried a few different custom parameters.

Here’s my learning rate plot with the default parameters:

And here’s my learning rate plot with smaller start_lr and larger end_lr parameters: learn.lr_find(start_lr=1e-10, end_lr=100, num_it=200)

What does this mean?

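A plot like that is still usable: the usual heuristic is to pick a learning rate from the steep part of the curve, well before any blow-up. A toy sketch of that heuristic (not fastai code; real lr_find losses should be smoothed before taking differences):

```python
def steepest_descent_lr(lrs, losses):
    """Pick the LR where the loss is falling fastest: the point with the most
    negative loss difference between consecutive LRs. In practice you would
    smooth the losses first and often back off ~10x from this value."""
    slopes = [losses[i + 1] - losses[i] for i in range(len(losses) - 1)]
    i = min(range(len(slopes)), key=slopes.__getitem__)
    return lrs[i]

lrs = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
losses = [2.30, 2.28, 2.10, 1.60, 3.50]   # toy values: big drop near 1e-2, blow-up after
print(steepest_descent_lr(lrs, losses))
```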

Practical question
How long should we train a model for in stage 1? Until convergence?

I’ve tried to answer this question myself, but didn’t get any clear answer. Jeremy mentions in Lesson 2 to “make some progress”.
Let’s say a model is fully trained (reaches convergence) in stage 1 after 20 epochs. In order to achieve the best possible result, should we:

  1. Fully train it for 20 epochs, and then fine-tune? Or
  2. Partially train it for 5–10 epochs, and then fine-tune until convergence?

In my experience I have not seen a major difference. What I have seen, though, is that when a model is fully trained in stage 1, there’s almost no room to improve it in stage 2 (fine-tuning).

Does anybody have any experience/ general guideline as to how to answer this question?

If your dataset is quite different from ImageNet, you’ll most likely need to unfreeze and fine-tune.


Thanks for the quick reply!
However I think my question was not clear enough.
Assuming I have a dataset quite different from ImageNet, which is the preferred option to get the best results:

  1. Fully training the model in stage 1 (let’s say for 20 epochs, until convergence), and then unfreeze and fine-tune, or
  2. Partially train the model in stage 1 (5-10 epochs), and then unfreeze and fine-tune

I ran through the Lesson 2 notebook this week and found that it took over 10 hours to run just one step: the second learn.fit call, run after first setting the per-layer-group learning rates as

lrs = np.array([lr/9, lr/3, lr])

Should I expect the DL process to always take so long? If it takes 15+ hours just to complete the second assignment, does that mean we are expected to spend 30+ hours to complete both the sample run-though and one experiment on our own? Or should the homework not take that long? Should we expect the runtime to continue to increase or is this an anomaly related to this example? Or am I doing it wrong?

Screenshot of my runtime for the step indicated is attached.

It is not possible to give a recipe. Train until it starts to overfit. If you are not satisfied with your results, tune your hyperparameters (dropout, weight decay, discriminative LRs, etc.) and start again.
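“Train until it starts to overfit” is usually operationalized as early stopping on validation loss. A minimal sketch of that rule (`epochs_before_overfit` and `patience=2` are illustrative choices, not fastai API — fastai has its own early-stopping callback):

```python
def epochs_before_overfit(val_losses, patience=2):
    """Return how many epochs to keep: stop once validation loss has failed
    to improve for `patience` consecutive epochs (early stopping)."""
    best, best_epoch, bad = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, bad = loss, epoch, 0
        else:
            bad += 1
            if bad >= patience:
                break
    return best_epoch + 1

# Validation loss improves for 4 epochs, then climbs: keep the first 4.
print(epochs_before_overfit([1.0, 0.8, 0.7, 0.65, 0.70, 0.75, 0.80]))
```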


I think you are using the CPU. Run a cell with torch.cuda.is_available(). If it returns True, run torch.cuda.set_device(0). Rerun your training and report back.


This needs to be updated in the lesson2-download notebook.
Can I create a PR?



I experimented with binary classification on a dataset that used, say, 36 images for training and 16 for validation. The validation images contained one image that clearly did not belong to either class. My resulting accuracy after stage 1 was close to 0.93. When I looked at the interp.plot_top_losses plot, the bad image appeared every time. I did not pay attention to this and ran stage 2 with a bigger image size; the accuracy did not get any better. So I started looking at my data again, replaced the bad image (which had been staring at me all along) with another image I downloaded from images.google.com, and my accuracy improved to 100%.

Lessons learned:

  1. Model accuracy is a reflection of the validation data. (I could not have gotten better than 15/16 = 0.9375 accuracy given that 1 of my 16 images was terribly bad.)

  2. Validation data should be a reasonable representation of the training data (good data and a proportionate distribution across classes).

  3. and what Rachel said here
