Lesson 1: Classification Interpretation

wcneill · March 24, 2020, 11:15pm

In this step:

interp = ClassificationInterpretation.from_learner(foliar_learner)

losses, idxs = interp.top_losses()

len(data.valid_ds)==len(losses)==len(idxs)

What does it mean if the last line returns False? When I try to run

interp.plot_top_losses(9, figsize=(15,11))

I get an index out of bounds error. I know what this means generally speaking, but what does it mean in this specific case?

By the way, I’m using a different dataset that only has 4 classifications.
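
One way to unpack that check is sketched below, using plain lists as stand-ins for `data.valid_ds`, `losses`, and `idxs`. The stand-in values and the helper names `length_report` and `clamp_k` are hypothetical and not part of fastai. The chained comparison is `True` only when all three lengths agree, so printing each length on its own shows which one is off. Capping the requested count at the number of available losses is a cheap guard, though the sketch cannot confirm that this is the cause of the index error in the notebook.

```python
def length_report(valid_items, losses, idxs):
    """Return each length plus a flag saying whether all three agree."""
    sizes = {
        "valid_ds": len(valid_items),
        "losses": len(losses),
        "idxs": len(idxs),
    }
    # In Python, a == b == c means (a == b) and (b == c).
    sizes["all_equal"] = sizes["valid_ds"] == sizes["losses"] == sizes["idxs"]
    return sizes


def clamp_k(requested, losses):
    """Never ask for more top losses than there are losses."""
    return min(requested, len(losses))


# Stand-in data, made up for illustration: 6 validation items but only 4 losses.
valid_items = ["img0", "img1", "img2", "img3", "img4", "img5"]
losses = [2.3, 1.9, 0.7, 0.1]
idxs = [5, 0, 3, 1]

report = length_report(valid_items, losses, idxs)
print(report)
print("k to plot:", clamp_k(9, losses))
```

If the lengths differ in the real notebook, a reasonable next step is to check that `data` is the same object the learner was built from.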

DArXToRm24 · March 25, 2020, 12:33am

It means the three lengths aren’t all equal. I’m not 100% sure that check matters, but don’t take my word for it. Try anyway and see if the rest of the code works.

As for ‘interp.plot_top_losses(9, figsize=(15,11))’, it may be that, since you are using a different dataset, there aren’t enough images, or the images don’t fit that figure size. I suggest playing around with those numbers until something works, for example by making them smaller.

wcneill · March 25, 2020, 12:34am

Hi @DArXToRm24, the rest of the code does indeed work. My error rate is really bad, though.

I’m going to post my finished Homework 1 results for feedback. Maybe you or someone else can provide insight on what went wrong and how I can improve.

DArXToRm24 · March 25, 2020, 12:38am

Ok, there’s nothing to worry about. I’d like to see it. And by the way, I just started a few weeks ago and am rerunning through lessons 1-4 again; I didn’t understand much. You’ll be great.

wcneill · March 25, 2020, 1:02am

@DArXToRm24 Just posted it in a different thread, but here it is

github.com

wcneill/kaggle/blob/master/foliar/foliar_fastai.ipynb

And, thank you for being kind enough to look at it.

DArXToRm24 · March 26, 2020, 1:52am

Ok, here’s a few things.

Your error rate isn’t 1%, it’s 100%. Try increasing the learning rate as well as the number of epochs. In the second epoch it goes to 84%, which is a 16-point difference.
I can’t seem to find out what a foliar learner is, but if I were you, I’d try the CNN learner, because it looks like this learner isn’t the best for this situation.
For interp plot losses, I would change the 15,11 to smaller numbers, but first focus on making the basic model work before fine-tuning.

I know this doesn’t exactly match the questions you wrote in the code, but overall, I recommend playing with the model. And I’d definitely try the CNN instead of the foliar learner.

Let me know if you have any other questions.

muellerzr · March 26, 2020, 2:00am

To add onto @DArXToRm24 (btw, foliar_learner was simply his variable name for the Learner generated from cnn_learner): when you first began fitting you didn’t find a learning rate to use, and that is the first step to start at. Begin by doing an lr_find() and then pick your learning rate based on those results. I’d imagine you’ll get a better accuracy with this.
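
A rough sketch of what "pick your learning rate based on those results" can mean in practice: given the (learning rate, loss) readings from a range test, look for the stretch where the loss falls fastest per decade of learning rate. The readings below are made up, and `steepest_descent_lr` is a hypothetical helper, not a fastai function. Eyeballing the plot is still the usual approach, and this is only one heuristic.

```python
import math


def steepest_descent_lr(lrs, losses):
    """Return the lr at the start of the segment where loss drops fastest
    per decade of learning rate, or None if the loss never decreases."""
    if len(lrs) != len(losses) or len(lrs) < 2:
        raise ValueError("need two or more (lr, loss) pairs of equal length")
    best_lr, best_slope = None, 0.0
    for i in range(len(lrs) - 1):
        # Slope of loss against log10(lr), since lr plots use a log axis.
        d_log_lr = math.log10(lrs[i + 1]) - math.log10(lrs[i])
        slope = (losses[i + 1] - losses[i]) / d_log_lr
        if slope < best_slope:
            best_lr, best_slope = lrs[i], slope
    return best_lr


# Made-up range-test readings: loss falls, bottoms out, then blows up.
lrs = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
losses = [1.60, 1.55, 1.20, 0.80, 0.95, 3.00]

suggested = steepest_descent_lr(lrs, losses)
print("suggested learning rate:", suggested)
```

The value returned sits where the curve is still heading down, which is typically well before the minimum of the loss.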

DArXToRm24 · March 26, 2020, 2:10am

Yep. Thanks @muellerzr. You’re correct.

wcneill · March 26, 2020, 3:24am

@muellerzr and @DArXToRm24, thank you both. I took your advice by running the learning rate finder first, and then picking the biggest down-slope as my max_lr range when I started to train my model:

foliar_learner.fit_one_cycle(10, max_lr=slice(9e-4, 2e-1))

epoch	train_loss	valid_loss	error_rate	time
0	0.524870	1.714344	0.972527	00:51
1	0.605830	1.261454	0.956731	00:50
2	0.587870	0.506096	0.742445	00:50
3	0.451023	0.252102	0.817308	00:50
4	0.337946	0.176136	0.806319	00:50

These results are better but still pretty bad. I started the model from scratch several times, and it seems like my model always settles to about 80% error. I even tried 10 epochs once.

I’m wondering if this dataset is too tough for what I know based on lessons 1 and 2? Or am I still doing something wrong?

Also, I noticed that there are at least a few mislabeled images. For example my top loss was this image:

Looking through the training images, there are many more mislabeled examples in the training data.
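
For reading the numbers in the training output above, here is a small sketch, assuming the fourth column is the error rate as a fraction (so 0.806319 is about 80.6% wrong). The labels are made up and both helper names are hypothetical. With 4 classes, guessing uniformly at random is expected to be wrong 75% of the time, so an error rate that settles near 80% is no better than chance. That may hint at a problem with the labels or the metric rather than a dataset that is simply too hard, though nothing in the thread confirms this.

```python
def error_rate(predicted, actual):
    """Fraction of predictions that do not match the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)


def chance_error(n_classes):
    """Expected error of uniform random guessing over n_classes."""
    return 1 - 1 / n_classes


# Made-up labels for 4 classes (ids 0 to 3); the "model" always predicts 0.
actual = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
predicted = [0] * len(actual)

observed = error_rate(predicted, actual)
baseline = chance_error(4)
print(f"observed error: {observed:.1%}, chance error: {baseline:.1%}")
```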

DArXToRm24 · March 26, 2020, 7:52pm

Alright. At this point, I’m just going to shoot off suggestions that I don’t know will work, but they’re my best guesses. It’s good that we got those other things out of the way.

One thing you should do, for sure, is fix those mislabeled images. Go through the top losses and find those, because a model can’t run on ‘broken’ data. The loss is killing the model. Also, take out images that aren’t helpful.
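
A minimal sketch of that cleanup step, assuming the labels live in a simple filename-to-label table. The file names, class names, and the helper `apply_corrections` are all hypothetical and not taken from the notebook. It applies the fixes you found by hand and drops unhelpful images, without touching the original table.

```python
def apply_corrections(labels, corrections, drop=()):
    """Return a new {filename: label} dict with fixes applied and
    unwanted files removed. The input dict is left unchanged."""
    unknown = [n for n in list(corrections) + list(drop) if n not in labels]
    if unknown:
        raise KeyError(f"not in label table: {unknown}")
    cleaned = {name: lab for name, lab in labels.items() if name not in drop}
    for name, new_label in corrections.items():
        if name in cleaned:
            cleaned[name] = new_label
    return cleaned


# Made-up label table; names and classes are placeholders.
labels = {
    "img_001.jpg": "class_a",
    "img_002.jpg": "class_b",  # found among the top losses, actually class_c
    "img_003.jpg": "class_a",
    "img_004.jpg": "class_d",  # blurry, not helpful
}
corrections = {"img_002.jpg": "class_c"}
drop = {"img_004.jpg"}

cleaned = apply_corrections(labels, corrections, drop)
print(cleaned)
```

Keeping the corrections in their own small table makes it easy to re-apply them if the data is downloaded again.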

My next suggestion is to increase the learning rate, especially if fixing the data doesn’t work.

It’s good that you went through the data and found the issue. Good luck!