Share your work here ✅

Congrats on your nice work. I have a question about using datasets on Colab. Does the folder that you define for Path (content/data/102flowers.mat) exist next to your notebook on Colab?

Where are the 1,027 samples in your results / confusion matrix coming from? The test set is smaller than this, and I think about 20% of the training set is ~1,044.

Here’s my piece of work. I’m trying to classify government-issued ID cards such as driving licences and PAN cards (related to income tax in India). I used google_images_download to create my dataset.

Currently I have two classes:

  1. Driving Licence - 70 images that I was able to get from Google
  2. PAN Card - 25 images from Google

I used google_images_download to download the images to my local machine and labelled them properly. Once done, I created a git repo on Bitbucket and cloned the data into Paperspace storage using git.

Surprisingly, maybe because of the small number of images, Google Colab was also able to run the training successfully.

What I wonder about the training is that the error_rate kept changing with the number of epochs:

[Screenshots of the training output after 1, 2, 3, 4 and 5 epochs]

Below are a few of my doubts:

  • Yesterday, when I was running the same code, I got an error rate of around 30-40%. But today, when I started the run, I initially got 11% and it finally settled at 5%. From the screenshots above, the error_rate started settling after I ran it multiple times. Should the training ideally give the same error_rate for 1 epoch and 2 epochs? Why did it change to 11% for 2 epochs and fall back to 5% for 3 epochs?

  • One more question. A PAN card or driving licence looks something like the images below: rectangular in shape.

PAN Card

Driving Licence

But when I was creating the DataBunch, these were transformed to 224 x 224, resulting in a square image. So the network ended up seeing only part of the image instead of the full rectangular card. Should I be worried about this?

  • Also, when the top losses were plotted, with the count set to 9 (the default from the class notebook), there was only one wrong classification. Why did the interpreter plot the correct ones as well?

  • Visually, the two image classes I chose for the problem are entirely different. I was expecting an error_rate of less than 1%, but here I am with an error rate of 5%. Is it too simple a problem for the networks to work on?

Looking forward to the feedback from fellow learners :slight_smile:

My end target for this project is to build a classifier that can classify official documents. I would also like to incorporate OCR to verify any forms that are submitted with details from these official documents.

Below is my notebook as a GitHub gist. I also made the BitBucket repo public. Feel free to use it.


I spent some time this past week building a dataset curator. It’s essentially two parts: scraper and curator.

The scraper grabs images from google image search based on your search phrase. It’s multithreaded and pulls down between 300-400 images per search in 30ish seconds on my laptop. It requires you install chromedriver somewhere on your file system and also the python package selenium.

The curator portion is an in-notebook interactive session to locate duplicate/near-duplicate images and garbage images in the downloaded data. It uses the intermediate layers of a pretrained VGG network and compares images based on the mean squared error between their intermediate representations. If two images have similar representations then they’re probably very similar. It actually works really well! For garbage image detection, I look at those same intermediate representations and give each image a score based on its total dissimilarity to all the other images in the set. The idea is that images that don’t belong will be most different from the actual “in class” images.
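
The scoring idea can be sketched in a few lines of plain Python. This is only an illustration: it assumes each image has already been reduced to a feature vector (the numbers below are invented stand-ins for VGG activations), and the names `mse`, `duplicate_pairs` and `garbage_scores` are hypothetical, not taken from the curator's actual code.

```python
from itertools import combinations


def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def duplicate_pairs(features):
    """Return (score, name_a, name_b) for every pair, most similar first."""
    pairs = [(mse(features[a], features[b]), a, b)
             for a, b in combinations(sorted(features), 2)]
    return sorted(pairs)


def garbage_scores(features):
    """Return (score, name) per image, most dissimilar to the rest first."""
    scores = [(sum(mse(vec, other_vec)
                   for other, other_vec in features.items() if other != name),
               name)
              for name, vec in features.items()]
    return sorted(scores, reverse=True)


# Toy feature vectors standing in for VGG activations (invented values).
features = {
    "tank_1.jpg": [0.9, 0.1, 0.0],
    "tank_1_copy.jpg": [0.9, 0.1, 0.1],
    "tank_2.jpg": [0.7, 0.3, 0.0],
    "cat.jpg": [0.0, 0.0, 1.0],
}

closest = duplicate_pairs(features)[0]
worst = garbage_scores(features)[0]
print(closest[1], closest[2])  # the most likely duplicate pair
print(worst[1])                # the most likely garbage image
```

In this toy set the near-identical pair comes out first in the duplicate ranking and the off-topic image comes out first in the garbage ranking, which is the ordering the review session relies on.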

For both dup detection/garbage selection you’re presented with images in your notebook in order of their score (so most similar pairs will be shown first, etc) and asked to make a decision with a simple menu. The notebook cell clears its own outputs before presenting the next set of images so that your notebook doesn’t get too flooded.

For both processes since the images are shown in order, you will hopefully only have to go through a few pairs (for dup detection) or a few singles (for garbage removal) before you start seeing the kinds of images that you want. When this happens you can stop the process with the menu, and call purge to delete the marked bad files from your directory.

Here’s the code if you want to try it out (it’s not production quality so use at your own risk):

I used it to scrape images of Paladins (army howitzer) and Abrams (army tank) and built a classifier on that. I used to be an artillery officer and my family and friends would always mistake my Paladins for tanks. I was able to get around 94% accuracy by just training the new head of the conv learner on my curated dataset. So it looks like my family and friends should be ashamed of themselves :wink:



It’s my script that makes the test set; it doesn’t divide the data into perfectly equal parts.


I used the same lesson-1 template with ResNet34 to classify hostel and hotel rooms. The dataset was created using Google Images Download, and some images were removed manually to improve quality (final: 240x2 images). Accuracy is ~90.3%.

Some issues:

  • Chromedriver has to be downloaded to scrape more than 100 photos with Google Images Download.
  • Some photos are bigger than the limit. This causes “IOError: image file is truncated (nn bytes not processed)” in the data normalize step.
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True

Adding this code beforehand lets processing continue with truncated images.





Yes, if you want higher accuracy, because most of the information lies in the topmost area. The important features for distinguishing between the 2 categories are in the top half of the card: the smart gold chip on the licence, the silver Income Tax Department stamp on the PAN card, and the color as well. But that’s not the major problem right now.

The dataset is too small and its classes are imbalanced. You need to add more PAN card images.
That’s why, in top_losses, it predicts the correct class (dl) most of the time.
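
A quick back-of-the-envelope check shows how coarse the error rate is on a dataset this small. The sketch below assumes a 20% validation split of the 70 + 25 images (the split is not stated in the post, so this is an assumption), and `error_rate_steps` is a made-up helper name.

```python
def error_rate_steps(n_images, valid_pct, max_errors=3):
    """List the error rates (in %) reachable with 0..max_errors mistakes."""
    n_valid = int(n_images * valid_pct)
    rates = [round(100 * k / n_valid, 1) for k in range(max_errors + 1)]
    return n_valid, rates


# 70 driving licence images + 25 PAN card images, assumed 20% validation split.
n_valid, rates = error_rate_steps(n_images=70 + 25, valid_pct=0.2)
print(n_valid)  # 19
print(rates)    # [0.0, 5.3, 10.5, 15.8]
```

Under that assumption a single misclassified image moves the error rate by about 5 points, so readings of 5% and 11% would differ by just one image.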

I cannot recreate your results, and there are no results that good on Kaggle either. I believe there could be a mistake in loading or downloading the data. From the confusion matrix, you can see that the validation set has more than 1000 samples, but the original data has 624. If you run data.valid_ds, data.train_ds in your notebook, you should see something like this:
(DatasetTfm(ImageClassificationDataset of len 624), DatasetTfm(ImageClassificationDataset of len 5232))


Hi Stefano,

thanks! It’s very clear and very useful.

I noticed that you didn’t fine-tune the resnet50 model by calling learn.unfreeze(). Could you please explain why? Did you try but it ‘broke’ the resnet50 weights?

Also, I was wondering: when you don’t call learn.unfreeze(), what does it mean when you specify max_lr=slice(1e-3,1e-2) in learn.fit_one_cycle()? Does it mean that a learning rate of 1e-3 is assigned to the penultimate FC layer (the one with 512 output activations) and 1e-2 is assigned to the very last FC layer (with 10 output activations)?
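
One way to picture the question: as far as I can tell, fastai v1 spreads a slice over the learner's layer groups (not over individual FC layers) using geometrically spaced values. The sketch below reproduces that spacing in plain Python. `spread_lrs` is a hypothetical helper, and the three-group split is an assumption about the default resnet learner, so treat it as an illustration rather than the library's code.

```python
def spread_lrs(start, stop, n_groups):
    """Geometrically space n_groups learning rates from start to stop."""
    if n_groups == 1:
        return [stop]
    step = (stop / start) ** (1 / (n_groups - 1))
    return [start * step ** i for i in range(n_groups)]


# Assumed: three layer groups (two body groups plus the new head).
lrs = spread_lrs(1e-3, 1e-2, n_groups=3)
formatted = [f"{lr:.2e}" for lr in lrs]
print(formatted)  # ['1.00e-03', '3.16e-03', '1.00e-02']
```

If that reading is right, then with the body still frozen only the last group (the new head) is updated, so only the upper value, 1e-2, has any effect until learn.unfreeze() is called.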

Hi, @radikubwa and @itsmuriuki made a mosquito species classifier. We haven’t tuned it much yet, but we are getting an accuracy of ~60%. It struggles because the organisms are from the same genus, for instance the gambiae and funestus species. We are still working on improving it. Check it out here

google-image-download really helped us out for this one.


I’ve started working on the Quick Draw competition dataset; here is a notebook with my attempt to train a simple classifier based on resnet34.

Here is a notebook with an analysis of a trained model. Or rather, the first steps of one: I get out-of-memory errors (RAM) when trying to use the model interpreter class. The training process is implemented as a standalone script in the same folder.

Nothing fancy I would say, the main interesting thing here is that I am generating images on the fly from strokes instead of saving them onto disk. The main reason is to save time because I am training the model on a small subset of the data (340,000 images, 1000 per category), and a huge bunch of images would occupy a lot of space on the disk.

Here is a fragment of the dataset class I’ve created:

# Imports inferred from the names used in this fragment (an assumption; the post omits them).
# FloatOrInt, DEBUG, read_parallel and log (a logging.Logger) are defined elsewhere in the script.
from pathlib import Path

import feather
import numpy as np
from PIL import Image as PILImage, ImageDraw as PILDraw
from fastai.vision import Image
from torch.utils.data import Dataset
from torchvision.transforms.functional import to_tensor


class QuickDraw(Dataset):

    img_size = (256, 256)

    def __init__(self, root: Path, train: bool=True, take_subset: bool=True,
                 subset_size: FloatOrInt=1000, bg_color='white',
                 stroke_color='black', lw=2, use_cache: bool=True):

        subfolder = root/('train' if train else 'valid')
        # One cache file per subfolder (the file name is assumed)
        cache_file = subfolder.parent / 'cache' / f'{subfolder.name}.feather'

        if use_cache and cache_file.exists():
            log.info('Reading cached data from %s', cache_file)
            # workaround to deal with pd.read_feather nthreads error
            cats_df = feather.read_dataframe(cache_file)
        else:
            log.info('Parsing CSV files from %s...', subfolder)
            subset_size = subset_size if take_subset else None
            n_jobs = 1 if DEBUG else None
            cats_df = read_parallel(subfolder.glob('*.csv'), subset_size, n_jobs)
            if train:
                # shuffle the training rows
                cats_df = cats_df.sample(frac=1)
            cats_df.reset_index(drop=True, inplace=True)
            log.info('Done! Parsed files saved into cache file')
            cache_file.parent.mkdir(parents=True, exist_ok=True)
            # Saving step assumed from the log message above
            cats_df.to_feather(cache_file)

        # Map class names to integer labels
        targets = cats_df.word.values
        classes = np.unique(targets)
        class2idx = {v: k for k, v in enumerate(classes)}
        labels = np.array([class2idx[c] for c in targets])

        self.root = root
        self.train = train
        self.bg_color = bg_color
        self.stroke_color = stroke_color
        self.lw = lw
        # Raw stroke strings, one per drawing (attribute name assumed)
        self.data = cats_df.points.values
        self.classes = classes
        self.class2idx = class2idx
        self.labels = labels
        self._cached_images = {}

    def __len__(self):
        return len(self.data)

    def __getitem__(self, item):
        points, target = self.data[item], self.labels[item]
        image = self.to_pil_image(points)
        return image, target

    def to_pil_image(self, points):
        # Render the strokes onto a blank canvas on the fly
        canvas = PILImage.new('RGB', self.img_size, color=self.bg_color)
        draw = PILDraw.Draw(canvas)
        for segment in points.split('|'):
            chunks = [int(x) for x in segment.split(',')]
            # Draw a line between each pair of consecutive points
            while len(chunks) >= 4:
                line, chunks = chunks[:4], chunks[2:]
                draw.line(tuple(line), fill=self.stroke_color, width=self.lw)
        image = Image(to_tensor(canvas))
        return image
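
To show what the loop in to_pil_image does, here is a small standalone sketch of the same sliding window without any drawing. The sample stroke string is invented, and `segment_lines` is a hypothetical helper name that is not part of the dataset class.

```python
def segment_lines(points):
    """Split a 'x,y,x,y,...|x,y,...' stroke string into (x0, y0, x1, y1) lines."""
    lines = []
    for segment in points.split('|'):
        chunks = [int(x) for x in segment.split(',')]
        # Take 4 numbers (two points), then advance by one point (2 numbers).
        while len(chunks) >= 4:
            line, chunks = chunks[:4], chunks[2:]
            lines.append(tuple(line))
    return lines


lines = segment_lines('0,0,10,10,20,5|30,30,40,40')
print(lines)  # [(0, 0, 10, 10), (10, 10, 20, 5), (30, 30, 40, 40)]
```

Each stroke with n points produces n - 1 connected line segments, and strokes separated by '|' are never joined to each other.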

I’ve tried to make the dataset class compliant with fastai, but I am not sure if it works as expected. I guess that some learn methods could fail on my dataset class.

The work is still in progress. Next, I am going to take a bigger subset of the data, and probably convert the strokes into image files. Do you think that reading files from an SSD could be faster than generating b/w 256x256 images dynamically?

Also, I am trying to play with the iMaterialist Furniture dataset, and I am getting a lot of image downloading faults. I am using this script to download a single image. It uses concurrent workers, a VPN proxy, and the requests lib, but still has a lot of faults. I guess it is not a big deal to skip some files from the training dataset, but I would like to have all samples from the test set.


For aviation enthusiasts, I’ve created a project that classifies aircraft into civilian, military (manned) and UAV (unmanned) categories, using 2500 images for each category. I was interested in checking how accurate it could end up being.

I’ve written the following short Medium post describing some of the details. The accompanying notebook can be found at this gist.


I got to 93% with my sports action classifier. Most of the errors are between Rugby, Australian Rules Football and Soccer, not helped by my dataset having a few Gaelic football images too. The code to see the image names was super helpful for finding the obviously mislabelled images:[interp.top_losses(25)[1]]
and I’m sure some are still wrong. I probably need to identify the team shorts to differentiate some of the games!

Just used the Lesson 1 flow - with resnet50, 6 cycles and 2 fine tuning cycles.

And here is the confusion matrix


I started a blog and wrote my first entry:

  • Discussed how I used fastai to create a classifier for 5 blue jay species.
  • Gave a friendly explanation of topics like: image augmentations, transfer learning, one-cycle policy.
  • Most importantly, I talked about how I almost got intimidated out of beginning to study deep learning.
  • And, how I realized first-hand that deep learning’s usefulness and impact is commensurate to how many “real folks” understand it’s a tool they can apply.

I would appreciate it if you could take a moment to read it and let me know if you have any feedback, suggestions or questions.

Thanks so much,

(Details if interested: my notebook is here; ResNet50 backbone with around 400 images used for training, curated manually; after 20 epochs of training, 0.05 validation error rate)


That’s really impressive. BTW you may want to add your twitter handle to your medium profile so when it’s shared you get mentioned automatically. Here’s my tweet:


Thanks Jeremy for reading, and really touched that you shared it on Twitter! Also good advice – I just went ahead and added my twitter handle to my medium profile.


Distinguishing Lionel Messi from his almost identical lookalike using the fastai library with only 60 training images and getting 92.5% accuracy.

I scraped Leo Messi’s official Instagram page and Reza Parastesh’s official Instagram page.

From each of them I extracted 50 photos, adding 30 to train folder and 20 to valid folder.

The data is available at this URL.

Here is the Medium post I wrote about it.

Here is the gist:


I’ve tried to fine-tune, but the error_rate didn’t improve, and in my experience, when completely unfrozen, the model tends to be “unstable” and hard to train if you have a “small” training set (probably due to the large number of parameters to learn, even for the resnet architecture).
I think this is why, in lesson 1, the model is unfrozen only after training it with a fairly large LR.

I think that when the images you’re classifying are “similar” to the ones the model was pretrained on (ImageNet in this case), you need to unfreeze only the last layers, because the first ones (at least 50%) are already good enough.

If you’re going to classify completely different images (e.g. skin lesions or aerial photos), it’s probably better to unfreeze almost all layers early.

About that: I’ve updated the example to the new “create_cnn” syntax and trained again from scratch using an approach similar to lesson 1, but unfreezing ONLY the last 50% of layers. In the end the error rate increased a bit (22 errors instead of the 18 of the previous version), but on submission Kaggle gave me the same score…
The differences are probably related to randomness in the train/valid split…

I think so as far as I understand…


I’m not sure that applying transformations would help here. Spectrograms are not photographs of objects that can usually have different orientations…


@gianferrarif Thanks! Do you know how I can turn off transformations?