Share your work here ✅

Hi all,
I worked with the Stanford Cars dataset to predict make/model/year from an image of a car. I reached about 80% accuracy using a resnet50 model, unfreezing, and varying the learning rate. Looking at the most confused classes, much of the confusion is between cars from the same manufacturer, or the same model in a different model year, though there are some other errors that may be worth investigating more.

The notebook is here:

or in this repository:


Hi all…

Finally managed to build my guitar model classifier… Building the dataset took heaps longer than anticipated (and I only used 11 classes for now)…

Results look pretty impressive, I have to say… The differences between these models are pretty subtle…

I wonder if it could tell replicas from originals or differentiate between 60s and 70s Strats etc. …


I had bad luck with birds and flowers, so now I tried sounds and the results seem quite promising. I trained a classifier on spectrogram images generated from audio files that I downloaded from this Kaggle competition.
With a ResNet-34 and 4 epochs:

Total time: 36:42
epoch  train_loss  valid_loss  error_rate
1      2.823842    1.935167    0.541053    (27:39)
2      1.968809    1.414007    0.408421    (03:00)
3      1.570557    1.216676    0.344211    (03:01)
4      1.380666    1.171882    0.330526    (03:01)

The top losses are

The confusion matrix looks OK

After unfreezing and choosing a good slice of learning rates, I got even better results:

Total time: 25:07
epoch  train_loss  valid_loss  error_rate
1      1.272060    1.071349    0.293684    (03:08)
2      1.148295    0.994182    0.280526    (03:09)
3      1.040785    0.941049    0.264737    (03:08)
4      0.834645    0.837393    0.224737    (03:08)
5      0.664606    0.752477    0.205789    (03:08)
6      0.499639    0.716157    0.198421    (03:08)
7      0.399242    0.692799    0.188421    (03:07)
8      0.339768    0.671222    0.184737    (03:08)

Jupyter notebook - link
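
For readers wondering how an audio clip becomes an image that a ResNet can consume, here is a minimal sketch of a log-magnitude spectrogram using only NumPy. It is not the code from the notebook above; the function names, the FFT size, the hop length, and the synthetic test tone are all assumptions chosen for illustration.

```python
import numpy as np

def spectrogram(signal, n_fft=256, hop=128):
    """Log-magnitude spectrogram via a short-time Fourier transform.

    Returns an array of shape (n_fft // 2 + 1, n_frames):
    rows are frequency bins, columns are time frames.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # log1p compresses the dynamic range so quiet components stay visible.
    return np.log1p(magnitude).T

def to_uint8_image(spec):
    """Rescale a spectrogram to the 0..255 range of a greyscale image."""
    lo, hi = spec.min(), spec.max()
    scaled = (spec - lo) / (hi - lo + 1e-12)
    return (255 * scaled).astype(np.uint8)

# Synthetic input (an assumption, not real competition audio):
# one second of a 1 kHz sine tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

image = to_uint8_image(spectrogram(tone))
print(image.shape, image.dtype)
```

The resulting array can be saved as a greyscale image and fed to an image classifier in the usual way. Real pipelines often use a mel scale as well, which this sketch leaves out.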


Hi All,
After a lot of pain and persisting through it, I was finally able to run a couple of experiments on the Google AudioSet.

Here is the notebook for working with the Google AudioSet data. At a high level, AudioSet contains human-annotated labels (based on audio) for almost 2M YouTube videos. The annotation data has links to these videos, the labels, and the 10s clip of each video that was used. So we need to download the relevant YouTube videos to prepare the dataset. This took a lot of time for me even with multiprocessing. Any suggestions on how to improve this are welcome.
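
One direction that may be worth trying, since downloading is usually bound by the network and not the CPU, is a thread pool with many more workers than cores. The sketch below is only an illustration under that assumption: `download_all`, `fake_fetch`, and the clip tuples are hypothetical names, and `fake_fetch` stands in for whatever function actually downloads and trims one clip.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def download_all(clips, fetch, max_workers=16):
    """Run fetch(clip) for every clip concurrently.

    Returns (succeeded, failed), where failed holds (clip, error) pairs
    so that broken or removed videos can be retried or skipped later.
    """
    succeeded, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, clip): clip for clip in clips}
        for future in as_completed(futures):
            clip = futures[future]
            try:
                future.result()
                succeeded.append(clip)
            except Exception as exc:  # one bad video should not stop the run
                failed.append((clip, repr(exc)))
    return succeeded, failed

def fake_fetch(clip):
    # Hypothetical stand-in for the real download-and-trim step.
    video_id, start_seconds = clip
    if video_id.startswith("missing"):
        raise IOError("video unavailable")
    return f"{video_id}_{start_seconds}.wav"

clips = [("vid_a", 30), ("missing_b", 10), ("vid_c", 0)]
ok, bad = download_all(clips, fake_fetch, max_workers=4)
print(len(ok), "downloaded,", len(bad), "failed")
```

Collecting failures in a list, and not raising on the first one, also makes it easy to resume a long download without starting over.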

After downloading the data, the notebooks below convert the audio clips to spectrogram images (thanks to @etown for the code) and run two experiments:

  1. Dog Bark vs Cat Meow - Had close to 4.5k audio samples and ~5GB of data. Got an accuracy of 93%.
  2. Boat vs Motorcycle vs Racecar vs Helicopter vs Railroadcar - Even for just five classes, this dataset turned out to be huge, with ~33k audio clips. After downloading, the 10s audio clips came to around 45GB of wav files, so it was a bit challenging to download the data given the huge network overhead.
    Coming to the results, the accuracy is around 66% with both resnet 34 and 50.
    Also the model is grossly overfitting when training all the layers.

Will look to improve upon this and avoid overfitting based on the next lessons


I have started my Batik classification. Batik is Indonesia’s ancient dyeing technique for cloth. For the first attempt I used just a small batik dataset containing only 300 pictures split into 50 types of Batik cloth. Each cloth was captured in up to six random images, which were then resized to 128x128 pixels in JPEG format.

It seems that this small dataset with 50 classes is not really a challenge, since both Resnet models achieved 100% accuracy after just a few epochs.

And here is the notebook:
The next step would be to find a more comprehensive Batik dataset, which is maybe the biggest challenge itself :slight_smile:


Following the examples of previous works with building web API, I am trying to create something similar using Quick Draw dataset.

I’ve trained the model on a small subset of the data for a couple of epochs, so the quality of predictions is rather pathetic: it works well in recognizing zig-zags only :smile: However, I guess it is possible to do much better if I train for longer and use more data and a deeper architecture.

Here is a link to the repository. In general, it just creates a Starlette app and serves a simple page with the model waiting for an image. I guess I’ll deploy it using Now or something similar once the model quality is better.


Hi All

I tried the ConvLearner against the Stanford Cars dataset, which consists of 196 classes. I used only the train folder and used the fastai library’s ImageDataBunch.from_csv for the labels. That was a good learning experience, as I failed a few times before I got it right. I tried it on the Resnet34 model and got an error rate of 44% after running fit_one_cycle(4) two times.

Then I tried Resnet50 and ran fit_one_cycle(5) two times and then fit_one_cycle(15) once. I finally got an error rate of 18%. Not sure if this is good, but I wanted to try this on a dataset with a large number of classes. The link to my notebook is here.

Hi all,
I have trained a Resnet50 model on Fisheries Monitoring data.
This was a detection as well as classification competition, so from the very beginning I knew that the model would overfit, and sure enough it did.
I experimented with the model and fine-tuned it, but ended up with train_loss: 0.143552, valid_loss: 0.547791 and an error_rate of 9.76%.

One question: I was getting different learning rate graphs in In [29]: and In [45]: (see my fisheries_monitoring notebook), even though I ran learn.recorder.plot() immediately after loading my resnet50_stage_1 model both times. Why would this happen?

Those are pretty impressive results.
Audio classification shouldn’t be straightforward using the lesson1 model, as the data is so different from the images ResNet saw in ImageNet. Please share your notebook if you wish to, I’m curious.

good work regardless :+1:

Hi guys,
While training image classifiers over the week, I found it a bit difficult to get my model’s prediction on a single image or a bunch of images. So, I created a small library which takes a FastAI Learner and creates a web-based UI where you can upload one or more images and check your model’s predictions. Here’s how it works:

Install the library via pip:
pip install servefastai --upgrade

It just takes one line of code to serve a FastAI Learner:

from servefastai import serve
serve(learn) # learn is a FastAI Learner object

Then navigate to http://PUBLIC_IP:9999 in a new tab, where PUBLIC_IP is the external public IP of the machine where you are running it.

You’ll see a UI like this:

Once you select some files from your computer and press ‘Submit’, you’ll see a new page with the predictions:

And that’s it! Hope it helps. The code is open source:

Here’s a video demo if you need it:


Here’s my work for the week:

I was able to build a classifier that identifies 10 different car models with an accuracy of 98%. This is using 100+ images for each car model.

Initially I got an accuracy of 80-90%.

Then I made a simple modification to my images. Have a look below:

Usually an image of a vehicle is rectangular. Fastai does center cropping, which hides some details from the classifier. So, I manually created crops for each original image, as above.

After I used those images, the accuracy went up a lot.
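
To make the cropping point concrete, here is a minimal NumPy sketch comparing a single center crop with several square crops spread along the longer side of the image. It is an illustration only: the helper names and the choice of three evenly spaced crops are assumptions, and the manual crops described above may have been made differently.

```python
import numpy as np

def center_crop(image, size):
    """Keep only the central size x size region of the image."""
    h, w = image.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return image[top:top + size, left:left + size]

def sliding_square_crops(image, n_crops=3):
    """Square crops spaced evenly along the longer side.

    Together the crops cover the whole image, so details near the
    edges (for example the front or rear of a car) are not lost.
    """
    h, w = image.shape[:2]
    size = min(h, w)
    extra = max(h, w) - size
    offsets = np.linspace(0, extra, n_crops).astype(int)
    if w >= h:
        return [image[:, o:o + size] for o in offsets]
    return [image[o:o + size, :] for o in offsets]

# A tiny 4x8 "image" whose pixel values make the kept columns easy to see.
image = np.arange(4 * 8).reshape(4, 8)
center = center_crop(image, 4)
crops = sliding_square_crops(image, n_crops=3)

print(center[0])                   # first row of the center crop
print([c[0].tolist() for c in crops])  # first row of each sliding crop
```

In this toy case the center crop drops the two leftmost and two rightmost columns, while the three sliding crops between them keep every column.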

Here’s the complete story behind this classifier (including how I download images, publish my datasets, and the key ideas behind this).

And here’s the notebook.

Anyway, I’m a total noob at this kind of work.
I may be doing something wrong.
If so, help me figure it out.


This is very cool. Nice work. Can’t wait to try it out.
Is there an API-only mode, so I can hook this up to a different frontend?


Currently it’s only the UI, but I’m working on API endpoints too.


Thanks @navjots. My notebook is still messy. I’m working with Colab and Google Drive, which isn’t that great: the kernel kept dying. I still have to run the model on the test data. After that I will clean it up and push it to GitHub.

I have created my image classifier for Indian men/women. I used a training set with 60 images each (man/woman). My validation set has 10 images each (man/woman). The training set has pictures of urban men and women, and I tried having black and white rural faces in the validation set. I trained with resnet34. I also tested my model with custom images. Here is my notebook.

How can I improve my accuracy? These are the things that I think I can do:

  1. One way is to add more black and white images of rural men and women in the training set.
  2. Add more images to the training set

What other ways can I try to improve the model?


Hi everyone!

I’m currently working on a project to segment & classify the condition of buildings seen on hi-res aerial (drone) imagery taken over Zanzibar island, Tanzania. As a step in the workflow, I trained a classifier based on lesson1 notebook to distinguish between 4 types/conditions of buildings on a variety of images (different sizes, ratios, blurriness):

“Complete”, “Incomplete”, “Foundation”, and “Empty” (no building in image)

Using resnet50 pretrained backbone, this achieved 93% accuracy on the 4 classes. Performance is probably even better than the stated number because looking at predictions with highest losses, they’re either mislabeled or so small/ambiguous in appearance that I’m not able to tell what class they should be in either:

I also used the excellent t-SNE notebook/code from @KarlH (thanks! his original post is in this thread here) to visualize how the model is grouping representations. Very helpful diagnostics to understand what is very clearly separated (“Empty” images) and what characteristics make classification more erroneous (visual features like partially roofless rooms of buildings that confuse between “Incomplete” and “Complete”).

Look forward to exploring more how to use these techniques to diagnose model errors and improve training with less data (i.e. selectively train in later cycles on harder data that’s more similar to what the model is struggling on):

Here is my notebook:

In it, I show training on resnet34 and resnet50, loading & predicting on a new external set of test images, packaging up the test predictions in pandas to csv file, and t-SNE visualization.

I load my train and validation data (data.train_ds & data.valid_ds) differently than what’s shown in the lesson by peeling the onion a few layers and using ImageClassificationDataset() instead of ImageDataBunch.from_name_re(). I did this to directly define which image and corresponding label files go into validation vs training. Because I’m working with geospatial image tiles that come from larger grids that are adjacent or sometimes overlapping, there’s the risk of data leakage if I’m not careful about keeping data from different grids cleanly and consistently separated. Defining exactly what files go into train/val also lets me do some hacky stuff to balance my classes: training on a half of the majority class for a cycle and then redefining the dataset with the other half of that class for another cycle of training. I’m sure there is a more elegant way to do this…still looking into it.
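
The idea of keeping whole grids on one side of the split can be sketched in plain Python as below. This is not the code from the notebook; `split_by_grid`, the file naming scheme, and the 20% validation fraction are hypothetical choices made for the example.

```python
import random
from collections import defaultdict

def split_by_grid(tiles, valid_fraction=0.2, seed=0):
    """Split (filename, grid_id) pairs so each grid lands entirely in
    train or entirely in validation, never in both.

    Returns (train_files, valid_files, valid_grid_ids).
    """
    by_grid = defaultdict(list)
    for filename, grid_id in tiles:
        by_grid[grid_id].append(filename)

    grid_ids = sorted(by_grid)
    random.Random(seed).shuffle(grid_ids)  # seeded so the split is repeatable
    n_valid = max(1, round(len(grid_ids) * valid_fraction))

    valid_grids = set(grid_ids[:n_valid])
    valid = [f for g in grid_ids[:n_valid] for f in by_grid[g]]
    train = [f for g in grid_ids[n_valid:] for f in by_grid[g]]
    return train, valid, valid_grids

# Hypothetical tiles: 5 grids with 4 tiles each.
tiles = [(f"grid{g}_tile{t}.png", f"grid{g}") for g in range(5) for t in range(4)]
train_files, valid_files, valid_grids = split_by_grid(tiles, valid_fraction=0.2)
print(len(train_files), len(valid_files), valid_grids)
```

Because the split happens at the grid level, adjacent or overlapping tiles from the same grid cannot leak between training and validation. Overlap between different grids would still need separate handling.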

I mentioned upfront that this is a segmentation + classification task. The segmentation part I started working on first using the older v0.7 of fastai library so there’s some major duct-taping of workflows and data processing going on. I’m looking forward to updating the segmentation work to Fastai v1 and sharing it with everyone!

Here’s a preview of what the end product (segment + polygonize + classify) currently looks like:

(green = “Complete”, yellow = “Incomplete”, red = “Foundation”)



I downloaded a fun dataset of Traditional Decor Patterns from Kaggle.

As you can see I first tried fitting the model without transforms. Training resnet34 I got an error rate of 12% for the data without transforms. After applying transforms I only had a 4% error rate in distinguishing between 7 different traditional decor patterns.

Considering the varied shapes of the objects on which the patterns are printed I think that’s pretty awesome! And the data set isn’t huge, just under 500 images.


Hello all!
I was wondering how the fastai library (using master branch, 1.0.16.dev0) handles a huge amount of data, so I took the dataset from Google’s Kaggle competition, the Inclusive Images Challenge. There are 1.7+ mil images (0.5+TB) with 18k+ unique classes. The test set for this competition is 3.5GB or 32k images. The task is multi-label classification, meaning that each image can have many labels simultaneously. It’s forbidden to use pre-trained models in this competition, but my first goal is to try a “vanilla” fastai setup for this task and see how it goes. Maybe later I’ll try to train a model from scratch (not sure how to set it up yet :slight_smile:).

My findings so far:

  • I was able to run this task with almost no custom setup – only data preparation. The fbeta metric didn’t seem to work right on my data, so I needed to add torch.squeeze to y_preds in fbeta before all calculations (I didn’t dig deeper into the cause).
  • The fastai library works awesomely with multi-label data out of the box, but there are no actual docs on how to run tasks like that. I found the Planet example, which looks a bit outdated and has no inference step. Also, I didn’t see examples of class threshold finding.
  • Performance of the model after fit_one_cycle on this task with 18k labels seems very bad. I think it could be because of such a long epoch (almost 30k batches per epoch, 5 hours per epoch). Loss at the end of a single epoch looks too small, 0.001-.
  • Performance could also be bad because of highly unbalanced labels – right now I’m experimenting with just the top 100 classes (this reduced 1.7mil to 1.4mil images). Still bad though.
  • Training time wasn’t very different for resnet34/50/152 with frozen layers – about 5h/epoch. I was surprised, but maybe the bottleneck is not in the architecture but in image preprocessing (I have a 1950x CPU with 32 threads and run experiments with 30 data workers); not sure.
  • With such a big dataset, it’s critical to check the integrity of all the data – my first experiment failed after 3h of training because of one missing image.
  • You should try the whole pipeline on a smaller set of data – I just copied 1000 images into a train_small directory and then set everything up to save checkpoints, make predictions, etc.
  • Read the manual! Always check the required params for the methods you use. My second experiment failed because I called a method without the checkpoint name param, and I had to retrain for 5h.
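
Since class threshold finding came up in the list above, here is a minimal NumPy sketch of one common approach: score a range of candidate thresholds with a sample-averaged F-beta and keep the best one. This is not fastai’s fbeta implementation; the function names, the candidate grid, beta=2, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def fbeta_score(probs, targets, threshold=0.5, beta=2.0, eps=1e-9):
    """Sample-averaged F-beta for multi-label predictions.

    probs and targets have shape (n_samples, n_classes); targets are 0/1.
    A label is predicted when its probability exceeds the threshold.
    """
    preds = (probs > threshold).astype(float)
    true_pos = (preds * targets).sum(axis=1)
    precision = true_pos / (preds.sum(axis=1) + eps)
    recall = true_pos / (targets.sum(axis=1) + eps)
    b2 = beta ** 2
    score = (1 + b2) * precision * recall / (b2 * precision + recall + eps)
    return float(score.mean())

def best_threshold(probs, targets, candidates=None, beta=2.0):
    """Try each candidate threshold and return (threshold, score) of the best."""
    if candidates is None:
        candidates = np.linspace(0.05, 0.95, 19)
    scores = [fbeta_score(probs, targets, t, beta) for t in candidates]
    best = int(np.argmax(scores))
    return float(candidates[best]), scores[best]

# Toy validation data (made up): 3 images, 4 possible labels.
targets = np.array([[1, 0, 1, 0],
                    [0, 1, 0, 0],
                    [1, 1, 0, 0]], dtype=float)
probs = np.array([[0.90, 0.22, 0.38, 0.10],
                  [0.15, 0.80, 0.10, 0.12],
                  [0.70, 0.45, 0.20, 0.10]])

threshold, score = best_threshold(probs, targets)
print(f"best threshold {threshold:.2f}, F2 {score:.3f}")
```

The threshold should be chosen on validation predictions, not on the test set. A single global threshold is the simplest variant; per-class thresholds are possible but need many more validation examples per class.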

I like how optimally the library uses all the resources it needs:


I ended up using the very useful notebook to grab my data from Google Images. I grabbed pictures of 11 different galaxies and then did some slight hand pruning for getting rid of some of the junk images.

With resnet34, I was able to get about 68% accuracy and then with resnet50, that increased to about 75% accuracy.

Here is the notebook:

Example Images:

Ending Confusion Matrix:


I’m glad you found the notebook helpful.

I’m curious how you’re polygonizing your segmentations. I’m working on a segmentation problem and I’m looking for a good method to turn blobby rectangles into pretty rectangles.
