Share your work here ✅

I’m doing a multi-label prediction project with pizza slices. My ultimate goal is to make a GAN-powered app that lets you design the “perfect pizza” slice by clicking buttons to add toppings. This is my first go at a multi-label classification task like the one in the planet notebook.

My dataset is pretty tiny (only 167 images total).

The potential labels are:

  • mozzarella_cheese
  • tomato_sauce
  • pepperoni
  • peppers
  • sausage

82% accuracy after some initial fine-tuning.

I’m not sure how to do the most_confused or confusion_matrix with this task. I would love some advice!! Thank you.
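One possible way to get something like a confusion matrix for a multi-label task is to build a separate 2x2 table (true/false positives and negatives) for each label. The sketch below is plain Python rather than anything from fastai, and it assumes the predictions have already been thresholded to 0/1. The function name and the four example slices are made up for illustration.

```python
LABELS = ["mozzarella_cheese", "tomato_sauce", "pepperoni", "peppers", "sausage"]

def per_label_confusion(y_true, y_pred, labels):
    """Count tp/fp/fn/tn separately for every label (rows are 0/1 vectors)."""
    counts = {name: {"tp": 0, "fp": 0, "fn": 0, "tn": 0} for name in labels}
    for true_row, pred_row in zip(y_true, y_pred):
        for name, t, p in zip(labels, true_row, pred_row):
            if t and p:
                key = "tp"
            elif p:
                key = "fp"
            elif t:
                key = "fn"
            else:
                key = "tn"
            counts[name][key] += 1
    return counts

# Hypothetical ground truth and thresholded predictions for 4 slices.
y_true = [
    [1, 1, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 1],
]
y_pred = [
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],  # missed peppers
    [1, 1, 1, 0, 1],  # predicted pepperoni that is not there
    [1, 1, 1, 0, 0],  # extra tomato_sauce, missed sausage
]

confusion = per_label_confusion(y_true, y_pred, LABELS)
for name in LABELS:
    print(name, confusion[name])
```

Labels with many false negatives or false positives play roughly the role that most_confused plays in the single-label case.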


I tried to classify programming languages based on the text. I want to test out a theory that you don’t need to read through specific text in a document to be able to classify what kind of a document it is.
My dataset was not very large, and my error rate was pretty high (30% for 3 classes).
My next course of action will be to increase the size of the dataset and try again. However, please give me suggestions on how you would approach the issue differently (still without using OCR or NLP).


I’m still experimenting with image recognition and ran a fun experiment to recognise the dosing numbers (‘500’, ‘250’, etc.) on medical packaging like these:

Instead of using a lot of package images, I tried to build a word generator, inspired by the paper ‘Reading Text in the Wild with Convolutional Neural Networks’.

Using 3000 free Google fonts, it generates 1000 images per class like these:

In addition, I generated background/other images by cropping small parts from other package images:

This resulted in an accurate enough classifier with a resnet34 and fit_one_cycle(2):

Finally, I assess a test image by taking crops and checking each crop.

For the first image, this results in:
[('background', tensor(0.9994)),
('background', tensor(0.9972)),
('background', tensor(0.9956)),
('background', tensor(0.9983)),
('background', tensor(0.9903)),
('background', tensor(0.9669)),
('background', tensor(0.9911)),
('background', tensor(0.9783)),
('background', tensor(0.9533)),
('background', tensor(0.9616)),
('background', tensor(0.9490)),
('background', tensor(0.9971)),
('background', tensor(0.9605)),
('background', tensor(0.9728)),
('background', tensor(0.9144)),

('background', tensor(0.5990)),
('background', tensor(0.8826)),
('500', tensor(0.8235)),
('background', tensor(0.8679)),
('background', tensor(0.9674)),

('background', tensor(0.9831)),
('background', tensor(0.9468)),
('background', tensor(0.9875)),
('background', tensor(0.9906)),
('background', tensor(0.9839)),
('background', tensor(0.9891))]
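The crop-and-check step could be sketched roughly as below. This is not the author's code: the image size, crop size, stride, and both function names are assumptions chosen for illustration, and probabilities are plain floats rather than tensors. Each box is a (left, top, right, bottom) tuple, which is the layout PIL's Image.crop expects.

```python
def crop_boxes(width, height, crop, stride):
    """Return (left, top, right, bottom) boxes tiling the image on a regular grid."""
    boxes = []
    for top in range(0, height - crop + 1, stride):
        for left in range(0, width - crop + 1, stride):
            boxes.append((left, top, left + crop, top + crop))
    return boxes

def best_non_background(predictions, threshold=0.5):
    """Pick the most confident crop prediction that is not 'background', if any."""
    hits = [
        (label, prob)
        for label, prob in predictions
        if label != "background" and prob >= threshold
    ]
    return max(hits, key=lambda hit: hit[1]) if hits else None

# Hypothetical 320x192 image split into non-overlapping 64x64 crops.
boxes = crop_boxes(width=320, height=192, crop=64, stride=64)

# A few of the per-crop predictions listed above, as plain floats.
predictions = [("background", 0.5990), ("background", 0.8826), ("500", 0.8235)]
detected = best_non_background(predictions)
print(len(boxes), detected)
```

A stride smaller than the crop size would give overlapping crops, which may help when a number straddles a crop boundary.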

Obviously this can be improved a lot by integrating a bounding box classifier instead, but it was quite fun to quickly test an idea. Thanks Jeremy and team!

I would also love to get input (other papers, methods, experiments, etc.)!


I enjoy learning through open competition. This way someone else has done the very hard work of data collection, and SOTA results are public and detailed. Using fastai v1, and learning it along the way, I teamed up with @kcturgutlu and @radek to achieve 14th out of 894 teams in Kaggle’s Airbus Ship Detection Challenge, narrowly missing a gold medal.

This was basically fastai v1 ‘out of the box’. A world-class model was trained in half a dozen lines of code. You can read our solution here.

Thank you to fastai, Jeremy and the community. I only came to PyTorch/Python early this year through fastai. My mission is to exploit the ‘practical’ side of fastai as a way to experiment and achieve great results quickly and easily.


Have you considered the SVHN dataset?

I would love to see how these two datasets interact. How did you get all these fonts? Could you share your code? Then I would not need to recreate the thing from scratch.

Thanks for thinking along. Yes, I had a look at the house numbers dataset, but my end goal is to spot text, not only numbers.

See for the source-code (including download of fonts, ! wget -c


I had a go at applying the tabular data stuff from Lesson 4 to the Titanic competition on Kaggle.

Performance is not stellar (top 25%) but learning how to do feature engineering on a Pandas dataframe was interesting. The trickiest part was working around a bug in fastai 1.0.24 (which has now been fixed).

I’m running a little behind on the lectures and implementations :disappointed: but I finally created my first classifier, to tell John Oliver (the comedian) from Steve Mnuchin (US Treasury Secretary).

But I only managed to get about 72% accuracy, even after removing useless images. After wasting a lot of time, I figured out the reason: a lot of the images in the data bunch are being transformed in such a way that they don’t contain either of them. I turned off max_warp and set max_zoom = 1, but still didn’t get much improvement, so I will probably go ahead with the next lecture and come back to this once Jeremy addresses the get_transforms() function.

Nevertheless, a look at the transformed images and the issue:

I definitely found that actively doing, rather than passively watching the lecture, helps me learn better. Hopefully, I will do a lot more from now on.


I might be super late to this party but I FINALLY got my very first web app up and running with my very first trained ML model, so Yay! :slight_smile:

I built a FER (facial expression recognition) model, trained it on the KDEF dataset and then experimented with drastically different test images via my web app.

The KDEF dataset classifies emotions into 7 types, and I use all of them. After the very first training cycle, I got an error rate of 6.4% and almost identical training loss (TL) and validation loss (VL).

After many many iterations trying different epochs and learning rates, I was able to get the model down to an error rate of 1.9% but my training loss ended up higher than validation loss by a tiny(?) bit:

PS: I have tons of questions on learning rates and epochs (some listed in notebook) and will really appreciate if someone can provide feedback on my approach and results!

At this stage, I put this model into production using Zeit and tested it on other images.

I observed that the KDEF dataset uses males and females in the 20-30 age group and the images seemed predominantly Caucasian. So I wanted to test my model against images of older women, people of color, kids/babies, etc. Sharing some results below (more results and related questions in the notebook):

Overall, my model has not done well on random test images despite the 1.9% error rate during training. I’m not sure if it’s because of the way the KDEF dataset was created, or if my model is overfitted, or something else. My error rate had a downward trend all along, barring a few fluctuations, so I don’t think it is overfitted… but I would love clarification on this!

Now I am really interested in understanding what the model “learnt” and why certain images from the test set got mis-classified… I’ve been guessing myself crazy! But I need to better understand what I did so far before I try anything else. :frowning:

The entire notebook is available here.
Give the app a go here!

Please please provide feedback/answers/questions… all of this is so new that it’ll be good to get validation on the approach and results. Thank you!


I just used a CNN on music data for genre classification, using the fastai library for transfer learning, and reached an accuracy of 80%!
Have a look here.
Thank you!

After lesson 4, I tried to combine tabular data with NLP, particularly in Spanish.

I took a tabular dataset from an e-commerce marketplace with the objective of predicting products’ condition (new or used) based on listings’ features. It includes 100k records and after some data pre-processing (not included in the attached notebooks), I ended up with 30 features, including: 17 categorical, 12 continuous and 1 text field (listing’s title).

The process included:

  1. Creating a tabular model without the text feature (accuracy: 91.5%).
  2. Creating an NLP model to predict from the listing title:
    2.2. Training a language model in Spanish from scratch: I used a Wiki corpus trimmed to around 130 million tokens (training for 6 epochs took 10 hours on a GTX 1080 Ti, reaching an accuracy of 30.5%).
    2.3. Applying ULMFiT: first training a domain language model (accuracy: 34.3%) and then the classifier itself (accuracy: 81.5%). The classifier was then used to predict on the entire data set (probability of the product being new given the title).
  3. Creating a new tabular model, this time adding as a new feature the prediction coming from the NLP model (final accuracy: 92.4%).

I also tried extracting the last linear layer’s activations (50) from the NLP model and feeding them into the tabular model, but it didn’t improve accuracy. Something I didn’t get around to trying was removing the output layer of both models, concatenating the outputs and feeding them into a linear model (unlike my simpler model, this would backprop to both models).

In this case, the effort of training the NLP model (particularly the Spanish model from scratch) improved accuracy by just under 1 percentage point. However, it was a nice learning exercise and now I have a Spanish pre-trained model that will hopefully be useful for other projects thanks to ULMFiT. :slight_smile:


That’s a relative error improvement of >10%, which is a lot! :slight_smile:
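A quick check of that figure, assuming it refers to the tabular accuracies quoted above (91.5% without the NLP feature, 92.4% with it). The variable names are just for illustration.

```python
acc_before = 0.915  # tabular model without the text feature
acc_after = 0.924   # tabular model with the NLP prediction added as a feature

err_before = 1 - acc_before
err_after = 1 - acc_after

# Relative improvement: the share of the original errors that went away.
relative_improvement = (err_before - err_after) / err_before
print(f"error {err_before:.1%} -> {err_after:.1%}, relative improvement {relative_improvement:.1%}")
```

So a gain of under one percentage point in accuracy corresponds to removing roughly a tenth of the remaining errors.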


I’d be interested to see that, if you get it working…



I have a problem that I would like to know how to solve with deep learning. I have a list of “office hours” of service providers for homeless people in the city, usually in a human-readable format like:

"hoursOperation": "Mon, Tue, Thu, Fri 8:30 am-11:30 am",
"hoursOperation": "Groups are held on the First and Third Wed 9:30 am-11 am",

and so on, in other human-friendly formats…

and I want to translate those into something more machine-friendly, like:
"startTime": "17:00",
"endTime": "18:00",
"dayOfWeek": ["Tue", "Wed"],

What is the proper approach for this? I don’t think it is classification. Would this be translation?
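Before reaching for deep learning, a rule-based baseline might be worth a try. The sketch below is not a deep learning approach and not a complete answer: it is a regular-expression parser that only handles the simplest "days plus one time range" pattern from the two examples above. The function names are hypothetical, and qualifiers such as "First and Third" are silently dropped.

```python
import re

DAY_PATTERN = re.compile(r"\b(Mon|Tue|Wed|Thu|Fri|Sat|Sun)\b")
TIME_RANGE_PATTERN = re.compile(
    r"(\d{1,2})(?::(\d{2}))?\s*(am|pm)\s*-\s*(\d{1,2})(?::(\d{2}))?\s*(am|pm)",
    re.IGNORECASE,
)

def to_24h(hour, minute, meridiem):
    """Convert e.g. ('8', '30', 'am') to '08:30'; a missing minute means ':00'."""
    hour = int(hour) % 12  # 12 am -> 0, 12 pm -> 12 after the shift below
    if meridiem.lower() == "pm":
        hour += 12
    return f"{hour:02d}:{int(minute or 0):02d}"

def parse_hours(text):
    """Return startTime/endTime/dayOfWeek for one time range, or None if none is found."""
    match = TIME_RANGE_PATTERN.search(text)
    if match is None:
        return None
    h1, m1, ap1, h2, m2, ap2 = match.groups()
    return {
        "startTime": to_24h(h1, m1, ap1),
        "endTime": to_24h(h2, m2, ap2),
        "dayOfWeek": DAY_PATTERN.findall(text),
    }

print(parse_hours("Mon, Tue, Thu, Fri 8:30 am-11:30 am"))
print(parse_hours("Groups are held on the First and Third Wed 9:30 am-11 am"))
```

The strings such a parser fails on would show how much irregular text is left, which is the part a learned sequence-to-sequence model would actually need to cover.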


Hi all.

I finally wrote up my blog post about creating the guitar classification model. Over the past few days I decided to redo the exercise and incorporate the new data block API, progressive resizing and other goodies of fastai v1.
Please let me know if my description of the one-cycle-routine, progressive resizing, etc. is off.

I think the results came out really nice and I’m still amazed how well progressive resizing works!

Notebook and other links are included…

Next on the list are write-ups:

It will be a little mini-series in the end :wink:


Upgraded UCR Time Series Classification to image notebook

I’d like to share with you changes I’ve made to the OliveOil notebook I originally created based on some of the feedback received.

I’ve made the following updates (gist):

  • Modified data source so that any of the 85 univariate UCR data sets can be used now
  • Added 3 new time series encoders
  • Modified the time series to image encoders so that images of different sizes can be created, independently of the time series length
  • Up to 3 encoders can be used simultaneously. Each encoder creates a single-channel image, and a 3-channel image is created by combining them.
  • Incorporated the new data_block functionality

There are 7 image encoders available:

  • ‘Default’: raw time series
  • ‘Area’: time series area plot
  • ‘2D’: time series in 2D
  • RecurrencePlots: Recurrence Plot
  • GASF: Gramian Angular Summation Field
  • GADF: Gramian Angular Difference Field
  • MTF: Markov Transition Field

This is what the same time series looks like after each encoder is applied:

I’ve run many tests with this updated notebook. If you are interested, you can read the key learnings in the Time series/sequential data study group thread.
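For readers unfamiliar with the Gramian Angular Field encoders listed above, here is a minimal pure-Python sketch of the usual definitions: rescale the series to [-1, 1], take φ = arccos(x), then GASF[i][j] = cos(φi + φj) and GADF[i][j] = sin(φi − φj). It is not the notebook's implementation. The function names are made up, the third channel is a plain unthresholded distance matrix standing in for a recurrence plot, and resizing images independently of the series length is left out.

```python
import math

def rescale(series):
    """Linearly rescale a series to [-1, 1] (assumes it is not constant)."""
    lo, hi = min(series), max(series)
    scaled = [2 * (x - lo) / (hi - lo) - 1 for x in series]
    return [max(-1.0, min(1.0, x)) for x in scaled]  # guard against rounding

def gramian_angular_fields(series):
    """Return (GASF, GADF) as nested lists of shape len(series) x len(series)."""
    phi = [math.acos(x) for x in rescale(series)]
    gasf = [[math.cos(a + b) for b in phi] for a in phi]
    gadf = [[math.sin(a - b) for b in phi] for a in phi]
    return gasf, gadf

def distance_matrix(series):
    """Unthresholded pairwise distances, a simple stand-in for a recurrence plot."""
    return [[abs(a - b) for b in series] for a in series]

def stack_channels(*channels):
    """Combine single-channel images into one image with a tuple per pixel."""
    size = len(channels[0])
    return [
        [tuple(channel[i][j] for channel in channels) for j in range(size)]
        for i in range(size)
    ]

series = [0.0, 1.0, 2.0]  # toy series for illustration
gasf, gadf = gramian_angular_fields(series)
image = stack_channels(gasf, gadf, distance_matrix(series))
print(len(image), len(image[0]), len(image[0][0]))
```

In practice each channel would also be scaled to a common pixel range before being saved as an RGB image.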


Hello Everyone!
We (@kranthigv) had been planning to write a blog post for a long time, and we finally did it.
Check it out!


I tried to implement the embedding approach to collaborative filtering from scratch in Keras. I got terrible results on the MovieLens 100k dataset :disappointed:
This is the accuracy of the model:
These are the losses of the model:

This is the Keras model:

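from keras.layers import Input, Embedding, Dot, Add, Flatten, Activation, Lambda
from keras.models import Model

# num_users, num_items, min_score and max_score are assumed to be computed from the dataset
num_factors = 5 # embedding dimensionality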

# input
users_input = Input(shape=(1,))
items_input = Input(shape=(1,))

# embedding
user_weight = Embedding(num_users, num_factors, input_length=1)(users_input)
item_weight = Embedding(num_items, num_factors, input_length=1)(items_input)

# bias
user_bias = Embedding(num_users, 1, input_length=1)(users_input)
item_bias = Embedding(num_items, 1, input_length=1)(items_input)

# the collaborative filtering logic
res1 = Dot(axes=-1)([user_weight, item_weight]) # multiply users weights by items weights
res2 = Add()([res1, user_bias])                 # add user bias
res3 = Add()([res2, item_bias])                 # add item bias
res4 = Flatten()(res3)
res5 = Activation('sigmoid')(res4)              # apply sigmoid to get probabilities
# scale the probabilities to make them ratings
ratings_output = Lambda(lambda x: x * (max_score - min_score) + min_score)(res5)

model = Model(inputs=[users_input, items_input], outputs=[ratings_output])

I need to figure out what I missed to improve the model. All is detailed in this blog post. Any improvement suggestions?
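To make the logic of the model above easier to check by hand, here is a plain-Python sketch of what it computes for a single user/item pair: dot product of the two embeddings, plus both biases, squashed by a sigmoid and scaled to the rating range. The function names are made up and the 1 to 5 rating range is an assumption about MovieLens. One thing that might be worth checking: since the output is a continuous rating, an exact-match accuracy metric may look terrible even when the predictions are reasonable, so an error measure such as RMSE could be more informative.

```python
import math

def predict_rating(user_vec, item_vec, user_bias, item_bias, min_score=1.0, max_score=5.0):
    """Dot product plus biases, squashed by a sigmoid and scaled to the rating range."""
    raw = sum(u * i for u, i in zip(user_vec, item_vec)) + user_bias + item_bias
    squashed = 1.0 / (1.0 + math.exp(-raw))
    return squashed * (max_score - min_score) + min_score

def rmse(targets, predictions):
    """Root mean squared error between true and predicted ratings."""
    squared = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return math.sqrt(sum(squared) / len(squared))

# With all-zero embeddings and biases the model predicts the middle of the range.
neutral = predict_rating([0.0] * 5, [0.0] * 5, 0.0, 0.0)
print(neutral, rmse([3.0, 4.0], [3.0, 2.0]))
```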

Continuing our series of updates to our aircraft classifier project, I have added the Data Block API and progressively resized the dataset, from 32x32, to 64x64, to finally 128x128. We are now at 99.3% accuracy. Hooray.

Using the new model I have created this web app. Check it out at: deepair-v2.

I’ve written the following short Medium post describing the details of the process.

The accompanying notebook can be found at this gist.


Hi, everyone.

I have been playing around with audio classification, using bachir’s strategy of transforming the audio signal into an image that represents its spectrogram, and then performing transfer learning on those images using the guidelines from the first three lessons. I tried this with the dataset from last year’s TensorFlow speech recognition challenge on Kaggle, and I got an interesting result. The dataset comprises short utterances containing commands such as up, down, stop, go, etc. In my first trial, I excluded the categories unknown and silence to facilitate training.

The best result was superior to the first place in the private leaderboard of Kaggle 10 months ago. However, I’d need to include the unknown and silence categories to perform a fair comparison.

I also applied this same approach to emotion recognition from speech using the IEMOCAP database. This database contains speech signals uttered by actors and labeled in categories such as sadness, happiness, anger and so on. I started with two classes with a decent amount of data (one thousand samples each) and the first results are encouraging: I got about 93% accuracy differentiating between anger and sadness. I’m curious to see the performance for the entire dataset.
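For anyone curious about the audio-to-spectrogram step mentioned above, here is a minimal sketch of a magnitude spectrogram: split the signal into overlapping frames, apply a Hann window, and take the magnitude of a discrete Fourier transform for each frame. It is not the code used in these experiments. The frame size, hop and test tone are arbitrary assumptions, and a real pipeline would more likely use an FFT from an audio library plus a log or mel scale before saving the result as an image.

```python
import cmath
import math

def spectrogram(signal, frame_size, hop):
    """Return a list of frames, each a list of DFT magnitudes for bins 0..frame_size//2."""
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = signal[start:start + frame_size]
        # Hann window to reduce leakage at the frame edges.
        windowed = [
            x * 0.5 * (1 - math.cos(2 * math.pi * n / (frame_size - 1)))
            for n, x in enumerate(frame)
        ]
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            magnitudes.append(abs(coeff))
        frames.append(magnitudes)
    return frames

# Toy signal: a pure tone that completes 4 cycles per 32-sample frame.
signal = [math.sin(2 * math.pi * 4 * n / 32) for n in range(128)]
spec = spectrogram(signal, frame_size=32, hop=16)
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
print(len(spec), peak_bins)
```

Plotting the frames as columns with frequency on the vertical axis gives the image that is then fed to the image classifier.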