Share your work here ✅

Thanks for thinking along. Yes I had a look at the house numbers dataset but my end-goal is to spot text, not only numbers.

See for the source-code (including download of fonts, ! wget -c


I had a go at applying the tabular data stuff from Lesson 4 to the Titanic competition on Kaggle

Performance is not stellar (top 25%) but learning how to do feature engineering on a Pandas dataframe was interesting. The trickiest part was working around a bug in fastai 1.0.24 (which has now been fixed).


Running a little behind on the lectures and implementations :disappointed: but I finally created my first classifier, which tells John Oliver (the comedian) from Steve Mnuchin (US Treasury Secretary).

I only managed about 72% accuracy, though, even after removing useless images. After wasting a lot of time I figured out the reason: many of the images in the data bunch are being transformed in such a way that they no longer contain either of them. I turned off max_warp and set max_zoom = 1, but still didn’t get much improvement, so I will probably go ahead with the next lecture and come back to this once Jeremy addresses the get_transforms() function.

Nevertheless, a look at the transformed images and the issue:

Definitely found that actively doing rather than passively watching the lecture helps in learning better. Hopefully, I will do a lot more from now.


I might be super late to this party but I FINALLY got my very first web app up and running with my very first trained ML model, so Yay! :slight_smile:

I built a facial expression recognition (FER) model, trained it on the KDEF dataset and then experimented with drastically different test images via my web app.

The KDEF dataset classifies emotions into 7 types, and I use all of them. After the very first training cycle, I got an error rate of 6.4% and almost identical training loss (TL) and validation loss (VL).

After many many iterations trying different epochs and learning rates, I was able to get the model down to an error rate of 1.9% but my training loss ended up higher than validation loss by a tiny(?) bit:

PS: I have tons of questions on learning rates and epochs (some listed in the notebook) and would really appreciate it if someone could provide feedback on my approach and results!

At this stage, I put this model into production using Zeit and tested it on other images.

I observed that the KDEF dataset uses males and females in the 20-30 age group and the images seemed predominantly Caucasian. So I wanted to test my model against images of older women, people of color, kids/babies etc. Sharing some results below (more results and related questions in the notebook):

Overall my model has not done well on random test images despite the 1.9% error rate during training. I’m not sure if it’s because of the way the KDEF dataset was created, or if my model is overfitted, or something else. My error rate had a downward trend all along, barring a few fluctuations, so I don’t think it is overfitted… but I would love clarification on this!

Now I am really interested in understanding what the model “learnt” and why certain images from the test set got mis-classified… I’ve been guessing myself crazy! But I need to better understand what I did so far before I try anything else. :frowning:

The entire notebook is available here.
Give the app a go here!

Please please provide feedback/answers/questions… all of this is so new that it’ll be good to get validation on the approach and results. Thank you!


I just used a CNN on music data for genre classification, using the fastai library for transfer learning, and got an accuracy of 80%!
Have a look here.
Thank you!


After lesson 4, I tried to combine tabular data with NLP, specifically in Spanish.

I took a tabular dataset from an e-commerce marketplace with the objective of predicting products’ condition (new or used) based on listings’ features. It includes 100k records and after some data pre-processing (not included in the attached notebooks), I ended up with 30 features, including: 17 categorical, 12 continuous and 1 text field (listing’s title).

The process included:

  1. Creating a tabular model without the text feature (accuracy: 91.5%).
  2. Creating an NLP model to predict from the listing title:
    2.2. Training a language model in Spanish from scratch: I used a Wiki corpus trimmed to around 130 million tokens (training for 6 epochs took 10 hours on a GTX 1080 Ti, reaching an accuracy of 30.5%).
    2.3. Applying ULMFiT: first training a domain language model (accuracy: 34.3%) and then the classifier itself (accuracy: 81.5%). The classifier was then used to predict on the entire data set (probability of the product being new given the title).
  3. Creating a new tabular model, this time adding as a new feature the prediction coming from the NLP model (final accuracy: 92.4%).

I also tried extracting the last linear layer’s activations (50) from the NLP model and feeding them into the tabular model, but it didn’t improve accuracy. Something I didn’t get around to trying was removing the output layer of both models, concatenating the outputs and feeding them into a linear model (unlike my simpler model, this would backprop into both models).

In this case, the effort of training the NLP model (particularly the Spanish model from scratch) improved accuracy by a little less than 1 percentage point. However, it was a nice learning exercise and now I have a pre-trained Spanish model that will hopefully be useful for other projects thanks to ULMFiT. :slight_smile:


That’s a relative error improvement of >10%, which is a lot! :slight_smile:
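
For anyone who wants to check that figure, here is a quick sketch of the arithmetic using the two accuracies quoted above (91.5% without the text feature, 92.4% with it). The function name is just made up for illustration.

```python
def relative_error_reduction(acc_before, acc_after):
    """Fraction by which the error rate shrinks when accuracy rises."""
    err_before = 1.0 - acc_before
    err_after = 1.0 - acc_after
    return (err_before - err_after) / err_before


# Accuracies taken from the post above.
reduction = relative_error_reduction(0.915, 0.924)
print(f"{reduction:.1%}")  # about 10.6%
```

So a gain of under one point in accuracy is an error rate going from 8.5% to 7.6%, which is where the ">10%" comes from.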


I’d be interested to see that, if you get it working…



I have a problem I would like to solve with deep learning. I have a list of “office hours” of service providers for homeless people in the city, usually in a human-readable format like:

“hoursOperation”: “Mon, Tue, Thu, Fri 8:30 am-11:30 am”,
“hoursOperation”: “Groups are held on the First and Third Wed 9:30 am-11 am”,

and so on, in other human-friendly formats …

and I want to translate those into something more machine friendly like:
“startTime”: “17:00”,
“endTime”: “18:00”,
“dayOfWeek”: [“Tue”, “Wed”],

What is the proper approach for this? I don’t think it is classification. Would this be translation?
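
To make the target format concrete, here is a rough rule-based baseline for the two examples above. To be clear, this is plain regex parsing and not a deep learning approach; the function names and the regex are my own assumptions, and it only handles a single "H:MM am-H:MM pm" style range plus three-letter day abbreviations.

```python
import re

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# One time range such as "8:30 am-11:30 am" or "9:30 am-11 am".
TIME_RANGE = re.compile(
    r"(\d{1,2})(?::(\d{2}))?\s*(am|pm)\s*-\s*(\d{1,2})(?::(\d{2}))?\s*(am|pm)",
    re.IGNORECASE,
)


def to_24h(hour, minute, meridiem):
    """Convert a 12-hour clock reading to an 'HH:MM' string."""
    hour = int(hour) % 12          # 12 am -> 0, 12 pm -> 0 (+12 below)
    if meridiem.lower() == "pm":
        hour += 12
    return f"{hour:02d}:{int(minute or 0):02d}"


def parse_hours(text):
    """Return a dict in the target format, or None if no time range is found."""
    match = TIME_RANGE.search(text)
    if match is None:
        return None
    h1, m1, ap1, h2, m2, ap2 = match.groups()
    days = [d for d in DAYS if re.search(rf"\b{d}", text)]
    return {
        "startTime": to_24h(h1, m1, ap1),
        "endTime": to_24h(h2, m2, ap2),
        "dayOfWeek": days,
    }


print(parse_hours("Mon, Tue, Thu, Fri 8:30 am-11:30 am"))
print(parse_hours("Groups are held on the First and Third Wed 9:30 am-11 am"))
```

Note that the second example loses the "First and Third" part, since the target format shown above has no field for it. Cases like that are where hand-written rules start to run out.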


Hi all.

I finally wrote up my blog post about creating the guitar classification model. Over the past few days I decided to redo the exercise and incorporate the new data block API, progressive resizing and other goodies of fastai v1.
Please let me know if my description of the one-cycle-routine, progressive resizing, etc. is off.

I think the results came out really nice and I’m still amazed how well progressive resizing works!

Notebook and other links are included…

Next on the list are write-ups:

Will be a little mini series in the end :wink:


Upgraded UCR Time Series Classification to image notebook

I’d like to share with you changes I’ve made to the OliveOil notebook I originally created based on some of the feedback received.

I’ve made the following updates (gist):

  • Modified the data source so that any of the 85 univariate UCR data sets can be used now
  • Added 3 new time series encoders
  • Modified the time series to image encoders so that images of different sizes can be created, independently of the time series length
  • Up to 3 encoders can be used simultaneously. Each encoder creates a single-channel image, and a 3-channel image is created by combining them.
  • Incorporated the new data_block functionality

There are 7 image encoders available:

  • ’Default’: raw time series
  • ’Area’: time series area plot
  • ’2D’: time series in 2D
  • RecurrencePlots: Recurrence Plot
  • GASF: Gramian Angular Summation Field
  • GADF: Gramian Angular Difference Field
  • MTF: Markov Transition Field
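
For readers who haven't met these encoders, here is a minimal sketch of one of them, the Gramian Angular Summation Field. This is not the notebook's code: it is plain Python, the function names are made up, and it produces an image whose side equals the series length (no resizing). The idea is to rescale the series to [-1, 1], map each value to an angle with arccos, and fill cell (i, j) with cos(angle_i + angle_j).

```python
import math


def rescale(series):
    """Linearly rescale a series to the range [-1, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("a constant series cannot be rescaled to [-1, 1]")
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x in series]


def gasf(series):
    """Return the Gramian Angular Summation Field as a list of rows."""
    scaled = rescale(series)
    # Clamp before acos to guard against tiny floating point overshoot.
    angles = [math.acos(max(-1.0, min(1.0, x))) for x in scaled]
    return [[math.cos(a + b) for b in angles] for a in angles]


image = gasf([0.0, 1.0, 2.0])
for row in image:
    print([round(v, 3) for v in row])
```

The result is a symmetric single-channel "image" with values in [-1, 1]. GADF is the same construction with sin(angle_i - angle_j) in place of the cosine of the sum.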

This is how the same time series looks after each encoder is applied:

I’ve run many tests with this updated notebook. If you are interested, you can read the key learnings in the Time series/sequential data study group thread.


Hello Everyone!
We (@kranthigv) had been planning to write a blog post for a long time and we finally did it.
Check it out!


I tried to implement the embedding approach to collaborative filtering from scratch in Keras. I got terrible results on the MovieLens 100k dataset :disappointed:
This is the accuracy of the model:
These are the losses of the model:

This is the Keras model:

from keras.layers import Input, Embedding, Dot, Add, Flatten, Activation, Lambda
from keras.models import Model

# num_users, num_items, min_score and max_score are computed from the dataset elsewhere
num_factors = 5 # embedding dimensionality

# input
users_input = Input(shape=(1,))
items_input = Input(shape=(1,))

# embedding
user_weight = Embedding(num_users, num_factors, input_length=1)(users_input)
item_weight = Embedding(num_items, num_factors, input_length=1)(items_input)

# bias
user_bias = Embedding(num_users, 1, input_length=1)(users_input)
item_bias = Embedding(num_items, 1, input_length=1)(items_input)

# the collaborative filtering logic
res1 = Dot(axes=-1)([user_weight, item_weight]) # dot product of user and item weights
res2 = Add()([res1, user_bias])                 # add user bias
res3 = Add()([res2, item_bias])                 # add item bias
res4 = Flatten()(res3)
res5 = Activation('sigmoid')(res4)              # squash to the range (0, 1)
# scale the sigmoid output to the rating range
ratings_output = Lambda(lambda x: x * (max_score - min_score) + min_score)(res5)

model = Model(inputs=[users_input, items_input], outputs=[ratings_output])
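
To sanity-check what the graph above computes for a single (user, item) pair, the same arithmetic can be written out in plain Python. This is only a sketch of the forward pass with made-up numbers, not a replacement for the Keras model; the function name and the example values are assumptions.

```python
import math


def predict_rating(user_vec, item_vec, user_bias, item_bias, min_score, max_score):
    """Dot product plus biases, squashed by a sigmoid and scaled to the rating range."""
    dot = sum(u * i for u, i in zip(user_vec, item_vec))
    logit = dot + user_bias + item_bias
    prob = 1.0 / (1.0 + math.exp(-logit))
    return prob * (max_score - min_score) + min_score


# Made-up embeddings with 5 factors, matching num_factors above.
user = [0.1, -0.2, 0.3, 0.0, 0.5]
item = [0.4, 0.1, -0.3, 0.2, 0.6]
print(predict_rating(user, item, user_bias=0.1, item_bias=-0.05, min_score=1, max_score=5))
```

With all-zero weights and biases the prediction sits at the midpoint of the range, and because the sigmoid only approaches 0 and 1, the scaled output gets close to min_score and max_score without reaching them exactly.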

I need to figure out what I missed to improve the model. All is detailed in this blog post. Any improvement suggestions?

Continuing our series of updates to our aircraft classifier project, I have added the Data Block API and progressively resized the dataset, from 32x32, to 64x64, to finally 128x128. We are now at 99.3% accuracy. Hooray.

Using the new model I have created this web app. Check it out at: deepair-v2.

I’ve written the following short Medium post describing the details of the process.

The accompanying notebook can be found at this gist.


Hi, everyone.

I have been playing around with audio classification, using bachir’s strategy of transforming the audio signal into an image representing its spectrogram, and then performing transfer learning on those images using the guidelines from the first three lessons. I tried this with the dataset from last year’s TensorFlow speech recognition challenge on Kaggle and I got an interesting result. The dataset comprises short utterances containing commands such as up, down, stop, go etc. In my first trial, I excluded the categories unknown and silence to facilitate training.
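
In case the spectrogram step is unfamiliar, here is a bare-bones sketch of how a 1-D signal becomes a 2-D array that can be treated as an image. It is not the code used for the experiments here: it is pure Python with a naive DFT, and the frame size, hop and test tone are all made-up values. In practice you would use a library FFT and usually a log or mel scale.

```python
import cmath
import math


def spectrogram(signal, frame_size=16, hop=8):
    """Return a list of frames, each a list of DFT magnitudes (bins 0..frame_size/2)."""
    # Periodic Hann window to reduce leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size) for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            coeff = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            magnitudes.append(abs(coeff))
        frames.append(magnitudes)
    return frames


# One second of a 16 Hz tone sampled at 64 Hz (toy values).
sample_rate = 64
tone = [math.sin(2 * math.pi * 16 * t / sample_rate) for t in range(sample_rate)]
spec = spectrogram(tone)
print(len(spec), "frames x", len(spec[0]), "frequency bins")
```

Each row is one time slice and each column one frequency bin, so stacking the rows gives the picture the image classifier is trained on.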

The best result was better than first place on the Kaggle private leaderboard from 10 months ago. However, I’d need to include the unknown and silence categories to make a fair comparison.

I also applied this same approach to emotion recognition from speech using the IEMOCAP database. This database contains speech signals uttered by actors and labeled in categories such as sadness, happiness, anger and so on. I started with two classes with a decent amount of data (one thousand samples each) and the first results are encouraging: I got about 93% accuracy differentiating between anger and sadness. I’m curious to see the performance for the entire dataset.



I am working on an object verification problem that needs new categories added regularly. (It is similar to face verification, where we first show the identity card and a model verifies whether the face matches.)

I found two approaches here: how-to-add-a-new-category-to-a-deep-learning-model. It says that we can either retrain the model from the old weights with the new category added, or use content-based image retrieval. The latter means deciding the category from the feature vector of the last layer (by calculating Hamming and Euclidean distances). The paper describing this technique is here: Deep Learning of Binary Hash Codes for Fast Image Retrieval

Below is an image illustrating the concept of this technique:

I have tested the technique with the MNIST dataset, using the digits 0 to 7 only; digits 8 and 9 will be added later. For now I use only the binarized code of the last layer. I tested with the digit 9 and the results are quite OK: every other digit has low similarity (<60%) except 7, which has 90% similarity with 9. I will continue testing with Euclidean distance rather than this binary Hamming distance.
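
To illustrate the two comparisons mentioned above, here is a small sketch in plain Python. The 0.5 threshold, the function names and the activation values are all assumptions made up for the example, not my actual code or results.

```python
import math


def binarize(activations, threshold=0.5):
    """Turn last-layer activations into a binary code (threshold is an assumption)."""
    return [1 if a >= threshold else 0 for a in activations]


def hamming_similarity(code_a, code_b):
    """Fraction of positions where two binary codes agree."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    matches = sum(1 for a, b in zip(code_a, code_b) if a == b)
    return matches / len(code_a)


def euclidean_distance(vec_a, vec_b):
    """Distance between the raw (non-binarized) feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec_a, vec_b)))


# Made-up activations for a query image and a stored reference image.
query = [0.9, 0.1, 0.8, 0.7, 0.2, 0.6, 0.1, 0.9, 0.3, 0.8]
reference = [0.8, 0.2, 0.9, 0.6, 0.1, 0.4, 0.2, 0.7, 0.4, 0.9]

print(hamming_similarity(binarize(query), binarize(reference)))
print(euclidean_distance(query, reference))
```

The binary code throws away how far each activation is from the threshold, which is one reason the Euclidean distance on the raw vector may separate look-alike classes better.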

I have written a blog post about this here: Object Verification with Deep Learning of Binary Hash Codes
The source code is quite messy, but if you are interested you can find it here

I would really appreciate it if someone could suggest some techniques for dealing with this problem. Thank you in advance


All - I took a stab at the Amazon Bin Challenge, i.e. counting the number of items in Amazon pods:

Here is the GitHub repo:

I have a Python script in the repo that will download a subset of the data for you in case you want to try this challenge.

Here are initial results:



Hi Matthew,

That is the “data wrangling” part - prepping raw data into a format more useful for analysis or ML models. How you slice and dice depends on what you are trying to do, and the design of your model. It looks like you may want to construct tabular data and perhaps look at relationships of other dimensions based on day of the week and/or time.

If that’s the case, you might be interested in the concept of “embeddings for categorical variables”. Here is a useful blog post and workshop video from Rachel on the subject:

wow, thanks! Will take a look and ask more questions!

I don’t think our normal convnets will work well for that. You’ll need to use object detection, which we should be covering in lesson 6.