Share your work here ✅

ramon · November 15, 2018, 11:30am

I’m still experimenting with image recognition and ran a fun experiment to recognise the dosing numbers (‘500’, ‘250’, etc.) printed on medical packages like these:

Instead of using a lot of package images, I tried to build a word generator, inspired by the paper ‘Reading Text in the Wild with Convolutional Neural Networks’ (https://arxiv.org/pdf/1412.1842.pdf).

Using 3000 free Google fonts, it generates 1000 images per class like these:

Next to that I generated background/other images by cropping small parts from other package images:
background_0_1479726 background_0_1270260

This resulted in an accurate enough classifier with a resnet34 and fit_one_cycle(2):

Finally I assess a test image by taking crops and checking each crop.

For the first image, this results in:
[('background', tensor(0.9994)),
('background', tensor(0.9972)),
('background', tensor(0.9956)),
('background', tensor(0.9983)),
('background', tensor(0.9903)),
('background', tensor(0.9669)),
('background', tensor(0.9911)),
('background', tensor(0.9783)),
('background', tensor(0.9533)),
('background', tensor(0.9616)),
('background', tensor(0.9490)),
('background', tensor(0.9971)),
('background', tensor(0.9605)),
('background', tensor(0.9728)),
('background', tensor(0.9144)),

…

('background', tensor(0.5990)),
('background', tensor(0.8826)),
('500', tensor(0.8235)),
('background', tensor(0.8679)),
('background', tensor(0.9674)),

…

('background', tensor(0.9831)),
('background', tensor(0.9468)),
('background', tensor(0.9875)),
('background', tensor(0.9906)),
('background', tensor(0.9839)),
('background', tensor(0.9891))]
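
The crop-and-check step described above could be sketched roughly like this. It is only a minimal illustration, not the code from the post: the helper names, the window size, the stride and the 0.8 threshold are all made up, and the `predictions` list stands in for whatever the trained classifier returns for each crop.

```python
def sliding_window_boxes(img_w, img_h, win_w, win_h, stride):
    """Return (left, top, right, bottom) boxes scanning the image row by row."""
    boxes = []
    for top in range(0, img_h - win_h + 1, stride):
        for left in range(0, img_w - win_w + 1, stride):
            boxes.append((left, top, left + win_w, top + win_h))
    return boxes


def non_background_hits(boxes, predictions, threshold=0.8):
    """Keep crops predicted as a non-background class with probability >= threshold."""
    return [
        (box, label, prob)
        for box, (label, prob) in zip(boxes, predictions)
        if label != "background" and prob >= threshold
    ]


# Hypothetical 200x100 image scanned with 100x50 windows and a stride of 50.
boxes = sliding_window_boxes(200, 100, 100, 50, 50)
# Stand-in classifier output: one (label, probability) pair per crop.
predictions = [("background", 0.99), ("background", 0.88), ("500", 0.82),
               ("background", 0.87), ("background", 0.97), ("background", 0.98)]
hits = non_background_hits(boxes, predictions)
```

Each box can be passed to an image library's crop function before classification; the surviving hits give a rough location for the detected number.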

Obviously this can be improved a lot by integrating a bounding box classifier instead, but it was quite fun to quickly test an idea. Thanks Jeremy and team!

I would also love to get input (other papers, methods, experiments, etc.)!

digitalspecialists · November 15, 2018, 1:18pm

I enjoy learning through open competition. This way someone else has done the very hard work of data collection, and SOTA results are public and detailed. Using fastai v1, and learning it along the way, I teamed up with @kcturgutlu and @radek to achieve 14th out of 894 teams in Kaggle’s Airbus Ship Detection Challenge, narrowly missing a gold medal.

This was basically fastai v1 ‘out of the box’. A world-class model was trained in half a dozen lines of code. You can read our solution here: https://www.kaggle.com/c/airbus-ship-detection/discussion/71664

Thank you to fastai, Jeremy and the community. I only came to PyTorch/Python early this year through fastai. My mission is to exploit the ‘practical’ side of fastai as a way to experiment and achieve great results quickly and easily.

takotab · November 15, 2018, 1:23pm

Have you considered the SVHN dataset? http://ufldl.stanford.edu/housenumbers/

Would love to see how these two datasets interact. How did you get all these fonts? Could you share your code? Then I would not need to recreate the thing from scratch.

ramon · November 15, 2018, 2:54pm

Thanks for thinking along. Yes I had a look at the house numbers dataset but my end-goal is to spot text, not only numbers.

See https://github.com/ramonhollands/spotting-text for the source code (including the font download: !wget -c https://github.com/google/fonts/archive/master.zip).

tamlyn · November 15, 2018, 9:47pm

I had a go at applying the tabular data stuff from Lesson 4 to the Titanic competition on Kaggle:

https://www.kaggle.com/tamlyn/titanic-fastai

Performance is not stellar (top 25%) but learning how to do feature engineering on a Pandas dataframe was interesting. The trickiest part was working around a bug in fastai 1.0.24 (which has now been fixed).

akshayb7 · November 15, 2018, 10:38pm

I’m running a little behind on the lectures and implementations, but I finally created my first classifier, which distinguishes John Oliver (the comedian) from Steve Mnuchin (US Treasury Secretary).

I only managed to get about 72% accuracy, even after removing useless images. After wasting a lot of time I figured out the reason: a lot of the images in the data bunch are being transformed in such a way that they no longer contain either of them. I turned off max_warp and set max_zoom = 1, but still didn’t get much improvement, so I will probably go ahead with the next lecture and come back to this once Jeremy addresses the get_transforms() function.

Nevertheless, a look at the transformed images and the issue:
SMJO

Definitely found that actively doing rather than passively watching the lecture helps in learning better. Hopefully, I will do a lot more from now.

nbharatula · November 16, 2018, 8:45am

I might be super late to this party but I FINALLY got my very first web app up and running with my very first trained ML model, so Yay!

I built a facial expression recognition (FER) model, trained it on the KDEF dataset and then experimented with drastically different test images via my web app.

The KDEF dataset classifies emotions into 7 types, and I use all of them. After the very first training cycle, I got an error rate of 6.4% and almost identical training loss (TL) and validation loss (VL).

After many many iterations trying different epochs and learning rates, I was able to get the model down to an error rate of 1.9% but my training loss ended up higher than validation loss by a tiny(?) bit:

PS: I have tons of questions on learning rates and epochs (some listed in the notebook) and would really appreciate it if someone could provide feedback on my approach and results!

At this stage, I put this model into production using Zeit and tested it on other images.

I observed that the KDEF dataset uses males and females in the 20-30 age group, and the images seemed predominantly Caucasian. So I wanted to test my model against images of older women, people of color, kids/babies etc. Sharing some results below (more results and related questions in the notebook):

Overall my model has not done well on random test images despite the 1.9% error rate during training. I’m not sure if it’s because of the way the KDEF dataset was created, or if my model is overfitted, or something else. My error rate had a downward trend all along, barring a few fluctuations, so I don’t think it is overfitted… but I would love clarification on this!

Now I am really interested in understanding what the model “learnt” and why certain images from the test set got mis-classified… I’ve been guessing myself crazy! But I need to better understand what I did so far before I try anything else.

The entire notebook is available here.
Give the app a go here!

Please please provide feedback/answers/questions… all of this is so new that it’ll be good to get validation on the approach and results. Thank you!

insiyeah · November 16, 2018, 9:55am

I just used a CNN on music data for genre classification, using the fastai library for transfer learning, and reached an accuracy of 80%!
Have a look here.
Thank you!

joshfp · November 16, 2018, 8:48pm

Hi,
After lesson 4, I tried to combine tabular data with NLP, particularly in Spanish.

I took a tabular dataset from an e-commerce marketplace with the objective of predicting products’ condition (new or used) based on listings’ features. It includes 100k records and after some data pre-processing (not included in the attached notebooks), I ended up with 30 features, including: 17 categorical, 12 continuous and 1 text field (listing’s title).

The process included:

1. Creating a tabular model without the text feature (accuracy: 91.5%).
2. Creating an NLP model to predict from the listing title:
2.2. Training a language model in Spanish from scratch: I used a Wiki corpus trimmed to around 130 million tokens (training for 6 epochs took 10 hours on a GTX 1080 Ti, reaching an accuracy of 30.5%).
2.3. Applying ULMFiT: first training a domain language model (accuracy: 34.3%) and then the classifier itself (accuracy: 81.5%). Then, the classifier was used to predict on the entire data set (probability of the product being new given the title).
3. Creating a new tabular model, this time adding the prediction from the NLP model as a new feature (final accuracy: 92.4%).
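
The last step, feeding the text model's prediction into the tabular model as an extra feature, might look something like the sketch below. It is not taken from the notebooks: the function name, the `title_p_new` column name and the example rows are hypothetical, and the probabilities stand in for the NLP classifier's output.

```python
def add_text_prediction_feature(rows, title_probs, name="title_p_new"):
    """Return copies of the tabular rows with the text model's probability added as a continuous feature."""
    if len(rows) != len(title_probs):
        raise ValueError("need exactly one text-model probability per row")
    return [dict(row, **{name: prob}) for row, prob in zip(rows, title_probs)]


# Hypothetical tabular rows (one dict per listing).
rows = [
    {"price": 120.0, "category": "phones"},
    {"price": 15.5, "category": "books"},
]
# Stand-in for P(new | title) predicted by the NLP classifier for each listing.
title_probs = [0.93, 0.21]

stacked = add_text_prediction_feature(rows, title_probs)
```

The resulting rows can then be used to train the second tabular model, with the new column listed among the continuous variables.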

I also tried extracting the last linear layer’s activations (50) from the NLP model and feeding them into the tabular model, but it didn’t improve accuracy. Something I didn’t get to try was removing the output layer of both models, concatenating their outputs and feeding them into a linear model (unlike my simpler model, this would backprop through both models).

In this case, the effort of training the NLP model (particularly the Spanish model from scratch) improved accuracy by just under 1 percentage point. However, it was a nice learning exercise, and now I have a Spanish pre-trained model that will hopefully be useful for other projects thanks to ULMFiT.

https://gist.github.com/joshfp/b62b76eae95e6863cb511997b5a63118

Notebooks in the gist: 1.tabular.ipynb, 2.lm-spanish.ipynb, 3.nlp.ipynb

jeremy · November 16, 2018, 11:51pm

That’s a relative error improvement of >10%, which is a lot!
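
Assuming this refers to the accuracy going from 91.5% to 92.4% in joshfp’s post, the figure can be checked with a few lines. This is just a sketch of the arithmetic, and the function name is made up for illustration.

```python
def relative_error_reduction(acc_before, acc_after):
    """Fraction by which the error rate (1 - accuracy) shrinks."""
    err_before = 1.0 - acc_before
    err_after = 1.0 - acc_after
    return (err_before - err_after) / err_before


# Error rate drops from 8.5% to 7.6%, i.e. by 0.9 points out of 8.5.
reduction = relative_error_reduction(0.915, 0.924)
print(f"{reduction:.1%}")  # prints 10.6%
```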

jeremy · November 16, 2018, 11:51pm

I’d be interested to see that, if you get it working…

matwong · November 17, 2018, 5:32pm

Hi,

I have a problem that I would like to know how to solve with deep learning. I have a list of “office hours” of service providers for homeless people in the city, usually in a human-readable format like:

"hoursOperation": "Mon, Tue, Thu, Fri 8:30 am-11:30 am",
"hoursOperation": "Groups are held on the First and Third Wed 9:30 am-11 am",
…
and so on, in other human-friendly formats …
…

and I want to translate those into something more machine-friendly like:
{
"startTime": "17:00",
"endTime": "18:00",
"dayOfWeek": ["Tue", "Wed"]
}

What is the proper approach for this? I don’t think it is classification. Would this be translation?

Thanks!
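
For comparison with any learned approach, a plain rule-based baseline (regular expressions, no deep learning) already handles the two examples above. The sketch below is hypothetical: the function names are made up, it only recognises three-letter day abbreviations and a single am/pm time range, and it drops qualifiers such as "First and Third".

```python
import re

DAY_RE = re.compile(r"\b(Mon|Tue|Wed|Thu|Fri|Sat|Sun)\b")
TIME_RANGE_RE = re.compile(
    r"(\d{1,2})(?::(\d{2}))?\s*(am|pm)\s*-\s*(\d{1,2})(?::(\d{2}))?\s*(am|pm)",
    re.IGNORECASE,
)


def to_24h(hour, minute, meridiem):
    """Convert a 12-hour clock reading to an 'HH:MM' string."""
    hour = int(hour) % 12
    if meridiem.lower() == "pm":
        hour += 12
    return f"{hour:02d}:{int(minute or 0):02d}"


def parse_hours(text):
    """Extract the first time range and all day abbreviations; return None if no range is found."""
    match = TIME_RANGE_RE.search(text)
    if match is None:
        return None
    h1, m1, ap1, h2, m2, ap2 = match.groups()
    return {
        "startTime": to_24h(h1, m1, ap1),
        "endTime": to_24h(h2, m2, ap2),
        "dayOfWeek": DAY_RE.findall(text),
    }


first = parse_hours("Mon, Tue, Thu, Fri 8:30 am-11:30 am")
second = parse_hours("Groups are held on the First and Third Wed 9:30 am-11 am")
```

The strings such rules fail on are a natural test set for a sequence-to-sequence ("translation") model.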

cwerner · November 18, 2018, 1:23am

Hi all.

I finally wrote up my blog post about creating the guitar classification model. Over the past few days I decided to redo the exercise and incorporate the new data block API, progressive resizing and other goodies of fastai v1.
Please let me know if my description of the one-cycle routine, progressive resizing, etc. is off.

I think the results came out really nice and I’m still amazed how well progressive resizing works!

Notebook and other links are included…

Next on the list are write-ups:

post about CAM visualization
detailed writeup of the Flask app

It will be a little mini-series in the end.

oguiza · November 18, 2018, 1:40pm

Upgraded UCR Time Series Classification to image notebook

I’d like to share with you changes I’ve made to the OliveOil notebook I originally created based on some of the feedback received.

I’ve made the following updates to the gist:

Modified the data source so that any of the 85 univariate UCR data sets can now be used
Added 3 new time series encoders
Modified the time series to image encoders so that images of different sizes can be created, independently of the time series length
Up to 3 encoders can be used simultaneously. Each encoder creates a single-channel image, and a 3-channel image is created by combining them.
Incorporated the new data_block functionality

There are 7 image encoders available:

‘Default’: raw time series
‘Area’: time series area plot
‘2D’: time series in 2D
RecurrencePlots: Recurrence Plot
GASF: Gramian Angular Summation Field
GADF: Gramian Angular Difference Field
MTF: Markov Transition Field
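
As an illustration of one of these encoders, here is a minimal sketch of a Gramian Angular Summation Field using its usual definition: rescale the series to [-1, 1], take phi = arccos(x), and set G[i][j] = cos(phi_i + phi_j). It is not the notebook's implementation; the function names are made up and no resizing to a target image size is done.

```python
import math


def rescale(series):
    """Min-max rescale a series to the interval [-1, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must not be constant")
    return [2 * (x - lo) / (hi - lo) - 1 for x in series]


def gasf(series):
    """Gramian Angular Summation Field: G[i][j] = cos(phi_i + phi_j) with phi = arccos(rescaled x)."""
    # Clamp to [-1, 1] to guard acos against floating-point overshoot.
    phi = [math.acos(min(1.0, max(-1.0, x))) for x in rescale(series)]
    return [[math.cos(a + b) for b in phi] for a in phi]


# Toy series of length 4 gives a 4x4 single-channel "image".
field = gasf([0.0, 1.0, 2.0, 3.0])
```

The Difference Field (GADF) follows the same pattern with sin(phi_i - phi_j) in place of the cosine of the sum.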

This is what the same time series looks like after each encoder is applied:

I’ve run many tests with this updated notebook. If you are interested, you can read the key learnings in the Time series/sequential data study group thread.

Meghana_G · November 18, 2018, 5:12pm

Hello Everyone!
@kranthigv and I had been planning to write a blog post for a long time, and we finally did it.
Check it out!

bachir · November 18, 2018, 5:36pm

I tried to implement the embedding approach to collaborative filtering from scratch in Keras, but I got terrible results on the MovieLens 100k dataset.
This is the accuracy of the model:
model_accuracy
These are the losses of the model:
model_loss

This is the Keras model:

from keras.layers import Input, Embedding, Dot, Add, Flatten, Activation, Lambda
from keras.models import Model

# num_users, num_items, min_score and max_score are assumed to be defined beforehand (not shown in this post)
num_factors = 5 # embedding dimensionality

# input: one user id and one item id per sample
users_input = Input(shape=(1,))
items_input = Input(shape=(1,))

# embedding
user_weight = Embedding(num_users, num_factors, input_length=1)(users_input)
item_weight = Embedding(num_items, num_factors, input_length=1)(items_input)

# bias
user_bias = Embedding(num_users, 1, input_length=1)(users_input)
item_bias = Embedding(num_items, 1, input_length=1)(items_input)

# the collaborative filtering logic
res1 = Dot(axes=-1)([user_weight, item_weight]) # dot product of user weights and item weights
res2 = Add()([res1, user_bias])                 # add user bias
res3 = Add()([res2, item_bias])                 # add item bias
res4 = Flatten()(res3)
res5 = Activation('sigmoid')(res4)              # apply sigmoid to squash into (0, 1)
# scale the sigmoid output to the rating range
ratings_output = Lambda(lambda x: x * (max_score - min_score) + min_score)(res5)

model = Model(inputs=[users_input, items_input], outputs=[ratings_output])
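
To make the logic easier to check by hand, the same forward pass for a single (user, item) pair can be written in plain Python. This is only a sketch: the function name and all numbers are made up, and a 1 to 5 rating range is assumed.

```python
import math


def predict_rating(user_vec, item_vec, user_bias, item_bias, min_score, max_score):
    """Dot product of the factors plus both biases, squashed by a sigmoid and rescaled to the rating range."""
    logit = sum(u * v for u, v in zip(user_vec, item_vec)) + user_bias + item_bias
    prob = 1.0 / (1.0 + math.exp(-logit))
    return prob * (max_score - min_score) + min_score


# With all-zero factors and biases the sigmoid gives 0.5, i.e. the middle of the range.
neutral = predict_rating([0.0] * 5, [0.0] * 5, 0.0, 0.0, 1.0, 5.0)
# A positive dot product and positive biases push the prediction above the midpoint.
liked = predict_rating([0.5, 0.2], [0.8, 0.1], 0.3, 0.4, 1.0, 5.0)
```

Since the output is a continuous rating, a regression loss such as mean squared error fits this output; classification accuracy compares predictions to labels for exact equality and tends to look poor on real-valued outputs.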

I need to figure out what I missed in order to improve the model. Everything is detailed in this blog post. Any suggestions for improvement?

SOVIETIC-BOSS88 · November 18, 2018, 8:54pm

Continuing our series of updates to our aircraft classifier project, I have added the Data Block API and progressively resized the dataset, from 32x32, to 64x64, to finally 128x128. We are now at 99.3% accuracy. Hooray.

Using the new model I have created this web app. Check it out at: deepair-v2.

I’ve written the following short Medium post describing the details of the process.

The accompanying notebook can be found at this gist.

flavioavila · November 18, 2018, 9:52pm

Hi, everyone.

I have been playing around with audio classification, using bachir’s strategy of transforming the audio signal into an image representing its spectrogram, and then performing transfer learning on those images using the guidelines from the first three lessons. I tried this with the dataset from last year’s TensorFlow Speech Recognition Challenge on Kaggle (https://www.kaggle.com/c/tensorflow-speech-recognition-challenge/) and I got an interesting result. The dataset comprises short utterances containing commands such as up, down, stop, go etc. In my first trial, I excluded the categories unknown and silence to facilitate training.
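
For readers unfamiliar with the audio-to-image step, here is a tiny standard-library sketch of a magnitude spectrogram: short overlapping frames, a Hann window, and a naive DFT per frame. It is not the code used here; the function name, frame size and hop are made up, and a real pipeline would use an FFT library and usually a log or mel scale.

```python
import cmath
import math


def spectrogram(signal, frame_size=64, hop=32):
    """Magnitude spectrogram: one list of frame_size // 2 + 1 bin magnitudes per Hann-windowed frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1)) for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        mags = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT coefficient for frequency bin k.
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size) for n, x in enumerate(frame))
            mags.append(abs(coeff))
        frames.append(mags)
    return frames


# Synthetic test tone: 8 cycles per 64-sample frame, so the energy peaks in bin 8.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = spectrogram(tone)
```

Stacking the frames as columns (time on one axis, frequency on the other) gives the 2D array that is saved as an image for the classifier.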

My best result was better than the first-place score on the Kaggle private leaderboard from 10 months ago. However, I’d need to include the unknown and silence categories to perform a fair comparison.

I also applied this same approach to emotion recognition from speech using the IEMOCAP database (https://sail.usc.edu/iemocap/). This database contains speech signals uttered by actors and labeled in categories such as sadness, happiness, anger and so on. I started with two classes with a decent amount of data (one thousand samples each) and the first results are encouraging: I got about 93% accuracy differentiating between anger and sadness. I’m curious to see the performance for the entire dataset.

Cheers

dhoa · November 18, 2018, 10:11pm

I am working on an object verification problem that needs new categories to be added regularly (similar to face verification, where we first show the identity card and a model verifies whether the face matches).

I found two approaches here: how-to-add-a-new-category-to-a-deep-learning-model. It says that we can either retrain the model from the old weights with the new category added, or use content-based image retrieval, meaning that we decide the category based on the last layer’s feature vector (by calculating Hamming and Euclidean distances). The paper describing this technique is: Deep Learning of Binary Hash Codes for Fast Image Retrieval

Below is a diagram of the concept behind this technique:

I have tested the technique on the MNIST data set, with digits 0 to 7 only. Digits 8 and 9 will be added later. For now I use only the binarized code of the last layer. I tested with digit 9 and the results are quite OK. Every other digit has low similarity (<60%), except digit 7, which has 90% similarity with digit 9. I will continue testing with Euclidean distance rather than this binary Hamming distance.
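
The binary-code comparison could be sketched as follows. This is an assumption-laden illustration rather than the code from the blog post: it assumes the last layer outputs sigmoid activations in [0, 1], binarizes them at 0.5, and defines similarity as the fraction of matching bits (one minus the normalised Hamming distance). The activation values are made up.

```python
def binarize(activations, threshold=0.5):
    """Turn activations into a binary code: 1 where the activation is at least the threshold."""
    return [1 if a >= threshold else 0 for a in activations]


def hamming_similarity(code_a, code_b):
    """Fraction of positions at which two equal-length binary codes agree."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    matches = sum(1 for a, b in zip(code_a, code_b) if a == b)
    return matches / len(code_a)


# Made-up 8-unit activations for a query image and a stored reference image.
query_code = binarize([0.9, 0.1, 0.8, 0.2, 0.7, 0.6, 0.3, 0.95])
reference_code = binarize([0.8, 0.2, 0.9, 0.6, 0.7, 0.4, 0.1, 0.9])
similarity = hamming_similarity(query_code, reference_code)
```

A new category then only needs reference codes stored for it, with no retraining, and a query is accepted when its similarity to those codes exceeds a chosen cut-off.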

I have written a blog about this here: Object Verification with Deep Learning of Binary Hash Codes
Source code is quite messy but if you are interested, you can find it here

I would really appreciate it if someone could suggest some techniques to deal with this problem. Thank you in advance.

alvisanovari · November 19, 2018, 1:56am

All - I took a stab at the Amazon Bin Challenge, i.e. counting the number of items in Amazon pods: https://registry.opendata.aws/amazon-bin-imagery/

Here is the GitHub repo: https://github.com/btahir/Amazon-Bin-Challenge

I have a Python script in the repo that will download a subset of the data for you in case you want to try this challenge.

Here are initial results:
