Share your work here ✅

I was interested in doing speaker recognition. I used Audacity (https://www.audacityteam.org) to trim the audio from the following clips:

  1. Ben Affleck’s speech in Boiler Room (https://www.youtube.com/watch?v=JfIKzReNDF4&t=62s)
  2. Joe Rogan and Elon Musk Podcast (https://www.youtube.com/watch?v=Ra3fv8gl6NE)

From these I used 3 minutes 30 seconds of voice audio from each of Ben Affleck, Joe Rogan, and Elon Musk.

I used a 5-second sliding window to plot their spectrograms, following the tutorial outlined here: https://github.com/drammock/spectrogram-tutorial/blob/master/spectrogram.ipynb

Since there was roughly 200 seconds of audio per person, that gave me roughly 40 spectrogram images for each person.
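For anyone who wants to reproduce the windowing step, it boils down to something like the sketch below (my assumptions, not the tutorial's exact code: a mono WAV export named affleck.wav and an output folder spectrograms/):

import matplotlib.pyplot as plt
from scipy.io import wavfile

rate, signal = wavfile.read('affleck.wav')  # hypothetical mono WAV export from Audacity
window = 5 * rate                           # 5 seconds' worth of samples

for i, start in enumerate(range(0, len(signal) - window + 1, window)):
    chunk = signal[start:start + window]
    plt.specgram(chunk, Fs=rate, NFFT=1024, noverlap=512)
    plt.axis('off')                         # the classifier only needs the pixels
    plt.savefig(f'spectrograms/affleck_{i:03d}.png', bbox_inches='tight')
    plt.close()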

Here is a sample of the spectrograms for each class (I am not sure why some of them are warped; the original pictures I uploaded are not warped):

Despite the warping of these pictures, I moved on anyway to see what would happen.

I trained a ResNet34 for 4 epochs (default settings) and got roughly a 60% error rate:

So I decided to go with ResNet50. The error rate improved to 30% over 10 epochs:
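For reference, the training was essentially the lesson-1 recipe (a sketch assuming fastai v1, with path pointing at the spectrogram images in one folder per speaker):

from fastai.vision import *
from fastai.metrics import error_rate

data = ImageDataBunch.from_folder(path, valid_pct=0.2, ds_tfms=get_transforms(),
                                  size=224).normalize(imagenet_stats)
learn = create_cnn(data, models.resnet50, metrics=error_rate)
learn.fit_one_cycle(10)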

So, 30% is not quite as low as some of the other results we’ve been seeing on here, but I’m quite pleased with it:

The model was pretty accurate with Ben Affleck and Elon Musk, while it was still better than random guessing for Joe Rogan.

I’d love to hear your thoughts on how I can improve the model. Obviously, I could add more training data; 40 samples per class is probably too low (but trimming the audio down to a single speaker is a very tedious process, and I have run out of time for now). The warped-picture issue is also concerning; I’m not sure why that happened.

What do you think? Otherwise, I’m pretty impressed that it did so well for Elon Musk and Ben Affleck with virtually zero tuning except adding epochs on ResNet50.

Because it did so well here, I’m convinced it will do much better on easier images :wink: These spectrograms look very similar to the human eye!

Thanks for reading this!

36 Likes

I was talking about RAM, since disk size is not a problem (you can attach many terabytes, virtually any size). I also tried only 1% of the data, since I definitely agree with the redundancy argument. Thank you for your replies!

Hi Jeremy, thank you for your explanation.

I was probably confused by the fact that I got a memory error (not a GPU memory error), and when I took a look at the code I jumped to conclusions too quickly. Unfortunately, I lost my logfiles from that run, so I can’t check now what was going on there.

Did you try disabling transforms?

Thanks for the suggestion - no, how do I do that? :slight_smile:

From a discussion in another thread I looked at using activations to optimize an input and ended up implementing a deep dream sort of thing.
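The core of it is just gradient ascent on the input image to maximize a chosen layer's activations. A rough sketch of the idea with a pretrained torchvision VGG (not my exact code; the layer index and step count are arbitrary):

import torch
import torchvision.models as models

model = models.vgg16(pretrained=True).features.eval()
layer = 20                                   # arbitrary intermediate layer to maximize

img = torch.randn(1, 3, 224, 224, requires_grad=True)   # start from noise
opt = torch.optim.Adam([img], lr=0.05)

for step in range(200):
    x = img
    for i, m in enumerate(model):            # forward only up to the chosen layer
        x = m(x)
        if i == layer:
            break
    loss = -x.norm()                          # negative so the optimizer ascends
    opt.zero_grad()
    loss.backward()
    opt.step()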

21 Likes

I used the course method for downloading images from Google to create a dataset of hotdogs, tacos, burgers, pizza, and fries. Must be time for dinner!
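(For anyone who hasn’t seen it, the course method boils down to saving image URLs from a Google search into a file and letting fastai fetch and verify them. A sketch assuming fastai v1 and URL files you’ve exported yourself, e.g. urls_hotdog.csv:)

from fastai.vision import *

path = Path('data/dinner')                   # hypothetical dataset root
for cls in ['hotdog', 'taco', 'burger', 'pizza', 'fries']:
    dest = path/cls
    dest.mkdir(parents=True, exist_ok=True)
    download_images(path/f'urls_{cls}.csv', dest, max_pics=200)
    verify_images(dest, delete=True)         # drop files that fail to open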

I got 94% accuracy right off the bat, with no cleaning/pruning of the data. I can see there are some errors in the training data. Here are the top losses. I wonder why it got #1 wrong? :slight_smile:

2 Likes

I was also looking into the flowers dataset but was not achieving good accuracy. Did you resize the images to 224x224 pixels?

I utilized the transfer learning method shown in Lesson 1 to identify crop leaf diseases using the PlantVillage dataset. I wrote a blog about it; do check it out: https://medium.com/@aayushmnit/transfer-learning-using-the-fastai-library-d686b238213e

3 Likes

Hello guys!

I’m trying to work with the birds dataset (http://www.vision.caltech.edu/visipedia/CUB-200-2011.html), which is a really fine-grained classification task (it’s pretty hard to differentiate all the kinds of sparrows with the naked eye). I was wondering what the SOTA results on this data were. I found two papers claiming state-of-the-art results. The first (https://arxiv.org/pdf/1310.1531.pdf) is from before the deep learning era; its accuracy with manual feature engineering was only 64.96%. The second (https://arxiv.org/pdf/1807.07320.pdf) is recent and uses deep learning; with MA-CNN, their accuracy is 86.5%. To the best of my knowledge, this is the current SOTA.

With the fastai v1 library and almost no tuning, I got 76.4% accuracy with a pretrained ResNet34 and 83.2% with ResNet50. I may be overfitting a little, but still, those are great results, not only compared with the pre-deep-learning numbers but even compared with the current SOTA.
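The notebook linked below has the exact code; the recipe is essentially the standard one from lesson 1 (a sketch assuming fastai v1, with an illustrative learning-rate range rather than my exact values):

learn = create_cnn(data, models.resnet50, metrics=accuracy)
learn.fit_one_cycle(5)                       # train the new head first
learn.unfreeze()                             # then fine-tune the whole network
learn.lr_find()                              # pick the range below from the plot
learn.fit_one_cycle(5, max_lr=slice(1e-5, 1e-3))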

Here is a reference to the notebook in case someone wants to take a look: https://github.com/ademyanchuk/course-v3/blob/master/nbs/dl1/lesson-1-birds-dai.ipynb

Thanks to fastai team and all the community.

4 Likes

I attended Science Hack Day in San Francisco this weekend and used fastai to help build a Twitter bot that attempts to identify whether a photograph is of a cougar or not - part of a science communication hashtag game. You can see the bot in action at https://twitter.com/critter_vision/with_replies

My machine learning model is currently pretty terrible (I only got a 24% error rate - I’m certain I can do a lot better than that with more work), but that’s mainly because I spent most of the time figuring out how to deploy and run the resulting model as an API. I got that working, and I’ve just published some extensive notes on how I did that here: https://simonwillison.net/2018/Oct/29/transfer-learning/
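The notes linked above walk through the details, but the serving core is small. A sketch (assuming a fastai v1 version with learn.export()/load_learner, and Flask rather than my exact stack):

from io import BytesIO
from flask import Flask, jsonify, request
from fastai.vision import load_learner, open_image

app = Flask(__name__)
learn = load_learner('models')               # folder containing export.pkl

@app.route('/classify', methods=['POST'])
def classify():
    img = open_image(BytesIO(request.files['file'].read()))
    pred_class, pred_idx, probs = learn.predict(img)
    return jsonify({'class': str(pred_class), 'prob': float(probs[pred_idx])})

if __name__ == '__main__':
    app.run()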

14 Likes

I used resnet34 instead of resnet18. I got an accuracy of 99.4203%.

I see we’ve both used different data sources: I used the Kaggle data instead of the UCI data.

Maybe this is what’s causing the difference in accuracy. I have to see how the two datasets differ.

1 Like

I was interested in using satellite imaging to detect the amount of human presence in the Amazon forest. That could be used to find out how much of the forest is pristine, how much is in peril, and the trends over time.

I found a Kaggle dataset that fit this perfectly, but it was multi-class, and I thought I could make a better classifier if I focused only on the “human vs. forest” question. So I basically grouped classes like road, habitation, etc. into “human” and everything else into “forest”. I was surprised at how easy it was to get something off the ground and running with a 7% error rate. Here’s my gist:
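(For reference, the grouping step boils down to a few lines. A sketch assuming the Kaggle train_v2.csv layout of image_name plus space-separated tags, and my own pick of which tags count as “human”:)

import pandas as pd

df = pd.read_csv('train_v2.csv')             # Planet labels: image_name, tags
human_tags = {'agriculture', 'habitation', 'road', 'cultivation',
              'selective_logging', 'conventional_mine', 'artisinal_mine'}

df['label'] = df['tags'].apply(
    lambda t: 'human' if human_tags & set(t.split()) else 'forest')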

I would appreciate any tips on how to improve it!

4 Likes

Hi Radek,

did you succeed in looking at most_confused for your 1% of the whole dataset?
I am getting the following error when running:
interp.most_confused(min_val=2000)


RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 interp.most_confused(min_val=2000)

~/anaconda3/lib/python3.6/site-packages/fastai/vision/learner.py in most_confused(self, min_val)
    116     def most_confused(self, min_val:int=1)->Collection[Tuple[str,str,int]]:
    117         "Sorted descending list of largest non-diagonal entries of confusion matrix"
--> 118         cm = self.confusion_matrix()
    119         np.fill_diagonal(cm, 0)
    120         res = [(self.data.classes[i],self.data.classes[j],cm[i,j])

~/anaconda3/lib/python3.6/site-packages/fastai/vision/learner.py in confusion_matrix(self)
     91         "Confusion matrix as an np.ndarray."
     92         x=torch.arange(0,self.data.c)
---> 93         cm = ((self.pred_class==x[:,None]) & (self.y_true==x[:,None,None])).sum(2)
     94         return to_np(cm)
     95

RuntimeError: $ Torch: not enough memory: you tried to allocate 64GB. Buy new RAM! at /opt/conda/conda-bld/pytorch-nightly_1540719301766/work/aten/src/TH/THGeneral.cpp:204
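Looking at line 93 there, the broadcast (pred_class==x[:,None]) & (y_true==x[:,None,None]) builds a classes x classes x samples boolean tensor, which is presumably what blows up with a large validation set. As a workaround, the same matrix can be computed in O(n) memory (a sketch, assuming scikit-learn is available):

import numpy as np
from sklearn.metrics import confusion_matrix

# Same inputs fastai uses, but without the c x c x n broadcast
y_true = interp.y_true.cpu().numpy()
y_pred = interp.pred_class.cpu().numpy()
cm = confusion_matrix(y_true, y_pred)
np.fill_diagonal(cm, 0)                      # most_confused ignores the diagonal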


Congrats on your nice work. I have a question about using datasets on Colab: does the folder you define for Path (content/data/102flowers.mat) exist next to your notebook on Colab?

Where are the 1,027 samples in your results / confusion matrix coming from? The test set is smaller than this, and I think 20% of the training set would be ~1,044.

Here’s my piece of work. I’m trying to classify government-issued ID cards like driving licences and PAN cards (related to income tax in India). I’ve used google_images_download to create my dataset.

Currently I have two classes:

  1. Driving Licence - 70 images that I was able to get from Google
  2. PAN Card - 25 images from Google

I used google_images_download to download the images to my local machine and label them properly. Once done, I created a git repo on Bitbucket and cloned the data into Paperspace storage using git.
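(The download step itself is just a few lines. A sketch with illustrative arguments, assuming google_images_download’s dict-based API:)

from google_images_download import google_images_download

response = google_images_download.googleimagesdownload()
response.download({
    'keywords': 'driving licence india,pan card',   # one output folder per keyword
    'limit': 100,                                   # more than 100 needs chromedriver
    'output_directory': 'data',
})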

Surprisingly, maybe because of the small number of images, Google Colab was also able to run the training successfully.

What I wonder about the training is that the error_rate kept changing with the number of epochs:

(screenshots of the training output after 1, 2, 3, 4, and 5 epochs)

Below are a few of my doubts:

  • Yesterday when I was running the same code, I got around a 30-40% error rate. But today when I started the run, I initially got 11% and it finally settled at 5%. From the screenshots above, the error_rate only started settling after I ran training multiple times. Should training ideally give the same error_rate for 1 epoch as for 2 epochs? Why did it change to 11% for 2 epochs and fall back to 5% for 3 epochs?

  • One more question. A PAN card / driving licence looks something like the images below: rectangular in shape.

(example images of a PAN card and a driving licence)

But when I was creating the DataBunch, these were transformed to 224 x 224, resulting in a square image, so the network ended up seeing only a part of the card instead of the full rectangle. Should I be worried about this? (A possible fix is sketched after this list.)

  • Also, when the top losses are plotted (which I set to 9, the default from the class notebook), there was only one wrong classification. Why did the interpreter plot the correct ones as well?

  • Visually, the two image classes I chose for this problem are entirely different. I was expecting an error rate of less than 1%, but here I am with an error rate of 5%. Is it too simple a problem for the network to work on?
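For the rectangular-card issue above, one thing I plan to try is controlling how fastai resizes (a sketch assuming a recent fastai v1 data block API; names may differ slightly between versions). ResizeMethod.SQUISH keeps the whole card visible at the cost of distorting it, while ResizeMethod.PAD letterboxes it instead:

from fastai.vision import *

data = (ImageList.from_folder(path)
        .split_by_rand_pct(0.2)
        .label_from_folder()
        .transform(get_transforms(), size=224,
                   resize_method=ResizeMethod.SQUISH)   # or ResizeMethod.PAD
        .databunch())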

Looking forward to the feedback from fellow learners :slight_smile:

My end target for this project is to build a classifier that can classify official documents. I would also like to incorporate OCR to verify any forms that are submitted with details from these official documents.

Below is my notebook as a GitHub gist. I also made the BitBucket repo public. Feel free to use it.

3 Likes

I spent some time this past week building a dataset curator. It’s essentially two parts: scraper and curator.

The scraper grabs images from Google image search based on your search phrase. It’s multithreaded and pulls down 300-400 images per search in 30-ish seconds on my laptop. It requires that you install chromedriver somewhere on your file system, along with the Python package selenium.

The curator portion is an in-notebook interactive session to locate duplicate/near-duplicate images and garbage images in the downloaded data. It uses the intermediate layers of a pretrained VGG network and compares images based on the mean squared error between their intermediate representations. If two images have similar representations, then they’re probably very similar. It actually works really well! For garbage image detection, I look at those same intermediate representations and give images a score based on their total dissimilarity to all the other images in the set, the idea being that images that don’t belong will be most different from the actual “in class” images.
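The comparison at the heart of it fits in a few lines. A simplified sketch of the idea (not the repo code; uses torchvision’s VGG16 and plain MSE between flattened feature maps):

import itertools
import torch
import torchvision.models as models
import torchvision.transforms as T
from pathlib import Path
from PIL import Image

vgg = models.vgg16(pretrained=True).features[:16].eval()   # an intermediate slice
prep = T.Compose([T.Resize((224, 224)), T.ToTensor()])

def features(path):
    with torch.no_grad():
        return vgg(prep(Image.open(path).convert('RGB'))[None]).flatten()

paths = list(Path('data/abrams').glob('*.jpg'))             # hypothetical image folder
feats = {p: features(p) for p in paths}
pairs = sorted((torch.mean((feats[a] - feats[b]) ** 2).item(), a, b)
               for a, b in itertools.combinations(paths, 2))
# lowest-MSE pairs come first: likely duplicates to review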

For both dup detection and garbage selection, you’re presented with images in your notebook in order of their score (so most similar pairs will be shown first, etc.) and asked to make a decision with a simple menu. The notebook cell clears its own outputs before presenting the next set of images so that your notebook doesn’t get too flooded.

For both processes, since the images are shown in order, you will hopefully only have to go through a few pairs (for dup detection) or a few singles (for garbage removal) before you start seeing the kinds of images that you want to keep. When this happens you can stop the process with the menu and call purge to delete the marked bad files from your directory.

Here’s the code if you want to try it out (it’s not production quality so use at your own risk):
https://github.com/wfleshman/DatasetScraper

I used it to scrape images of Paladins (army howitzer) and Abrams (army tank) and built a classifier on that. I used to be an artillery officer and my family and friends would always mistake my Paladins for tanks. I was able to get around 94% accuracy by just training the new head of the conv learner on my curated dataset. So it looks like my family and friends should be ashamed of themselves :wink:


26 Likes

It’s my script that makes the test set; it doesn’t divide the data into perfectly equal parts.

1 Like

I used the same lesson-1 template with Resnet34 to classify hostel and hotel rooms. The dataset was created using Google Images Download, and some images were removed manually to improve quality (final: 240 x 2 images). Accuracy is ~90.3%.

Some issues:

  • Chromedriver has to be downloaded to scrape more than 100 photos with Google Images Download.
  • Some photos are larger than the limit, which causes “IOError: image file is truncated (nn bytes not processed)” in the data normalization step.
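# Tell PIL to tolerate truncated image files instead of raising IOError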
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True

Adding this code beforehand lets processing continue with truncated images.

RESULTS

(screenshots of the fit output and the confusion matrix)

3 Likes