I just finished comparing fastai’s tabular module against a variety of newer baselines; you can read it here:
So I was working on a Spam Classification project, trying several ML models like Random Forest and SVM, and the best results I could achieve were with Random Forest:
- Precision: 96.48%
- Recall: 95.36%
So I just tried the code from lesson 4 and the metrics jumped to:
- Precision: 98.28%
- Recall: 99.61%
and here are a couple of predictions:
False = Ham
True = Spam
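For anyone comparing classifiers like the posts above do, precision and recall fall straight out of the confusion-matrix counts. A minimal sketch (the counts below are made up for illustration, not the poster's actual numbers):

```python
# Precision and recall from confusion-matrix counts.
# tp/fp/fn values are hypothetical, for illustration only.
tp, fp, fn = 256, 9, 1   # true positives, false positives, false negatives

precision = tp / (tp + fp)   # of messages flagged as spam, how many really were spam
recall = tp / (tp + fn)      # of all actual spam, how much was caught

print(f"Precision: {precision:.2%}")
print(f"Recall:    {recall:.2%}")
```

A high-precision model rarely flags ham as spam; a high-recall model rarely lets spam through. The lesson-4 model above improved on both at once.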
I had some fun generating butterflies with a StyleGAN trained on a massive dataset from the Natural History Museum in London. Of course, it was obligatory to publish the results to thisbutterflydoesnotexist.com. A bit about how I built it is posted here. Video of interpolation between 20 butterflies:
Examples:
I’ve been investigating self-supervised learning, which Jeremy recently posted about here.
As we’ve learned in the fast.ai course, we always want to start our computer vision models with some understanding of the world rather than with random weights. For almost all vision tasks this means starting with a network pretrained on ImageNet. This works very well much of the time, but there are occasional problems with the approach. For example, pretrained ImageNet weights don’t seem to work as well on medical images as they do on natural images.
This is probably because of how different the two are. For example:
Self-supervised learning is a set of techniques in which we train a network without labels on a pretext task, so that it then trains faster, and to a higher accuracy, on a downstream task.
The pretext task I looked at is called “inpainting”: we remove patches from images and train a neural network (e.g. a U-Net) to fill in the missing patches. For example (removed patches highlighted in green):
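The data-preparation side of this pretext task is simple: cut a patch out of each image and keep the original patch as the reconstruction target. A minimal sketch on a toy 2-D "image" (pure Python lists; in practice this would be done on tensors, and the patch coordinates here are arbitrary):

```python
import copy

def remove_patch(img, top, left, size, fill=0):
    """Blank out a size x size patch of a 2-D 'image' (list of lists).
    Returns (corrupted_image, original_patch): the corrupted image is the
    network input, and the original patch is what it learns to predict."""
    out = copy.deepcopy(img)
    patch = [row[left:left + size] for row in img[top:top + size]]
    for r in range(top, top + size):
        for c in range(left, left + size):
            out[r][c] = fill
    return out, patch

# Toy 4x4 "image"; values and coordinates are for illustration only.
img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
corrupted, target = remove_patch(img, top=1, left=1, size=2)
# corrupted now has zeros where the patch was;
# target == [[6, 7], [10, 11]] is the reconstruction target.
```

The U-Net then takes `corrupted` as input and is penalized (e.g. with an L1 or L2 loss) on how well it reconstructs `target`.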
After we train the network on this pretext task, we take the same network and train it on a downstream task that actually matters to us (classification, segmentation, object detection, etc.). For my experiments I trained a network to do classification on the brand-new Image网 dataset released by fast.ai. (More in my blog post about why Image网 is so useful for self-supervised learning research.)
In the end I found that even with this simple pretext task we could reach 62.1% accuracy on the classification task, compared to 58.2% when training from random weights.
I’m planning to continue investigating the effects of different pretext tasks on downstream performance. The dream goal would be to find a task or collection of tasks that we could use to pretrain a network that would be competitive with pretrained ImageNet weights.
We implemented a pretty amazing 2019 paper for image similarity / image retrieval using fast.ai. It’s of much lower complexity than other state-of-the-art methods (e.g. no triplet mining required), trains as fast as a regular image-classification DNN, and its results are on par with or better than the best previously published results.
Repository: https://github.com/microsoft/computervision-recipes/tree/master/scenarios/similarity
This is great. I love work that shows progress on low complexity solutions.
Great work and super nice repo!
Hi everyone,
Based on the week 2 notebook, I made a mushroom classifier, but only for the 10 most common mushrooms: about 85% accuracy on test data. It seems to work decently on random mushroom images you can find online! Any feedback is welcome!
Based on Lesson 2, and it works pretty well (I used 1920x1080 images; I probably didn’t need images that large):
It can even work on the grown-up version of Shio without having a grown-up version in the training set.
GitHub
data_set_that_I_built
Deployed; please feel free to upload your image: https://github.com/JonathanSum/Deep-Projects/blob/master/Character_idenf_deploy.ipynb
Hi, Hyungue Lim.
It is pretty interesting. Although I wish the accuracy were higher, I will use your project to identify whether a mushroom is poisonous or not.
I hope you will build more models in the future.
I tried your model, and it works pretty well.
Hello! Thanks for trying it out!
Yes, I believe the accuracy could be higher with more data. Getting mushroom pictures was not as easy as I thought: there were not as many pictures of the specific mushrooms as I expected, and often there were different mushrooms in the same picture.
It would be interesting to see how yours turns out!
Great news for the liver transplant business!
Joking aside, please check out my posts on deadly mushroom identification, and remember that many people naively believe that AI can do anything perfectly.
Hi
I just published my work on New York City Taxi Fare Prediction (Kaggle competition).
I used pure PyTorch to build a tabular model.
Take a look and PM me with questions!
Hello!
Just finished week 2 lesson and wanted to try something myself.
I trained a guitar classifier (acoustic vs. electric) with images from Google search and got a respectable 96% accuracy.
Then I proceeded to deploy it on a server with a basic frontend.
Some gotchas along the way:
- Data cleaning is really important if your source is not very reliable (mislabeling, irrelevant images). I got a ~5% increase in accuracy just by doing that.
- Following on from the first point, even with the great widgets included in the Lesson 2 notebook, cleaning is time-consuming, especially with a large dataset.
- When deploying, remember to use the CPU-only version of PyTorch if it’s only used for inference. That dropped the size from 700MB to 100MB, useful for environments with limited resources.
- FastAPI, based on Starlette, turned out to be a useful and powerful library: it’s easy to set up and uses async/await.
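On the cleaning point: one cheap first pass that pays off with scraped image data is dropping exact duplicates by content hash, before any manual widget-based review. A minimal sketch (the file names and bytes are made up; in a real pipeline the bytes would come from reading each downloaded file):

```python
import hashlib

def content_hash(data: bytes) -> str:
    """Hash raw file bytes so byte-identical downloads can be detected."""
    return hashlib.sha256(data).hexdigest()

def dedupe(files):
    """Keep only the first file for each distinct content hash.
    `files` is a list of (name, bytes) pairs."""
    seen, kept = set(), []
    for name, data in files:
        h = content_hash(data)
        if h not in seen:
            seen.add(h)
            kept.append(name)
    return kept

# Toy example: 'b.jpg' is a byte-for-byte duplicate of 'a.jpg'.
files = [("a.jpg", b"\x89PNG..."), ("b.jpg", b"\x89PNG..."), ("c.jpg", b"\xff\xd8...")]
print(dedupe(files))  # ['a.jpg', 'c.jpg']
```

This only catches exact copies (search engines often serve the same image under several URLs); near-duplicates and mislabeled images still need the manual cleaning widgets.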
The Jupyter notebook is almost unchanged from the structure of the Lesson 2 one.
You can try the app here: https://agilulfo.herokuapp.com/static/guitars/
Source code on Github: https://github.com/agilul/deep-learning
Thanks fast.ai for this great course; I’ll follow the next lessons as they become available!
If you want more interpretability out of your fastai tabular models, I’ve ported SHAP over to fastai2.
I’ve now made this available via pip: `pip install fastshap`. The documentation is over at muellerzr.github.io/fastshap.
Here is a basic output from a `decision_plot`:
(big help from @nestorDemeure for the documentation and refactoring)
I’ve spent the last month or so exploring GANs (generative adversarial networks), and decided to write a detailed tutorial on training a GAN from scratch in PyTorch. It’s basically an annotated version of this script, to which I’ve added visualizations, explanations, and a nifty little animation showing how the model improves over time.
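For context, the objective such a tutorial trains is the standard GAN minimax game between the generator G and the discriminator D (from the original Goodfellow et al. formulation):

```latex
\min_G \max_D \; V(D, G) =
\mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log D(x)\right]
+ \mathbb{E}_{z \sim p_z}\!\left[\log\!\left(1 - D(G(z))\right)\right]
```

The discriminator D is pushed to score real samples x high and generated samples G(z) low, while the generator is pushed in the opposite direction; the training loop alternates gradient steps between the two.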
Here’s the tutorial: https://medium.com/jovianml/generative-adverserial-networks-gans-from-scratch-in-pytorch-ad48256458a7
Jupyter notebook: https://jovian.ml/aakashns/06-mnist-gan
Thanks a lot for sharing this @JoshVarty!
The blog post and repo you’ve shared are excellent!
You’ve very clearly explained and demoed how to use self-supervised learning with fastai v2. Great job!
There are still lots of questions to be answered, so please, keep sharing your insights.