I made a web app that detects whether (and how) an image is rotated and “derotates” it. The idea came to me because the pictures I take with my phone don’t end up with a consistent orientation, I suspect because of the auto-rotate feature (or the lack of it?)…
The code (including the training notebook) can be found on GitHub and the web app at derotate.appspot.com. I don’t know if the idea is useful in itself, but I didn’t find many web apps that output images (rather than a class, as in fastai’s tutorial), so maybe it can be useful in that sense.
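Assuming the classifier predicts one of four orientation classes, the derotation step itself reduces to a small lookup; here is a minimal sketch (the class names are my own placeholders, not necessarily the ones the app uses):

```python
# Map each predicted orientation class to how far the image was rotated
# counterclockwise. These class names are illustrative placeholders.
ROTATION_OF_CLASS = {
    "upright": 0,
    "rotated_90": 90,
    "rotated_180": 180,
    "rotated_270": 270,
}

def derotation_angle(pred_class: str) -> int:
    """Degrees to rotate counterclockwise to restore the upright image."""
    rotation = ROTATION_OF_CLASS[pred_class]
    return (360 - rotation) % 360

# With PIL the correction could then be applied as:
#   fixed = Image.open(path).rotate(derotation_angle(pred), expand=True)
```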
Eventually, I would like to make a web service out of it and call this function during an image processing pipeline, but I’m still a bit stuck on this step. Does anyone have a recommendation on how to do it?
Hi sebderhy, thanks for an immensely useful app.
Seeing your app, all the great things people create in this thread, and all the great work done by the fastai team and the community is truly inspirational.
12-class sentiment classification of US Airline Tweets with standard ULMFiT - ~60% accuracy
Hello everyone! I’m really interested in deep learning for NLP, so I’ve been using it to train language models for downstream tasks (document similarity, sentiment classification, etc.).
I had a go at this Kaggle dataset, and after relatively little training I got around 60% accuracy on 12 classes (positive, neutral, and 10 negative classes).
I’ve been playing with momentum and learning rates, but I never seem to get much further. Does anyone have pointers on how I could substantially improve this result?
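One knob worth trying beyond momentum and a single learning rate is ULMFiT’s discriminative fine-tuning, where each layer group gets its own learning rate, divided by a factor of 2.6 per group in the original paper. A minimal sketch of computing those rates (the base rate and group count here are illustrative):

```python
def discriminative_lrs(base_lr: float, n_groups: int, factor: float = 2.6):
    """Per-layer-group learning rates for ULMFiT-style discriminative
    fine-tuning: the last group (the classifier head) trains at base_lr,
    and each earlier group at base_lr / factor relative to the next."""
    return [base_lr / factor ** (n_groups - 1 - i) for i in range(n_groups)]

lrs = discriminative_lrs(1e-2, 3)
# earliest layers get the smallest rate, the head the largest
```

In fastai this is what passing `slice(...)` to `fit_one_cycle` does under the hood, spreading rates across the layer groups.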
Here’s another experiment on video enhancement / superresolution I’ve been working on recently (and highly enjoyed doing!).
The idea is that since a video is a small dataset, if we start with a good image-enhancement model (for example the fastai lesson 7 model) and fine-tune it on the video’s frames, the model can hopefully learn specific details of the scene when the camera gets close, and reintegrate them when the camera moves away again (does this make sense?).
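The per-video fine-tuning set can be built from the clip itself: degrade each frame and train the enhancement model to undo the degradation, the same “crappification” idea as in fastai lesson 7. A minimal sketch (the `downscale` callable is a stand-in for whatever crappifier is actually used):

```python
def make_superres_pairs(frames, downscale):
    """Build (low-res input, high-res target) training pairs from the
    video's own frames: each original frame is the target, its degraded
    copy the input. `downscale` is any callable that degrades a frame."""
    return [(downscale(f), f) for f in frames]
```

The enhancement model is then fine-tuned on these pairs before being applied to every frame of the clip.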
Here is a screenshot of the first results I got (more results screenshots and videos can be found on my github repository):
In my experiments, the algorithm achieved better image quality than the lesson 7 pets model, which seems logical since it’s fine-tuned for each specific video.
I actually initially posted this work on the Deep Learning section, because I feel like it’s not finished yet, and I’m looking for help on how to move forward on this. I haven’t found a lot of work on transfer learning in video enhancement (did I miss something?) so far, although it looks like an interesting research direction to me. Do you think that this kind of transfer learning in video enhancement has potential? If so, what would you do to improve on this work?
I recently wrote a Medium Article that I wish had been available when I started this journey. I feel like some of the questions addressed are encountered fairly frequently (and have even been addressed in this course).
Hoping this might be helpful to somebody and eager to continue to give back to the community that has given us this resource!
In my recent Medium article, I wrote about a project in which I created a CNN-based model to predict a person’s exact age from their photo.
This is the link:
There are many new things I learnt while working on this project:
Reconstructing the architecture of ResNet34 model to deal with Image Regression tasks
Discriminative Learning Technique
Image resizing techniques
Powerful Image augmentation techniques of Fastai v1 library
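Part of adapting ResNet34 for image regression is replacing the classification head with a single continuous output; fastai’s `y_range` trick then squashes that raw output through a sigmoid rescaled into a plausible range, so the model can never predict an impossible age. A pure-Python sketch of the idea (the 0–100 bounds are my illustrative choice):

```python
import math

def scaled_sigmoid(x: float, lo: float = 0.0, hi: float = 100.0) -> float:
    """fastai's y_range trick for regression heads: squash the raw model
    output through a sigmoid, then rescale it into [lo, hi] so the
    predicted age stays within a plausible range."""
    return lo + (hi - lo) / (1.0 + math.exp(-x))
```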
As a test image to validate my model’s predictions, I used a picture of India’s PM Modi taken in 2015 (when he was 64 years old) and checked the result:
Racket Classifier
Created my first GitHub repo: a classifier that identifies Tennis, Badminton, and Table Tennis rackets. I was surprised to reach 95% accuracy. The confusion matrix also makes sense, since a few badminton and tennis rackets look similar at certain angles/crops.
PS: the GitHub repo also has the cleaned URL files if someone wants to replicate it.
This being my first GitHub entry, I’m looking for experts to point out issues/mistakes/suggestions to make it better!
With a bunch of tree-friendly volunteers from Data for Good, we’ve been working for two months on a wildfire detection system! Following up on the increasing severity of forest wildfires across the globe this summer, we started interviewing firefighters and surveillance teams in southern France to gain some field expertise: with the adoption of cell phones, detection itself is not an issue anymore, but early detection is crucial to contain the fires.
Existing approaches leverage high-end optical equipment but don’t make the most out of the processing part, whereas we believe that wider accessibility comes with lower deployment costs.
Our first draft is quite simple: train a reliable detection model, get it to run on Raspberry Pis and place those on existing surveillance towers.
Collecting data from publicly available images, we trained a single-frame classifier using what we learned in the first fastai lessons. We released a first version of the library earlier this week (available through PyPI as well), including our image-classification dataset and a lightweight model (MobileNet v2) with an error rate lower than 4.4%.
The project is open source, and our goal is that anyone with a Raspberry Pi (and its camera) can download and install the inference model easily at home, completely free of charge.
We are always looking to expand our datasets and improve the model, so any feedback, suggestions, or contributions are very much welcome.
I wanted to create a craft-beer identification network that would tell me the quality/rating of a beer from an image. I realised early on that I couldn’t just use the lesson 2 classifier, because this problem requires not just image classification but also object detection: when there are multiple craft beers in the frame, I need to return different predictions at different coordinates.
The way I solved this was to first use this pre-trained PyTorch implementation of YOLOv3 to detect and draw bounding boxes around objects from the 80 classes it was trained on (COCO).
Then, if the detected class was 39 (a bottle), I would pass that cut-out to the custom-trained fastai ResNet model and display the results as an overlay on the original image.
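The filter-and-crop step between the detector and the classifier could look roughly like this pure-Python sketch, with the image as a rows-of-pixels array (the `(class_id, x1, y1, x2, y2)` box format is an assumption; real YOLOv3 wrappers differ in how they return detections):

```python
def crop_detections(img, detections, keep_class=39):
    """Cut out every detection of the wanted class (39 = bottle in the
    post) so each crop can be passed on to the beer classifier.
    `img` is a rows-of-pixels array; each detection is assumed to be
    (class_id, x1, y1, x2, y2) in pixel coordinates."""
    crops = []
    for class_id, x1, y1, x2, y2 in detections:
        if class_id == keep_class:
            crops.append([row[x1:x2] for row in img[y1:y2]])
    return crops
```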
The code works well, but of course there’s the limitation that it only classifies beers that I’ve already looked up and created a dataset for. I can imagine some kind of future work which automatically adds brands to the model by scraping Google Images based on a master-list, then using YOLOv3 to extract bottles from the search results, and then running those images in training.
Also, I have a short video of it running on my GitHub, but it would need considerable refactoring to actually run in real time (which is beyond the scope of this hobby project).
This is so cool… I think this gives me some answers I have been looking for (multi object in single frame and also “I don’t know” answer from classification).
I am going to try this and bother you if I get stuck.
I want to share a project I have been working on over the last few weeks. It is a mushroom-identifier web app (based on Shubham Kumar’s repo) that uses a ResNet34 model to make the predictions.
The dataset has about 8,000 images across 43 mushroom classes, and the model’s accuracy is ~90.4%. I think that’s quite a good value considering how difficult mushroom recognition is. Most of the frequently confused mushrooms would probably not be correctly identified by expert mushroom hunters from a single image either!
When showing the results, I would like to show which mushroom classes tend to be confused with the predicted class, but I haven’t found a way to do it.
If you have any suggestions, comments, or questions about the project, I will be happy to hear them!
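One way to get “which classes tend to be confused with the predicted class” is from the validation confusion matrix (in fastai, `ClassificationInterpretation` exposes it via `confusion_matrix()` and `most_confused()`). A minimal pure-Python sketch of the lookup, with made-up class names:

```python
def confused_with(cm, classes, predicted, k=3):
    """Given a confusion matrix cm (rows = true class, columns =
    predicted class) from the validation set, list the k classes most
    often mistaken *for* `predicted`, i.e. the largest off-diagonal
    entries in that column."""
    j = classes.index(predicted)
    counts = [(cm[i][j], classes[i]) for i in range(len(classes)) if i != j]
    counts.sort(reverse=True)
    return [name for count, name in counts[:k] if count > 0]
```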
Any basketball lovers out there? I created an iPhone app that I now use to keep track of my shooting progress. All I do is attach my phone (iPhone 7) to a tripod at the gym and it locates where I shot from and whether it went in or not.
Hi Jordi. This is a cool project, and one that is close to my kitchen and heart! 90% accuracy seems very good for a single photo. As you noted, the other 10% might kill you.
When showing the results, I would like to show which mushroom classes tend to be confused with the predicted class, but I haven’t found a way to do it.
You are very close to what you ask for. I did this a year ago with imagenet categories, so please forgive me if my memory is not entirely accurate.
This CNN model outputs activations for the 43 classes. fastai automagically applies a softmax activation and nll_loss to these activations. I am not sure how well this invisible process is documented, but you can see it by tracing fastai with a debugger.
So first define your own loss function that does the same as fastai and assign it to learn.loss_func. This assignment prevents fastai from automatically deducing the correct activation and loss functions. In your loss function, between softmax and nll_loss, you will find the probabilities for each class. Then you can list the probabilities of the most likely classes.
Note that these class probabilities are relative to each other. They will tell you, given the image, which classes are most likely, but they will not tell you that there is no mushroom present of any class. For that, you would need to train with sigmoid activation and set a threshold. I make this comment only because it is a recurring question on the forums that has not been clearly and definitively addressed.
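The probabilities that appear between the softmax and nll_loss steps are just a softmax over the 43 raw activations, so listing the most likely classes is a few lines once you have them. A pure-Python sketch (activations and class names here are made up):

```python
import math

def top_classes(activations, classes, k=3):
    """Apply the softmax that fastai inserts before nll_loss to the raw
    class activations, then list the k most likely classes with their
    probabilities."""
    m = max(activations)                          # for numerical stability
    exps = [math.exp(a - m) for a in activations]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(probs, classes), reverse=True)
    return [(name, p) for p, name in ranked[:k]]
```

As the reply notes, these probabilities are relative: they always sum to 1 across the known classes, so they cannot express “no mushroom present at all”.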