That’s great. And on the monkey part, you’re not alone.
Brazilian jiu-jitsu or judo? I thought it would be interesting to try a classification challenge that most humans would find very difficult. Practitioners of both martial arts wear similar clothing (the gi) and are usually grappling in the photos. My hope was that the classifier might pick up on some very subtle differences: for example, judo practitioners spend more time standing and often throw from an upright position, whereas BJJ takes place in larger part on the ground. Please see the notebook here. I’d love to read any suggestions for image augmentation or further improvements.
Your LR at cell 17 appears too high - see how your validation results get very unstable? Try 10x lower for both LR numbers there.
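A quick way to see why a too-high LR causes that instability: the gradient-descent steps overshoot the minimum and the loss bounces around or diverges. A toy illustration in plain Python (not fastai code, just a minimal quadratic example):

```python
# Gradient descent on f(x) = x^2 with two learning rates.
# With lr = 0.1 the iterates converge toward the minimum at 0;
# with lr = 1.1 each step overshoots and the iterates diverge.

def descend(lr, steps=20, x0=1.0):
    x = x0
    for _ in range(steps):
        grad = 2 * x          # f'(x) = 2x
        x = x - lr * grad     # gradient-descent update
    return abs(x)

print(descend(0.1))   # small: converged toward 0
print(descend(1.1))   # large: the updates overshot and blew up
```

The same effect shows up in training curves as validation metrics that jump around from epoch to epoch, which is why dropping both ends of the LR range by 10x often stabilizes things.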
Last week I got over 95% accuracy on the CamVid-Tiramisu dataset. This was on the first training run. From a pretrained ResNet-34 to 95% accuracy in a little over an hour!
Can’t tell if I’m getting mediocre results or expecting too much. I’m trying to determine whether a Louis Vuitton product is counterfeit. Since the bags come in so many shapes and sizes, I focused on the classic brown monogram style. I pulled almost all the images from Instagram and cropped out the surrounding area to focus on the product. There are 70 “real” bags/wallets and 51 “fakes”. I selected 25% of the images for the validation set. After tweaking, the best result I can consistently get is a 37% error rate.
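One thing worth keeping in mind with numbers this small: 25% of 121 images is only about 30 validation images, so a 37% error rate comes with a wide margin of error. A rough binomial standard-error calculation using the counts from the post:

```python
import math

n_images = 70 + 51              # real + fake, counts from the post
n_val = round(n_images * 0.25)  # ~30 validation images
err = 0.37                      # reported error rate

# Standard error of a proportion estimated from n_val samples
se = math.sqrt(err * (1 - err) / n_val)
print(n_val, round(se, 3))      # ~0.088, i.e. roughly +/- 9 percentage points
```

So the true error could plausibly sit anywhere from the high 20s to the mid 40s; more data would tighten that estimate considerably before you draw conclusions from tweaks.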
This is my first post, so please let me know if I did anything wrong.
Here’s a quick summary of what I did:
After watching lesson 1, I put together a model to tell apart pictures of Australian Prime Ministers (they change very quickly these days, so it seemed like it could be useful).
My data set included 50-70 images each of the last six PMs. I used resnet34, and the model got to an error rate of 12.5% after 4 epochs. My code is available on github.
It was really cool to be able to do this after just one lesson, especially since I have limited coding experience: I’m currently a management consultant, so most of my technical skills are specific to PowerPoint.
Here’s some detailed information of what I did:
Getting the data:
- To download the images, I used a firefox add-on from a forum post.
- I downloaded ~60-80 pictures of each prime minister from Google, then went through them and manually deleted any that looked problematic (e.g., pictures of the wrong person, or pictures with text in them).
Uploading the data into Crestle:
- I had a bit of trouble with this, since it wasn’t obvious to me how to get a large number of files onto my Crestle server, but I was able to find some help in a forum post.
- I ended up installing the unzip module for Python, uploading a zip file onto my Crestle server, and then writing some code to unzip it.
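For anyone else hitting the same snag: Python’s built-in zipfile module can do the extraction without installing anything extra. A minimal sketch (the archive contents here are made-up stand-ins for an uploaded dataset zip):

```python
import zipfile
from pathlib import Path

# Make a tiny demo archive (a stand-in for the uploaded dataset zip)
with zipfile.ZipFile("demo.zip", "w") as zf:
    zf.writestr("images/pm1.jpg", b"fake bytes")
    zf.writestr("images/pm2.jpg", b"fake bytes")

# The actual extraction step: two lines, no extra module needed.
# extractall recreates the folder structure stored inside the zip.
with zipfile.ZipFile("demo.zip") as zf:
    zf.extractall("data")

print(sorted(p.name for p in Path("data/images").iterdir()))
# -> ['pm1.jpg', 'pm2.jpg']
```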
Building, running, and fine-tuning the model:
- I used resnet34 for the model (I also tried resnet50, but was running into memory issues even with a small batch size)
- After four epochs, the model got to a 12.5% error rate
- I was pretty impressed with the error rate, especially since a lot of the prime ministers look very similar (many of them are men with glasses and short white/grey hair). The combination the model had the most trouble with was Kevin Rudd and Scott Morrison.
- I also tried some unfreezing, but the results seemed pretty bad (error rate of >80%). I’m guessing that I’ll learn more about how to do this in the coming lessons
Here’s what I did after the week 1 video and ipynb:
- setup kaggle on colab.
- downloaded the dog-breed-identification dataset that contains about 10,000 images of dogs belonging to 120 different classes
- trained a resnet-50 model on the unzipped dataset.
Had to figure a way to organise the data into folders based on class names.
- Got 81% accuracy after the first epoch
- Got to 89% accuracy after 8 epochs.
- got really close to 90% accuracy by training further after learn.unfreeze()
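The folder-organisation step (sorting the unzipped images into one folder per class name) can be sketched roughly like this, assuming a Kaggle-style labels.csv with id and breed columns; the file names and breeds below are synthetic stand-ins:

```python
import csv
import shutil
from pathlib import Path

# --- tiny synthetic stand-in for the Kaggle download (illustrative names) ---
src = Path("train")
src.mkdir(exist_ok=True)
rows = [("a1", "beagle"), ("a2", "whippet"), ("a3", "beagle")]
for img_id, _ in rows:
    (src / f"{img_id}.jpg").write_bytes(b"fake")
with open("labels.csv", "w", newline="") as f:
    csv.writer(f).writerows([("id", "breed"), *rows])

# --- the actual reorganisation: one folder per class name ---
dst = Path("train_by_class")
with open("labels.csv") as f:
    for row in csv.DictReader(f):          # columns: id, breed
        breed_dir = dst / row["breed"]
        breed_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src / f"{row['id']}.jpg"),
                    str(breed_dir / f"{row['id']}.jpg"))

print(sorted(p.name for p in dst.iterdir()))   # -> ['beagle', 'whippet']
```

Once the images are in per-class folders, fastai’s from-folder data loading can pick up the labels directly from the directory names.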
Here are my top-losses:
The first photo contains both a briard and a whippet. The mistakes look acceptable.
I had tried training on the same dataset a month ago, using transfer learning with a ResNet-50 model in Keras. At that time the maximum accuracy I could get was about 80% after training for 10 epochs. This clearly shows the power of good defaults: if this result is achievable with the default parameters, I can’t wait to learn more and retry after tweaking the parameters a bit.
Here’s the link to my ipynb on Colab. Do share your feedback and suggestions.
Using the image similarity from an earlier post in this thread, I tried to implement Google’s X Degrees of Separation, which is simply about finding a shortest path between two images. I was not very successful, but the results are overall OK, as in this path (the path runs from left to right, with the leftmost image as the source and the rightmost image as the target).
Looking at the t-SNE of those images, they seem not to be forming clusters; maybe this is the problem.
I tried to describe my approach in this quick blog post.
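For anyone curious, the core of this approach is just a shortest-path search over a graph whose nodes are images and whose edges connect each image to its nearest neighbours in embedding space. A minimal breadth-first-search sketch over a hand-made toy graph (the image names and edges below are invented):

```python
from collections import deque

# Toy nearest-neighbour graph: each image links to its most similar images.
graph = {
    "src.jpg": ["a.jpg", "b.jpg"],
    "a.jpg":   ["src.jpg", "c.jpg"],
    "b.jpg":   ["src.jpg", "c.jpg"],
    "c.jpg":   ["a.jpg", "b.jpg", "dst.jpg"],
    "dst.jpg": ["c.jpg"],
}

def shortest_path(graph, start, goal):
    """Breadth-first search: returns the fewest-hops path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(shortest_path(graph, "src.jpg", "dst.jpg"))
# -> ['src.jpg', 'a.jpg', 'c.jpg', 'dst.jpg']
```

If the embeddings don’t cluster well (as the t-SNE suggests), the nearest-neighbour edges are noisy, and the intermediate images along the path will look less like a smooth visual interpolation.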
Having some fun with ‘leela-zero’ running on GCP and playing against ‘leela-eleven’…
Both Go engines are available here:
The front end is the Sabaki GUI:
Deep learning in production, sort of!
Thank you Jeremy! Much improved:
Nice work!
A couple of questions for my own learning:
a) How did you select the image size?
b) Why were the images not normalized? (.normalize(imagenet_stats))
But one gotcha when comparing to prior art: the NABirds dataset has the train and test sets explicitly marked, but from the linked Jupyter notebook it doesn’t look like this split is being honored. For a fair comparison, it would be good to train/validate on the training samples and then finally report accuracy on the test set.
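One way to honour a predefined split: partition the image ids using the split file the dataset ships. Assuming a CUB/NABirds-style format of “image-id is-training-flag” pairs (the ids below are made up), the partitioning is only a few lines:

```python
# Sketch: partition image ids using a provided split file.
# Assumed line format (CUB/NABirds-style): "<image_id> <is_training_flag>"
split_lines = [
    "img_001 1",
    "img_002 0",
    "img_003 1",
]

train_ids, test_ids = [], []
for line in split_lines:
    image_id, flag = line.split()
    (train_ids if flag == "1" else test_ids).append(image_id)

print(train_ids, test_ids)
# -> ['img_001', 'img_003'] ['img_002']
```

Carve the validation set out of train_ids only, tune everything there, and touch test_ids exactly once at the end for the reported number.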
I also pointed this out, actually on Twitter; I probably should have done it on this forum instead. JH somehow insisted this is OK, as long as it’s not done with too much tuning. But honestly, I’m not convinced. You can overfit to the validation set simply by adjusting the LR, the number of epochs you train, and who knows what else, all probably unintentionally. I backed down a bit, but maybe I shouldn’t have. I think I have to call it out; thanks for pointing this out as well.
To be fair to the researchers of that paper, @JCastle probably wants to get a proper test-set accuracy. I don’t mean to downplay anybody’s hard work here.
This is interesting. A friend of mine told me about a startup doing exactly this, and it is getting global business. This is a legitimate practical problem. As forgers are very good at faking products, your 37% may even beat a lot of humans, including me. So 99% probably isn’t the benchmark to beat for now. I suspect you have to get more data, try to get multiple shots of the same bag from different angles, emphasizing different parts, and average the predictions.
I am working on a project that involves counting, though it has nothing to do with a fastai lesson. I am a bit confused, since this looks like an image-recognition problem, so I wonder if you are labelling your images as “49”, “50”, “51”? But that will limit the range of objects you can count. For my case, I have to use an object-detection model (which I think will be part of fastai part 2). You may have seen videos of models such as YOLO or SSD drawing bounding boxes around objects in self-driving tech promos; as a side product, you can obviously count the objects. That way, you are not limited in the number of things you can count, up to a certain point.
Your project is really interesting. Could you tell us more about how you created the dataset?
I made a small notebook which determines whether a knife is present or not in an image. I wrote up a blog post on my method: https://hackernoon.com/recognising-a-knife-in-an-image-with-machine-learning-c7479f80525
I’d also be interested in how many images. I have a classifier that works across 16 sports, plus an ‘other’ option for non-sports images. I had between 300 and 700 images per sport, depending on availability, and found that image size made a difference; I ended up using 448x448 to get the detail needed to differentiate. Still, I get around 95% accuracy. I’d expect a two-class baseball vs. football model to easily get close to 100%. https://sportsidentifier.azurewebsites.net/ Quidditch was the last sport I added to my list.
I think this is just the row, column nature of tensors (and matrices generally), compared to the column, row (x, y) convention used for images.
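A tiny concrete example of the difference, using nested lists as a stand-in for a tensor: a “tall” image 2 pixels wide and 3 pixels high is stored rows-first, so its shape is (3, 2), i.e. (height, width), and the pixel at (x=1, y=2) is indexed row-first as [2][1]:

```python
# Rows-first tensor indexing vs (x, y) image coordinates.
height, width = 3, 2
img = [[0 for _ in range(width)] for _ in range(height)]  # img[row][col]

print(len(img), len(img[0]))   # -> 3 2  (rows first: height, then width)

# The pixel at x=1, y=2 (column 1, row 2) is addressed row-first:
img[2][1] = 255
print(img[2][1])               # -> 255
```

Libraries like PIL report size as (width, height), while a tensor of the same image has shape (height, width), which is a common source of transposed-dimension confusion.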
Thank you for your comments.
- I originally ran 4 epochs with either the default image size or a size of 224, and of course started thinking of simple ways to improve the results. From data.show_batch, I saw that a number of the images were cropped to a level that hid large, important sections of the birds, so I increased the size to 384 and results improved by 4-5%. I need to continue researching ways to improve.
- In the initial stages, I was looking to just get an output and keep things simple, so I did not try .normalize(imagenet_stats) or the additional functionality that fastai offers. Will continue to explore.
- Agree 100% with your statement regarding the researchers’ train/test sets. Again, my initial hope was just to get reasonable predictions, and I was surprised at how close (within 1-2%) to state-of-the-art my initial attempt performed. Making this comparison is my next project.
@kechan - Thank you for your comments. I in no way wish to diminish the real work that the researchers performed.
My understanding is that fastai’s accuracy rate is calculated on a holdout set that the trainer has never seen. So if the researchers performed their train/test splits properly, and fastai’s random split is sound, then I believe the results should be fairly solid. However, you are absolutely correct that using the same test set, to ensure that neither of us is working with an outlier test set, is in order. This is my next project.
Happy to answer any questions about my approach, but it is basically all in the notebook. I ran resnet34 2-3 times, ran 2-3 times with different image sizes, and finally increased the number of epochs, as the holdout error rate was still decreasing.