Hey, that’s cool. Did you have much trouble getting the model converted to ONNX? Really curious whether you’re able to go from ONNX to CoreML on iOS!
I was also thinking about grey scale images.
in RGB we have three channels
in greyscale, we have one channel
Should we do something different for greyscale images?
This is really cool. I was wondering whether visualising the layers would show what makes each country distinctly itself!
I am playing with the download_image.ipynb. I have written a script to automatically scrape Google Images by passing in a dictionary of search terms instead of typing them into Google manually.
I have encountered some problems and would love some help:
- The download_images progress bar often gets stuck at 99%, and I am not sure why. I have to use max_workers=None to avoid this, but it is also much slower as a result.
- I love the little JS script from @lesscomfortable for scraping Google Images, but I want to do this from a Python script. When I use requests to get the HTML from Google Images, I don’t get as many images as when manually scrolling down and pressing the “show more results” button. I am not proficient at web scraping, so I’d love some help here: how can I mimic a web browser and have it do the scrolling and clicking for me from a Python script?
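For reference, here is how far I’ve got with the static half. From what I’ve read, mimicking the scrolling and the “show more results” click needs a real browser driver such as Selenium, but the stdlib `html.parser` can at least pull the `<img>` URLs out of whatever HTML `requests` returns. A minimal sketch (class and function names are my own):

```python
from html.parser import HTMLParser

class ImgSrcParser(HTMLParser):
    """Collects the src attribute of every <img> tag in an HTML page."""
    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            # Keep only real http(s) URLs, skipping inline data: URIs.
            if src and src.startswith("http"):
                self.urls.append(src)

def extract_image_urls(html: str) -> list:
    parser = ImgSrcParser()
    parser.feed(html)
    return parser.urls

sample = '<div><img src="http://example.com/a.jpg"><img src="data:image/png;base64,xyz"></div>'
print(extract_image_urls(sample))  # only the http URL survives
```

The dynamic part is what I’m stuck on; I believe Selenium’s `driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")` in a loop is the usual approach, but I haven’t got it working yet.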
You could “triplicate” the single channel to get a “pseudo” colour image.
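A minimal numpy sketch of that triplication, assuming the greyscale image arrives as an (H, W) array (the function name is mine):

```python
import numpy as np

def to_pseudo_rgb(gray):
    """Stack a single-channel (H, W) image into three identical
    channels, giving an (H, W, 3) 'pseudo' colour image that a
    network pretrained on RGB inputs can consume."""
    gray = np.asarray(gray)
    return np.repeat(gray[..., np.newaxis], 3, axis=2)

img = np.arange(6).reshape(2, 3)  # toy 2x3 greyscale image
rgb = to_pseudo_rgb(img)
print(rgb.shape)  # (2, 3, 3)
```

Since all three channels are identical, the pretrained RGB filters just see the same signal on each input plane.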
My homeland of Trinidad and Tobago is known as “The Land of the Hummingbird”, so I decided a while ago to try to build an image classifier for the 17 recorded species of hummingbird found there. The problem was data; thanks to the lesson 2 download notebook, this was solved. I initially adapted the notebook for just 3 species and was getting about a 25% error rate, which was about the same as my results in a notebook using the Birds Species dataset, so things were looking promising.
However, this worsened to about a 37% error rate by the time I was up to 9 species. This notebook shows that run with unpruned data from Google for the hummingbird species.
The errors in the images Google was retrieving were not non-bird images; rather, the wrong species was being retrieved from Google Images, so the FileDeleter tool wasn’t quite apt for pruning in this case, as it only shows the image, not how it was classified.
Once the data was pruned, I also sub-divided some species into males and females where there were clear distinguishing characteristics between the two genders. So I ended up with 14 categories of pruned data and got back to about a 25% error rate. This notebook shows the pruned data with 14 categories of hummingbird, by species and, in some cases, gender.
After seeing @simonw’s post and exploring his source code, I pushed my own model into a similar Docker image and deployed it to an Azure website at https://hummingbirds.azurewebsites.net/ if anyone wants to try it out for themselves.
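For anyone curious, the image is roughly along these lines; the file names, base image and packages below are illustrative rather than my exact setup:

```dockerfile
# Sketch of a Dockerfile for serving a fastai model behind a small
# Python web app (names and versions here are assumptions).
FROM python:3.6-slim
WORKDIR /app

# requirements.txt would pin e.g. fastai, starlette, uvicorn
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# app.py loads the exported model weights and serves predictions
COPY app.py hummingbirds.pkl ./
EXPOSE 8000
CMD ["python", "app.py"]
```

Azure Web App for Containers then just needs to be pointed at the pushed image and told which port the app listens on.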
Hopefully, as I explore more, I can pretty up the results UI a bit and add the remaining species as I build up the pruned training dataset over time.
Thanks to @sparalic for inspiring me to have enough confidence to share my work too, as she shared hers. A re-edit of this post is also published as my blog post for this week’s learning.
I created a dataset from the all-time top posts of picture-based subreddits, to predict which community would like a given photo the most. I used praw to get the links and wrote them to a .txt file in the same format as shown in class.
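For reference, the collection step can be sketched as below; the subreddit name, the limit, the credentials and the helper names are just illustrative, and the praw call needs real Reddit API credentials to run:

```python
def write_urls(urls, path):
    """Write one image URL per line - the format the lesson's
    download notebook expects for its .txt files."""
    with open(path, "w") as f:
        f.write("\n".join(urls))

def top_post_urls(subreddit_name, limit=100):
    # Sketch only: requires praw and Reddit API credentials.
    import praw
    reddit = praw.Reddit(client_id="...", client_secret="...", user_agent="...")
    return [post.url for post in reddit.subreddit(subreddit_name).top("all", limit=limit)]

# e.g. write_urls(top_post_urls("EarthPorn"), "earthporn.txt")
write_urls(["http://i.redd.it/a.jpg", "http://i.redd.it/b.jpg"], "demo.txt")
```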
List of subreddits included in the dataset:
Using resnet34, I was able to get ~86% accuracy by following the example from class. After that I tried again with resnet50 and the accuracy improved to ~87%.
Here is the link to the notebook: 10 picture-based subreddit.ipynb
Thank you! I’m actually more interested in how a single channel is interpreted by the model and how it affects learning.
@etown I have gotten it to convert from PyTorch to ONNX, but I got an error (Gather not supported) while trying to convert to CoreML. Where were your bottlenecks?
Cool! I’d love to see a walk-thru on your blog of exactly how you created that Azure website
Has anyone successfully converted an ONNX model to CoreML for an iOS mobile app? I’ve spent a few hours with little to no luck.
Here you go, hopefully it’s sufficient to get others trying it out. https://redditech.blog/2018/11/04/hosting-fastai-app-in-azure-websites-for-containers/
Hi, I have made a healthy vs junk food detector app.
I am able to get between 85–90% accuracy even though the data, which I downloaded off Google, is quite noisy.
One difference between this and other classification tasks (e.g. different dog breeds) is that even though two categories can look the same, the output is singular: it can be this or that, never a mix of the two. When it comes to food, that boundary can be blurred, as a dish can be a bit of both.
I started off with ~500 images of each category, using queries like “health food dishes -junk” and “unhealthy food dishes -healthy”. With a little cleanup I got around 90% accuracy. However, it had a limited view of food items, mostly consisting of regular junk food like burgers and fries on one hand and salads on the other. So next I consciously picked 4 different cuisines, namely American, Italian, Indian and Chinese, and downloaded healthy and junk food images for each of them. Then I added some sweets to the junk food category and greens to the healthy category. Even with that I’m able to keep the accuracy between 85–90%, which to me is quite good.
I was worried that the model might be too biased towards the colour green, so I picked out a couple of green junk foods. First was this avocado burger, and to my surprise it classified it correctly. Perhaps it is more biased towards burgers. Next I gave it a green cupcake, and it failed to identify it correctly. I noticed that my training data did not have any images of cupcakes. So it’s just a matter of adding the right set of data and the model will somehow magically extend.
Here’s the confusion matrix with 3,500-odd healthy food images and 2,400 junk food images:
And here are some of the top losses:
While for some of them it may be unclear even to us whether they are healthy or junk, a few others are definitely misclassified.
For example, I have labeled a few popcorn images as healthy, but this one I left as junk since I think it is caramelized. The 3 in the middle column are definitely misclassified.
I have taken @simonw’s code and enhanced it to deploy on Heroku. Here are a few screenshots:
Lastly, a big thanks to Jeremy & Rachel, along with the other folks who made this course what it is today. I had done the first few chapters of the v2 course, and I can say v3 is really awesome; this thread is testimony to that. Cheers.
I have Windows 10 Home edition. Is it possible to use Docker CE on it? It suggests that I use Docker Toolbox, and I am unable to proceed with that.
Technically one can run Docker CE on Windows 10, but it requires Hyper-V, which the Home edition lacks (hence the Docker Toolbox suggestion). Considering you want to deploy it on Heroku (or any other platform), I’d suggest using a Linux installation instead. You can use your Google account to run on GCP with $300 worth of free credits. Best way IMO.
Great work! Nice to see an improvement in accuracy over the previous good result. What do you attribute this to? Superconvergence? Better data augmentation?
Thanks for providing the code on GitHub! I had to change a few things to deploy it on my local Ubuntu machine (read data from CSV, change the interface) without a Docker installation. Just run
and now you can predict plant leaf types via the web. This is awesome! =)
I believe the one-cycle policy was mainly responsible for the improvement. I don’t know if the new fastai has a different augmentation feature, but if that’s the case, then it surely helped too.
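For anyone curious, the learning-rate half of the one-cycle policy is easy to sketch in plain Python. This is only a sketch of my understanding: linear warm-up from a low rate to the peak, then annealing back down (the defaults below, a divide factor of 25 and 30% warm-up, mirror fastai's fit_one_cycle defaults as I understand them; the full policy also cycles momentum inversely):

```python
import math

def one_cycle_lr(step, total_steps, lr_max, div=25.0, pct_warmup=0.3):
    """One-cycle LR schedule (sketch): linear warm-up from lr_max/div
    up to lr_max, then cosine annealing back down towards zero."""
    lr_min = lr_max / div
    warmup_steps = int(total_steps * pct_warmup)
    if step < warmup_steps:
        # Linear ramp up during the first pct_warmup of training.
        t = step / max(1, warmup_steps)
        return lr_min + t * (lr_max - lr_min)
    # Cosine annealing for the remainder of training.
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return lr_max * (1 + math.cos(math.pi * t)) / 2

total = 100
lrs = [one_cycle_lr(s, total, lr_max=0.01) for s in range(total)]
print(max(lrs))  # peaks at lr_max, 30% of the way through training
```

The brief excursion to a high learning rate is what Leslie Smith’s papers call super-convergence: it acts as a regulariser and lets training finish in fewer epochs.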
Yes - agreed that some kind of deeper exploration would be interesting; visualizing the layers or even something like this distill.pub article
Right now I still have a problem with my train/val split and major leakage. I’ll fix that first, then explore those solutions. Also, if you have any other ideas, I’m a taker!
Superconvergence? Explain pls