Sorry, I switched topics in the middle. I'm talking about Gaussian blur as a great data augmentation when you try to do inference on mobile image input.
Ah, unfortunately I’m not doing any data augmentation yet. I have plenty of blurry data from my actual captures so it probably won’t be something I try in the near term.
I created a skin lesion classifier from the ISIC Skin Lesion Challenge data set. I can’t say it’s state of the art but I did get 94% accuracy on a 2,000 image holdout set. I’m really proud of it and how easy it was to get such great results so quickly. I’ll probably come back to this in time to try and improve upon it but I wanted to release it and see if anyone wanted to check it out and give any kind of feedback!
The notebook can be found here: https://github.com/DKilkenny/ISIC-2018-Skin-Lesion-Classification
I classified the Oxford flowers dataset this time and got lower accuracy. I believe my images were blurry and the data was not prepared correctly. Any thoughts and feedback are welcome.
Had a look at your notebook. You could probably get better results by:
- experimenting with the arguments to get_transforms (why are you setting flip off?)
- letting the training run for more epochs
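For context on the flip argument: in fastai v1, get_transforms has a do_flip parameter that enables random horizontal flips. Here's a minimal numpy sketch of what that augmentation does under the hood (the function name and defaults are illustrative, not fastai's):

```python
import numpy as np

def maybe_hflip(img, p=0.5, rng=None):
    """With probability p, flip an image array left-right,
    like fastai's do_flip=True does during training."""
    rng = rng or np.random.default_rng()
    return np.fliplr(img) if rng.random() < p else img

img = np.arange(6).reshape(2, 3)      # tiny stand-in "image"
flipped = maybe_hflip(img, p=1.0)     # force the flip for illustration
```

Since symmetric classes (most natural images) look valid flipped, leaving it on is usually free extra data; turn it off only when orientation carries meaning (e.g. digits or text).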
After lesson 4, I was really keen to try building my own NLP / tabular / collaborative model.
As luck would have it, I got an email from Kaggle today announcing a new competition with tabular data. The competition requires you to use tabular data to predict whether Santander bank customers will buy products in the future. This sounded quite similar to the tabular example from the lecture, so it seemed like a good problem to try out.
- I built a tabular model, which had ~91.6% accuracy on my validation set (training with 4 epochs at a max learning rate of 5e-3)
- One challenge I ran into was that Kaggle scores this competition with an AUC-ROC metric (AUC = Area Under the Curve, ROC = Receiver Operating Characteristic). I tried to add this metric myself, and did a bit of Googling to try to find usable code, but wasn't able to get it to work
- I submitted my model to Kaggle with a score of 0.862, which got me to position 678/1067 on the leaderboard. I might come back to this model in the future after I learn some more optimization techniques (e.g. I don’t know exactly what the ‘layers’ input does when you create a TabularLearner, so I just put in the same [200,100] value from the lecture)
- Code for the model is available at GitHub
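For anyone stuck on the same AUC-ROC issue: even if wiring it in as a fastai training metric is fiddly, you can always score your validation predictions after training with scikit-learn. A sketch with hypothetical arrays (substitute your own labels and predicted probabilities):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# hypothetical validation labels and predicted probabilities
# for the positive class
y_true  = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# AUC-ROC: probability that a random positive example is
# scored higher than a random negative example
auc = roc_auc_score(y_true, y_score)
```

Note that AUC-ROC wants the raw probabilities (or scores), not the hard 0/1 class predictions, which is a common reason home-rolled versions come out wrong.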
Did you first download the data locally from Analytics Vidhya? I could not find a method to download it directly into the cloud…
@karanbangia14 Yes I did. The hackathon website has all the data.
Please explain briefly: how did you get the URL? That's my question. Please help me.
After watching the first two lectures, I was fascinated by the .fit_one_cycle() method and curious about its inner workings. After reading Leslie Smith's papers multiple times, I put my learning into writing:
I hope you find this useful!
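For anyone who wants the one-paragraph version: the one-cycle policy warms the learning rate up from a low value to a peak, then anneals it back down over the rest of training. A simplified sketch of the schedule's shape (fastai's actual implementation uses cosine curves for both phases and also schedules momentum; the defaults here are illustrative):

```python
import math

def one_cycle_lr(step, total_steps, lr_max, pct_start=0.3, div=25.0):
    """Simplified one-cycle schedule: warm up to lr_max over the first
    pct_start of training, then cosine-anneal back toward zero."""
    warm = int(total_steps * pct_start)
    lr_min = lr_max / div
    if step < warm:
        # linear warm-up from lr_min to lr_max
        return lr_min + (lr_max - lr_min) * step / warm
    # cosine anneal from lr_max down toward ~0
    t = (step - warm) / (total_steps - warm)
    return lr_max * 0.5 * (1 + math.cos(math.pi * t))

sched = [one_cycle_lr(s, 100, 1e-2) for s in range(100)]
```

The early low rates keep training stable, the peak acts as a regularizer, and the long anneal lets the model settle into a good minimum.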
When you come around to trying data aug, you can test whether Gaussian blur helps your accuracy, even if you already have such samples in your training set. I suspect this will help if the digit is small.
I also have a few blurry ones in my own training set, but adding a little Gaussian blur during training can still boost accuracy, sometimes by up to 0.5%.
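A minimal PIL-based sketch of what such an augmentation could look like (the function and parameter names are my own, not from any library):

```python
import random
from PIL import Image, ImageFilter

def random_gaussian_blur(img, p=0.5, max_radius=2.0, rng=random):
    """With probability p, apply a Gaussian blur with a random radius,
    simulating the soft focus of hand-held mobile captures."""
    if rng.random() < p:
        radius = rng.uniform(0.1, max_radius)
        return img.filter(ImageFilter.GaussianBlur(radius=radius))
    return img

img = Image.new("RGB", (64, 64), color=(120, 30, 200))  # stand-in image
out = random_gaussian_blur(img, p=1.0)                  # force blur for illustration
```

Applied on the fly each epoch, this teaches the model that a sharp and a slightly blurred photo of the same subject share a label, which is exactly the mismatch mobile inference tends to hit.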
Follow-up: do you plan to write a more up-to-date "how to" on taking your fastai-trained PyTorch model to an Apple .mlmodel, and maybe a skeletal app to try it out?
I think this would be really useful for seeing whether what everyone has built actually generalizes in the real world, except maybe where the image is hard to obtain in the first place (like MRI or X-ray images).
I may come around to doing this when I have a chance. Here's what I'm thinking:
- Provide a notebook tutorial on how to do this.
- Structure it so that a single API call on the learner (or whatever model) outputs the Apple .mlmodel, if possible.
- If coremltools only works on Python 3.6, just add a note in the notebook asking people to switch to Python 3.6 when they're ready to convert. I know this is ugly and a turn-off.
- Provide a skeletal Xcode project for single-label multi-class image recognition, so you can just drop in the .mlmodel and label.txt and run.
If I get enough likes here, I may give this slightly higher priority. I don't want to spend too much volunteer time on stuff people don't care about.
Just posted this short tour of data augmentation on GitHub.
I’m not sure which url you are talking about. If you are talking about the hackathon url then it’s here. I downloaded the data from analytics vidhya and uploaded it on Kaggle here. You can directly go to this link and select “New kernel” or you can fork my kernel and execute the code.
Nice to see someone working on the same dataset (HAM10000 contains the same data, I believe)! Just like you, I see the validation loss sometimes going through the roof. Not sure how to interpret this; I assumed I had picked the wrong learning rate, but maybe that's not correct.
Did you pay any special attention to the class imbalance during training? Or just run a lot of epochs? So far I got to an error rate of about 0.09, but didn't get to the balanced accuracy score yet.
Back at ya! I'm glad to see someone working on the HAM10000 dataset. I did have some issues with the validation loss at 299x299, but I think that could be improved by experimenting with weight decay; I was already getting a really good result at 128x128, so I kind of stopped there. However, even with 94% accuracy on the holdout set (and 89% accuracy on their validation set, for which there is no ground truth), when I submitted to the competition I got 0.69 accuracy. A validation score of 89% against a test score of 0.69 just didn't add up for me.
I'm assuming it's something I haven't accounted for in my model (and I'd really welcome anyone's advice on that).
And to answer your question, it was mostly just running a lot of epochs and fine-tuning the learning rate as I went along. If you're not using weight decay, I would use it because it really helped my model a bunch.
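On the weight decay point: the idea is just a penalty that pulls weights toward zero so the model can't rely on huge coefficients. A minimal numpy sketch of one SGD step with the L2 penalty folded into the gradient (the numbers are hypothetical):

```python
import numpy as np

def sgd_step(w, grad, lr=0.1, wd=0.01):
    """One SGD update with weight decay: the wd * w term shrinks
    every weight a little on each step."""
    return w - lr * (grad + wd * w)

w = np.array([1.0, -2.0])   # hypothetical weights
g = np.array([0.5, 0.5])    # hypothetical gradients
w_new = sgd_step(w, g)
```

In fastai you'd just pass a wd argument to fit_one_cycle rather than write this yourself; the sketch only shows why it acts as a regularizer.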
I was talking about the dataset, as the upload speed at my place is not good. And thanks for the upload! Did you try to increase your accuracy?
One of the things I noted is that you run the model with a very small validation set: 5%. If you run more than 16 epochs with such a small subset, you risk overfitting on the training data and getting "lucky" that the validation set happens to suit it.
Maybe I misinterpreted your approach, but if the above is true, consider a validation set of at least 20% of the data, and try different seeds as well.
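A quick sketch of that suggestion using scikit-learn (the arrays here are hypothetical stand-ins for your features and labels):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)   # hypothetical features, 50 samples
y = np.arange(50) % 2               # hypothetical binary labels

# hold out 20% for validation; fixing random_state makes the
# split reproducible, and changing it gives you a different seed
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

Re-running with a few different random_state values and checking that the validation score stays stable is a cheap sanity check against the "lucky split" problem.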
Hi all! I’ve created a classifier to detect different kinds of insects: https://insects.space (sorry, it’s in Russian).
I’m getting a good 99% accuracy on a dataset from Google Images (around 400 pics per class), but when I add totally new pictures it still sometimes gives wrong results. I’m assuming I need to add more classes and use bigger datasets for training.
Thanks to this great forum I created a Cuneiform classifier that can distinguish between 50 logographs with a 6.3% error rate (and I’m sure it can be improved).
Also, most_confused() made a lot of sense. Check out LUGAL and LU2 from show_batch()… In Sumerian, LU2 is "man" and LUGAL is "king" ("great man").