For those of you having a hard time getting started with the Kaggle challenge + fastai library, I’ve worked with Prince to put together a “starter kit” for processing the Iceberg matrices into RGB images and then running your first ConvNet. Hopefully this gives you a starting point for tuning and trying other image techniques.
*11.14.17: Updated the code for proper Kaggle submission formatting:
- save the test images with the id built in, e.g. img_ab2348fed28.png
- after running the fastai model, pull the file names + probabilities
- extract the ids from the file names and package them with the probabilities
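The id-extraction and packaging step can be sketched roughly like this (the function name and I/O details are my own, not from the notebook):

```python
import csv
import os

def build_submission(filenames, probabilities, out_path="submission.csv"):
    """Pair each test image's id (parsed from names like img_ab2348fed28.png)
    with its predicted probability and write a Kaggle submission CSV."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "is_iceberg"])
        for fname, prob in zip(filenames, probabilities):
            # strip the "img_" prefix and the ".png" extension to recover the id
            img_id = os.path.splitext(os.path.basename(fname))[0].replace("img_", "", 1)
            writer.writerow([img_id, prob])
```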
Caveats about the Code
- Based on the fastai library
- Authored on a Paperspace instance + GPU; one model trains in ~10 mins
- 100% Markdown: since Kaggle can’t run fastai, the notebook on Kaggle is 100% markdown, which makes copying the code a little more difficult
Also, maybe try pretraining on scaled-down versions of the Planet data or similar. And think about how best to do data augmentation, since flipping etc. doesn’t work right with the angle data in the iceberg dataset, as you may have noticed.
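One way to augment without flips is small random rotations — a sketch using `scipy.ndimage` (nothing here comes from the starter kit, and the angle range is just a guess):

```python
import numpy as np
from scipy.ndimage import rotate

def random_rotate(img, max_deg=20, rng=None):
    """Augment by rotating a random small angle instead of flipping.
    reshape=False keeps the output the same size; mode='nearest'
    avoids black corners that the network might latch onto."""
    rng = np.random.default_rng() if rng is None else rng
    angle = rng.uniform(-max_deg, max_deg)
    return rotate(img, angle, reshape=False, mode="nearest", order=1)
```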
I kinda got left behind in the Dog Breed Challenge competition. Definitely will get on this.
Would you mind me giving it a stab at getting the SENet piece working? Also, what exactly did you mean by trying to get it (SENet) to work? Did you mean simply integrating it into the fastai library? Having made a meagre attempt to do the same for VGG-16, I just might be able to do that. Or did you mean literally training SENet on ImageNet and publishing the weights, so we can have a pretrained model to work with?
Also, my GPU’s been sitting idle for some days and is surely eager to crunch some data (think training on ImageNet).
Sorry… I didn’t get my first post quite right. I was able to use VGG-16 just fine. I think Jeremy actually pushed a feature update himself.
But I will let you know what happens with the SENet architecture.
I’ve been reading the PyTorch forums, and people seem to have issues with saving and loading models. Shouldn’t it be as easy as saving the trained model parameters/weights into a pickle-like file, as mentioned in the PyTorch docs, then loading and using it in any kind of environment such as fastai?
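In plain PyTorch it really is roughly that simple — a minimal sketch with a toy `nn` module (not a fastai learner):

```python
import torch
import torch.nn as nn

# any model will do; a tiny one just for illustration
model = nn.Linear(4, 2)

# save only the parameters (the state_dict), not the whole pickled object --
# this is the portable route PyTorch recommends
torch.save(model.state_dict(), "weights.pth")

# later / elsewhere: rebuild the same architecture, then load the weights
model2 = nn.Linear(4, 2)
model2.load_state_dict(torch.load("weights.pth"))
```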
And why don’t people just share those serialized files for the best models, trained on different datasets?
We don’t have a pretrained CIFAR-10 model, but we’d love one - or many! So if you do train one or more of those models I’d be happy to host the weights on our web site.
“Why don’t people share the serialized files?” I have no idea - it’s a huge opportunity that no one is taking advantage of, other than a few imagenet files. There should be pretrained nets available for satellite, medical (CT, MRI, etc), microscopic (cell) images, etc, but there aren’t any…
I haven’t heard of problems with saving and loading models on the whole, although I know that if you train on multiple GPUs you can’t load on a single CPU, and vice versa.
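The multi-GPU wrinkle usually comes from `nn.DataParallel` prefixing every checkpoint key with `module.`; a common workaround (a sketch, not fastai’s own code) is to strip that prefix before loading:

```python
def strip_dataparallel_prefix(state_dict):
    """Checkpoints saved from an nn.DataParallel-wrapped model have every
    key prefixed with 'module.'; remove it so a bare model can load them."""
    return {
        (k[len("module."):] if k.startswith("module.") else k): v
        for k, v in state_dict.items()
    }
```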
@jeremy I modified the dogs & cats notebook to accomplish this, along with the “Starter Kit”. Thanks @timlee
I think that TTA is essential in this challenge.
But I think we need to take into consideration rotation of the images, not just the 90 or 180 degrees that TTA does with the standard 4 options.
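Rotation-based TTA could be sketched like this: predict on several rotated copies and average the probabilities (`predict_fn` is a hypothetical stand-in for your model’s predict call, and the angle list is arbitrary):

```python
import numpy as np
from scipy.ndimage import rotate

def rotation_tta(predict_fn, img, angles=(0, 45, 90, 135, 180, 225, 270, 315)):
    """Test-time augmentation with arbitrary rotation angles: run the model
    on each rotated copy of the image and average the predicted probabilities."""
    preds = []
    for angle in angles:
        rotated = rotate(img, angle, reshape=False, mode="nearest", order=1)
        preds.append(predict_fn(rotated))
    return float(np.mean(preds))
```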