Kaggle Galaxy Classification

(Rob Harrand) #1

Hi all,

I’m currently on week 2 (part 1 deep learning), and have started applying what’s been covered so far to the Kaggle ‘Galaxy Zoo’ challenge. The problem is that the training data, rather than having a binary classification, has 37 probabilities per image, reflecting different classifications and features, as per the competition guidelines. With the dogsvscats work, the training data was divided into folders that represented the class, but that’s not possible here. Has anyone tackled this issue and, if so, could you give me any hints on how to tell Keras about these training labels, and how to get the predict functions to output 37 different probabilities per image?

This is all new to me, so I’m sure I’m missing something pretty basic. Thanks.

(Florian Peter) #2

Hi @tentotheminus9,

We’re currently in the same boat. I decided to give it about 2 hours before moving on to Week 3, and I didn’t get very far.

Found this thread with some interesting insights, and I’m also guessing that we need to replace the flow_from_directory method with something more handcoded.
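
A minimal sketch of what "something more handcoded" could look like, assuming the labels arrive as a CSV with an image ID in the first column followed by the target columns (check this against the actual competition files). `load_labels`, `batches`, and `load_image` are hypothetical names, the toy rows are invented, and the image loader is left as a plug-in so the sketch stays library-free:

```python
import csv
import io
import random


def load_labels(csv_file):
    """Map each image ID to its list of float targets, read from an open CSV file."""
    reader = csv.reader(csv_file)
    header = next(reader)  # first column is the ID, the rest are target names
    labels = {row[0]: [float(v) for v in row[1:]] for row in reader if row}
    return header[1:], labels


def batches(labels, batch_size, load_image, shuffle=True, seed=None):
    """Yield (images, targets) batches forever, one pass over the IDs at a time."""
    ids = sorted(labels)
    rng = random.Random(seed)
    while True:
        if shuffle:
            rng.shuffle(ids)
        for start in range(0, len(ids), batch_size):
            chunk = ids[start:start + batch_size]
            yield [load_image(i) for i in chunk], [labels[i] for i in chunk]


# Invented stand-in for the real label file: 3 target columns instead of 37.
demo_csv = io.StringIO(
    "id,t1,t2,t3\n"
    "a,0.5,0.5,0.0\n"
    "b,0.25,0.75,0.0\n"
    "c,1.0,0.0,0.0\n"
)
names, demo_labels = load_labels(demo_csv)

# load_image is a placeholder here; a real one would read and resize the file.
gen = batches(demo_labels, batch_size=2, load_image=lambda i: "img_" + i, shuffle=False)
first_images, first_targets = next(gen)
second_images, second_targets = next(gen)
```

For use with Keras, the two lists would need to be converted to NumPy arrays before being yielded; the label rows replace the folder names as the source of truth for the targets.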

One crazy out-of-whack idea I just had to make it work with our existing toolset: instead of using 37 categories, turn it into a (37*10)=370 categories problem (with subdirectories for each), approximating the “correct” probability/weight of each of the 37 categories in steps of 0.1.
Might work for a basic submission, but obviously can’t be a very good solution :wink:
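
To make the binning idea concrete, here is a rough sketch of the mapping from a probability to one of 10 bins per output, giving 37 * 10 = 370 category ids. The function names are hypothetical, and the choice to put 1.0 into the top bin is my assumption:

```python
def to_bin(p, n_bins=10):
    """Map a probability in [0, 1] to a bin index 0..n_bins-1 (steps of 1/n_bins)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability out of range")
    return min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the top bin


def to_category(output_index, p, n_bins=10):
    """Flatten (which of the 37 outputs, which bin) into one of 37 * n_bins ids."""
    return output_index * n_bins + to_bin(p, n_bins)


def bin_centre(bin_index, n_bins=10):
    """Approximate probability recovered from a bin index."""
    return (bin_index + 0.5) / n_bins
```

Note that each image would carry 37 of these category ids at once, so a one-folder-per-image layout still could not express the labels on its own.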

Did you make any progress?

(Sean Lanning) #3

flow_from_directory should work fine. Change your loss function from binary cross-entropy to categorical cross-entropy, your activation function from sigmoid to softmax, and the number of dense outputs at the end from 1 to 37.
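
To make the two activation/loss pairings concrete, here is a small library-free sketch (the logits are made up, and the function names only mirror the usual loss names). Which pairing fits depends on whether an image's 37 targets sum to 1, which is worth checking in the label file:

```python
import math


def softmax(logits):
    """Outputs compete: they are positive and sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits):
    """Outputs are independent: each lies in (0, 1), with no constraint on the sum."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


def categorical_crossentropy(target, pred, eps=1e-7):
    """Cross-entropy against a target distribution over all outputs."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, pred))


def binary_crossentropy(target, pred, eps=1e-7):
    """Mean over outputs of the per-output two-class cross-entropy."""
    total = 0.0
    for t, p in zip(target, pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(target)


logits = [2.0, 2.0, -1.0]  # made-up pre-activation outputs
soft = softmax(logits)
sig = sigmoid(logits)
print(sum(soft))  # sums to 1, up to rounding
print(sum(sig))   # exceeds 1 here, since each output is scored independently
```

With softmax, two outputs can never both be close to 1, so targets such as [1, 1, 0] are out of reach; with sigmoid they are not.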

(Florian Peter) #4

Thx for your reply! How would I organize the training data folder-wise with flow_from_directory? And can you give me a hint why categorical cross-entropy is better than binary cross-entropy for this type of multi-label classification?

(Alex) #5

All the images are in the same package. I just don’t understand how I can classify them to split them into directories. How did you do that?

(Florian Peter) #6

Hey @Alexev,

Part 1 (2018) (the first 3 lessons) makes all of this a lot easier with the new fastai library.
I would love to have another take at Galaxies now, if time permits.
Let me know if you have questions!

(Florian Peter) #7

Just found the time to play with Galaxies!

The ideas from the fastai notebooks work great here as well; I’m already in the top 50 and climbing. Will share my messy code here once done. Learning lots, especially as I had to play around with the DataLoader and metrics.

Ping me anytime if you can use some help getting started.