Initial guidance on Kaggle Galaxy Zoo Competition

(Partho P. Das) #1

Hey folks,

I am a n00b and just got done with the ch2 dogbreed exercise.

The Galaxy Zoo training data set contains a list of galaxies and, for each galaxy, the probability that it belongs to each of the 37 classes. Here is a sample (header row, then one galaxy):

GalaxyID Class1.1 Class1.2 Class1.3 Class2.1 Class2.2 Class3.1 Class3.2 Class4.1 Class4.2 Class5.1 Class5.2 Class5.3 Class5.4 Class6.1 Class6.2 Class7.1 Class7.2 Class7.3 Class8.1 Class8.2 Class8.3 Class8.4 Class8.5 Class8.6 Class8.7 Class9.1 Class9.2 Class9.3 Class10.1 Class10.2 Class10.3 Class11.1 Class11.2 Class11.3 Class11.4 Class11.5 Class11.6
100008 0.383147 0.616853 0 0 0.616853 0.038452149 0.578400851 0.418397819 0.198455181 0 0.104752126 0.512100874 0 0.054453 0.945547 0.201462524 0.181684476 0 0 0.0272265 0 0.0272265 0 0 0 0 0 0 0.279952491 0.138445328 0 0 0.092885571 0 0 0 0.325512248

I am unable to fit this into the dogbreed pattern. There, a dog belonged to just one breed. A galaxy, however, can belong to multiple classes, each with its own probability.

How do I go about modeling this problem?


(Vikas Bahirwani) #2

That is a good question. I just came across this dataset too. If you search for “galaxy” on the forums you will find an old thread with some insights.

I realized that you should look at lecture 3 and one-hot encoding to solve this problem. I will come back to this dataset after I do that.
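
For anyone following along, here is a rough sketch of how the targets differ between the two setups. It is only an illustration in plain Python, not the course's code: the breed and tag names are made up, and the helper functions `one_hot` and `multi_hot` are hypothetical. The galaxy values are the first five columns of the sample row above.

```python
# Hypothetical label names, for illustration only.
breeds = ["beagle", "pug", "collie"]
tags = ["clear", "water", "road", "haze"]


def one_hot(label, classes):
    """Single-label target: exactly one entry is 1."""
    return [1.0 if c == label else 0.0 for c in classes]


def multi_hot(labels, classes):
    """Multi-label target: any number of entries may be 1."""
    wanted = set(labels)
    return [1.0 if c in wanted else 0.0 for c in classes]


# Dogbreed style: one breed per image.
dog_target = one_hot("pug", breeds)

# Multi-label style: several tags can apply to the same image.
image_target = multi_hot(["clear", "road"], tags)

# Galaxy Zoo style: the row is already a vector of probabilities,
# so no encoding step is needed; the values are used as targets directly.
galaxy_columns = ["Class1.1", "Class1.2", "Class1.3", "Class2.1", "Class2.2"]
galaxy_target = [0.383147, 0.616853, 0.0, 0.0, 0.616853]

print(dog_target)
print(image_target)
print(dict(zip(galaxy_columns, galaxy_target)))
```

So the galaxy rows look like a "soft" version of the multi-hot case: several entries are non-zero at once, and they are fractions rather than 0 or 1.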

(Partho P. Das) #3

Yep - this is single-label vs. multi-label classification, which is covered in week 3 (understanding the Amazon problem).
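
To make the distinction concrete, here is a minimal sketch of the output side, assuming the usual choice of softmax for single-label and a per-class sigmoid for multi-label. The logits are made-up numbers and the function names are my own, not from the course library. The targets are the Class1.1 to Class1.3 values from the sample row in post #1. Binary cross-entropy is just one reasonable loss for probability targets; I have not checked what the competition actually scores on.

```python
import math


def softmax(logits):
    """Single-label output: probabilities compete and sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits):
    """Multi-label output: each class gets an independent probability."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


def binary_cross_entropy(preds, targets, eps=1e-7):
    """Mean per-class binary cross-entropy; targets may be fractional."""
    total = 0.0
    for p, t in zip(preds, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(preds)


logits = [2.0, -1.0, 0.5]  # made-up raw model outputs

single = softmax(logits)   # always sums to 1
multi = sigmoid(logits)    # free to sum to more (or less) than 1

soft_targets = [0.383147, 0.616853, 0.0]
loss = binary_cross_entropy(multi, soft_targets)

print(single, sum(single))
print(multi, sum(multi))
print(loss)
```

The point is only that the sigmoid outputs do not have to sum to 1, so the model can say "yes" to several classes at once, which is what the galaxy rows need.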