Planet Classification Challenge

layla.tadjpour · January 6, 2018, 9:13am

No, Jeremy told me to look at the code and I could not figure out what it does. I still think that TTA yields probability. If you exponentiate its output and look at the results, the values are not between 0 and 1.

ecdrid · January 6, 2018, 10:39am

That’s fixed

weiwei5444 · January 6, 2018, 11:04am

Sorry, I am a little confused. Did you mean that the TTA has been fixed to return probs instead of log_of_prob particularly in multi-label classification?

ecdrid · January 6, 2018, 11:22am

Read till end…

irshaduetian · January 14, 2018, 8:23am

Kaggle Planet Competition: How to land in top 4%

I was able to land in the top 4% in this competition, and I have written a blog post about it.

It details all of the steps that can help you land in the top 4% of the Kaggle Planet Competition. I’m sharing it here, as it may be helpful to others.

Let me know what you think about it.

kimmoO · January 26, 2018, 2:05pm

Hi Layla,

I think you’re right. I managed to submit my results to Kaggle with TTA preds without applying np.exp() to my results. I got a score of 0.91777 from Kaggle which was slightly better than the 0.9165 score I got from running f2 on the validation data.

My kernel also died when I tried to run TTA(is_test=True) on all of the test data, but I worked around this by dividing the test data into three folders of roughly 20,000 files each. BTW, I’m using my own machine with an NVIDIA GTX 1060.

layla.tadjpour · January 26, 2018, 7:28pm

Thanks for your comment. I (and some other people) have a different problem: we get a score of 0.84 on the test data. I wonder if there is an issue with my test data! It seems to work fine with yours.

irshaduetian · January 27, 2018, 12:47pm

For multi-label classification, there is no need to apply np.exp() to the probabilities returned by learn.TTA(). The same holds in any situation where the output activations are sigmoids.
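To illustrate the point about np.exp(), here is a minimal sketch in plain Python (no fastai involved). It assumes, as the np.exp() advice in this thread implies, that single-label models emit log-softmax outputs while multi-label models such as Planet end in sigmoids. The logits are made up.

```python
import math

def sigmoid(x):
    """Map one logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def log_softmax(logits):
    """Log-probabilities over mutually exclusive classes (all values <= 0)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

logits = [2.0, -1.0, 0.5]  # made-up values for three labels

# Multi-label head: already probabilities, so threshold them directly.
sigmoid_probs = [sigmoid(x) for x in logits]

# Exponentiating probabilities pushes them into (1, e), which is why the
# results no longer lie between 0 and 1.
exp_of_probs = [math.exp(p) for p in sigmoid_probs]

# Single-label head: log-probabilities, so exponentiation is needed here.
softmax_probs = [math.exp(v) for v in log_softmax(logits)]

print(sigmoid_probs)
print(exp_of_probs)
print(softmax_probs, sum(softmax_probs))
```

A quick sanity check on real output follows the same idea: values that all lie in [0, 1] are plausibly probabilities already, while values that are all zero or negative are plausibly log-probabilities.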

kimmoO · January 27, 2018, 3:00pm

Strange, that happened to me also. I got a score 0.84 from Kaggle the first time I tried, but then I just repeated everything trying to isolate a possible mistake I might have made, and got a score of 0.91777 from Kaggle.

layla.tadjpour · January 27, 2018, 5:44pm

Interesting, but I get the same 0.84 score no matter how many times I repeat it. When you say you tried to isolate a possible mistake, what do you mean? Did you look at individual wrong predictions?

kimmoO · January 28, 2018, 8:42am

I thought I might have messed up something when I repeated the same steps three times using three test folders to produce three result files which I then combined manually (to avoid my kernel dying). The second time I tried I also ended up changing one thing: I did not apply data = data.resize(int(sz*1.3), 'tmp'). I left this step out mainly because I wanted to simplify things a little bit (no need to copy the weights file from “data/planet/tmp/83/models” to “data/planet/models”) and also because I didn’t really understand this step. How could it be beneficial to first resize the files to size 64x64 and then immediately resize them to 83x83?

I don’t know if this is what made the difference. This is just what I did and what my thoughts were when trying this.
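Combining several partial result files by hand is easy to get wrong (a repeated header row, a missing part). A minimal sketch of doing it with Python's csv module is below. It assumes every part file is a CSV with the same header row; the function name and the file names in the usage note are hypothetical.

```python
import csv

def merge_submissions(part_paths, out_path):
    """Concatenate CSV parts into one file, writing the header only once.

    Returns the number of data rows written.
    """
    header = None
    rows = []
    for path in part_paths:
        with open(path, newline="") as f:
            reader = csv.reader(f)
            part_header = next(reader)  # assumes each part has a header row
            if header is None:
                header = part_header
            elif part_header != header:
                raise ValueError(f"{path} has a different header: {part_header}")
            rows.extend(reader)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

Usage would look like merge_submissions(["part1.csv", "part2.csv", "part3.csv"], "submission.csv"). Comparing the returned row count with the expected number of test images is a cheap check that no part was dropped.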

layla.tadjpour · January 30, 2018, 1:37am

I don’t think that made the difference. Jeremy said in the lecture that the resize step is there to speed up the run time. I did not do that step and still get 0.85. I suspect it is due to my test files. Thanks anyway.

fero · January 31, 2018, 11:01am

Hey Amrit, did you end up resolving the “ValueError: Length of values does not match length of index” error? Debugging the same one at the moment

amritv · January 31, 2018, 10:17pm

Hey @fero, it got fixed after the update mentioned in this thread: Kaggle Comp: Plant Seedlings Classification. After that it worked fine. I am assuming you are using the latest fastai libraries?

fero · February 3, 2018, 1:41am

Yes, using the latest libraries and notebook but still getting the “list index out of range” error when using the following code to export the predictions, and “Length of values does not match length of index” after removing the ‘for’ loop (lines 4 and 5):

tta = learn.TTA(is_test=True)
test_fnames = data.test_ds.fnames

for i in range(len(test_fnames)):
    test_fnames[i] = test_fnames[i].split("/")[1].split(".")[0]

classes = np.array(data.classes, dtype=str)
res = [[" ".join(classes[np.where(p > 0.5)])] for pp in tta[0] for p in pp]
submission = pd.DataFrame(data=res)

submission.columns = ["tags"]
submission.insert(0, 'image_name', test_fnames)
submission.to_csv(PATH+"Planet_Sub.csv", index=False)
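One possible reading of the two errors, not confirmed anywhere in this thread: the for loop rewrites the file names in place, so running the cell a second time calls split("/")[1] on names that no longer contain a slash. And if TTA returns one set of predictions per augmentation (an assumption), the nested comprehension emits n_aug × n_images rows, which cannot match the number of file names. The sketch below uses plain Python lists with made-up values in place of fastai output; all function names are hypothetical.

```python
import os

def image_name(path):
    """Strip directories and extension: 'test-jpg/test_1.jpg' -> 'test_1'.

    Unlike split("/")[1], this is safe to apply twice.
    """
    return os.path.splitext(os.path.basename(path))[0]

def mean_over_augmentations(preds):
    """Average a [n_aug][n_images][n_classes] nested list over the first axis."""
    n_aug = len(preds)
    return [
        [sum(values) / n_aug for values in zip(*per_image)]
        for per_image in zip(*preds)
    ]

def tags_for(probs, classes, threshold=0.5):
    """Join the names of all classes whose probability exceeds the threshold."""
    return " ".join(c for c, p in zip(classes, probs) if p > threshold)

classes = ["clear", "primary", "water"]            # made-up subset of labels
fnames = ["test-jpg/test_0.jpg", "test-jpg/test_1.jpg"]
preds = [                                          # 2 augmentations x 2 images
    [[0.9, 0.8, 0.1], [0.2, 0.7, 0.6]],
    [[0.7, 0.6, 0.3], [0.4, 0.9, 0.8]],
]

# Build new names instead of mutating the original list.
names = [image_name(f) for f in fnames]
rows = [tags_for(p, classes) for p in mean_over_augmentations(preds)]
assert len(rows) == len(names)  # one row of tags per image

for name, tags in zip(names, rows):
    print(name, tags)
```

With NumPy arrays the averaging step would be a mean over axis 0; the explicit version is used here only to keep the sketch dependency-free.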

shubham827 · February 11, 2018, 2:09pm

I am following the lesson2 notebook for this as shown in lecture.
I want to understand the meaning of

from planet import f2

metrics=[f2]

from cell 5.
Is planet a package, as we’re importing f2?
What is f2?
I have seen the files on the contest but f2 isn’t there.

nas-r · February 11, 2018, 2:42pm

Hi all,

Like @layla.tadjpour, I am running out of memory when running learn.predict(is_test=True) on the full test set.
Is there a way of splitting the test data and running learn.predict on batches of images without splitting them into separate folders?

I’ve been trying to read through the library and figure it out, but my python is not there yet.

The best I get is:

predictions = []
for i in range(0,len(data.test_dl.dataset)-1):
    p = learn.predict_array(data.test_ds[i])
    predictions.append(p)

Which yields a NotImplementedError from T(a) in core.py

Alternatively, if I were to give up and go down the route of splitting it into separate folders, would I be creating another instance of ImageClassifierData, using ImageData, or the function set_data? (Or something else?)

EDIT:
I worked around the issue by reducing my batch size to 64. It would be nice to be able to set the batch size for training v. prediction independently. If anyone knows of a way to do this, I’m all ears!
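The general idea of predicting in fixed-size batches can be sketched independently of fastai. In the sketch below, predict_fn is a hypothetical placeholder for whatever call turns a batch of inputs into a list of predictions; it is not a fastai function. Note also that a loop over range(0, n-1) stops one item short, whereas stepping through range(0, n, batch_size) with slicing covers every item, including a smaller final batch.

```python
def predict_in_batches(items, predict_fn, batch_size=64):
    """Call predict_fn on consecutive slices of items and collect the results.

    predict_fn is a stand-in: it takes a list of inputs and returns a list
    with one prediction per input.
    """
    results = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]  # last batch may be shorter
        results.extend(predict_fn(batch))
    return results

# Toy usage: "predict" by doubling each number, 4 items at a time.
seen_batch_sizes = []

def toy_predict(batch):
    seen_batch_sizes.append(len(batch))
    return [x * 2 for x in batch]

outputs = predict_in_batches(list(range(10)), toy_predict, batch_size=4)
print(outputs)
print(seen_batch_sizes)
```

Peak memory then depends on batch_size and not on the size of the whole test set, which is the same effect as lowering the batch size as described in the edit above.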

nas-r · February 11, 2018, 2:50pm

f2 is a function located in the file planet.py.
You can find it in fastai/courses/dl1/planet.py

It is the metric used to assess performance in the competition.
see: https://www.kaggle.com/c/planet-understanding-the-amazon-from-space#evaluation
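As a rough illustration of the metric (a from-scratch sketch, not the code in planet.py): F-beta combines precision P and recall R as (1 + β²)·P·R / (β²·P + R), and β = 2 weights recall more heavily than precision. The per-image averaging in mean_f2 is an assumption based on the competition's evaluation description, and the label vectors are made up.

```python
def fbeta(y_true, y_pred, beta=2.0):
    """F-beta score for one sample given 0/1 label vectors of equal length."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0  # no correct positive labels: score is zero by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def mean_f2(true_rows, pred_rows):
    """Average the per-sample F2 score over all samples."""
    scores = [fbeta(t, p, beta=2.0) for t, p in zip(true_rows, pred_rows)]
    return sum(scores) / len(scores)

# One extra (false positive) label costs less under F2 than under F1.
y_true = [1, 1, 0]
y_pred = [1, 1, 1]
print(fbeta(y_true, y_pred, beta=2.0))
print(fbeta(y_true, y_pred, beta=1.0))
```

Because recall counts for more, predicting a few extra tags tends to hurt the score less than missing a true tag.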

If you’re confused about the difference between a loss function and a metric (I was) see:

sumo · February 17, 2018, 3:38am

Hi all,

Has anyone run the learning rate finder on the Kaggle Planet dataset, following the code in the lesson 2 notebook:

lrf=learn.lr_find()
learn.sched.plot()

It gives me the following error, and I was wondering if anyone can suggest what I might be doing wrong?

ValueError: Expected array-like (array or non-string sequence), got
1 0 0 … 0 0 1
0 0 0 … 0 0 0
1 0 0 … 0 0 0
… ⋱ …
1 0 0 … 0 0 1
0 0 0 … 0 0 0
0 0 0 … 0 0 1
[torch.cuda.FloatTensor of size 64x17 (GPU 0)]

Kaushik · February 23, 2018, 3:52pm

Hi everyone,

I am having a small problem predicting on the test data.

Firstly I downloaded the primary test jpeg dataset and the additional test jpeg dataset.
I moved the files from the additional test data folder to the primary test folder.

Now when I run the model and try to call TTA(is_test=True), it raises an exception telling me img-20xx is not found. I don’t understand what is happening.

Can someone please explain how we are supposed to prep the test data?