Planet Classification Challenge

That worked! But the drama keeps going, lol.

So your code worked, but I still hit the following error:

submission.insert(0, 'image_name', test_fnames)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-270-b3ff6658a8f4> in <module>()
----> 1 submission.insert(0, 'image_name', test_fnames)

~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/pandas/core/frame.py in insert(self, loc, column, value, allow_duplicates)
   2609         """
   2610         self._ensure_valid_index(value)
-> 2611         value = self._sanitize_column(column, value, broadcast=False)
   2612         self._data.insert(loc, column, value,
   2613                           allow_duplicates=allow_duplicates)

~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/pandas/core/frame.py in _sanitize_column(self, key, value, broadcast)
   2758 
   2759             # turn me into an ndarray
-> 2760             value = _sanitize_index(value, self.index, copy=False)
   2761             if not isinstance(value, (np.ndarray, Index)):
   2762                 if isinstance(value, list) and len(value) > 0:

~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/pandas/core/series.py in _sanitize_index(data, index, copy)
   3119 
   3120     if len(data) != len(index):
-> 3121         raise ValueError('Length of values does not match length of ' 'index')
   3122 
   3123     if isinstance(data, PeriodIndex):

ValueError: Length of values does not match length of index

I figured out that the DataFrame's index is 1000 entries long, but the list of values I am inserting is 5000 entries long.

[                                                   tags
 0                                   clear primary water
 1                                         clear primary
 2                                 partly_cloudy primary
 3                                                cloudy
 4                                         clear primary
 5                                         clear primary
 6                                         clear primary
 7                                   clear primary water
 8                                   clear primary water
 9     artisinal_mine bare_ground clear primary road ...
 10                                  clear primary water
 11                                   clear haze primary
 12                                        clear primary
 13                                          clear water
 14                          clear partly_cloudy primary
 15                                        clear primary
 16                                          cloudy haze
 17                                partly_cloudy primary
 18           agriculture clear cultivation primary road
 19                                        clear primary
 20                                        clear primary
 21                                        clear primary
 22                                   clear haze primary
 23                       clear cultivation haze primary
 24            agriculture clear habitation primary road
 25                                        clear primary
 26                                        clear primary
 27                                  clear primary water
 28                                               cloudy
 29                                          clear water
 ...                                                 ...
 4970                          agriculture clear primary
 4971                                      clear primary
 4972                                      clear primary
 4973                                        cloudy haze
 4974                                 clear cloudy water
 4975                              partly_cloudy primary
 4976                                clear primary water
 4977                     agriculture clear primary road
 4978                          agriculture clear primary
 4979          agriculture clear habitation primary road
 4980                                             cloudy
 4981                                      clear primary
 4982                                      clear primary
 4983                          agriculture clear primary
 4984                                      clear primary
 4985                                      clear primary
 4986          agriculture clear habitation primary road
 4987                                      clear primary
 4988                                clear primary water
 4989          agriculture clear habitation primary road
 4990                                      clear primary
 4991                                      clear primary
 4992                                      clear primary
 4993                    clear cultivation primary water
 4994          agriculture clear habitation primary road
 4995                                             cloudy
 4996                          agriculture clear primary
 4997                                      clear primary
 4998                                      clear primary
 4999                              partly_cloudy primary
 
 [5000 rows x 1 columns]
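For reference, the error itself is easy to reproduce in isolation: pandas refuses to insert a column whose length differs from the DataFrame's index. A minimal sketch, with toy data unrelated to the competition:

```python
import pandas as pd

# a DataFrame with 3 rows...
df = pd.DataFrame({"tags": ["clear primary", "cloudy", "clear water"]})

# ...and a values list with 5 entries: the lengths do not match
names = ["img_0", "img_1", "img_2", "img_3", "img_4"]

try:
    df.insert(0, "image_name", names)
except ValueError as e:
    # e.g. "Length of values does not match length of index"
    print(e)
```

So the fix is not about file limits; the list passed to `insert` has to be exactly as long as the DataFrame it goes into.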

I used this code to change the open-file limit (RLIMIT_NOFILE), but still no luck:

import resource
rlimit = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (1000, rlimit[1]))
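As an aside, the snippet above sets the soft limit to exactly 1000, which can actually lower it on systems whose default is already higher. To raise it instead, a sketch (the 4096 target is an arbitrary choice, capped at the hard limit):

```python
import resource

# current soft/hard limits on open file descriptors
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# aim for 4096, but never above the hard limit, and never below the current soft limit
desired = 4096 if hard == resource.RLIM_INFINITY else min(4096, hard)
new_soft = max(soft, desired)

resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
print(resource.getrlimit(resource.RLIMIT_NOFILE))
```

(An unprivileged process may raise its soft limit up to the hard limit, but not beyond it.)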

I am also hitting this issue (the 0.48 result, and a reasonable result if I treat the TTA() return value as a probability). Have you resolved this problem?

No. Jeremy told me to look at the code, but I could not figure out what it does. I still think that TTA yields probabilities: if you exponentiate its output and look at the results, the values are not between 0 and 1.

That’s fixed

Sorry, I am a little confused. Did you mean that TTA has been fixed to return probabilities instead of log probabilities, particularly for multi-label classification?

Read till end…

Kaggle Planet Competition: How to land in top 4%

I was able to land in the top 4% in this competition, and I have written a blog post about it.

It details all of the steps that can help you land in the top 4% of the Kaggle Planet Competition. Sharing it here, as it will be helpful to others.

Let me know what you think about it.


Hi Layla,

I think you’re right. I managed to submit my results to Kaggle with TTA preds without applying np.exp() to my results. I got a score of 0.91777 from Kaggle which was slightly better than the 0.9165 score I got from running f2 on the validation data.

My kernel also died when I tried to run TTA(is_test=True) with all of the test data, but I circumvented this performance issue by dividing the test data into three folders with around 20,000 test files each. BTW, I'm using my own machine with an NVIDIA GTX 1060.

Thanks for your comment. I (and some other people) have another problem: we get a 0.84 score on the test data. I wonder if there is an issue with my test data! It seems to be working fine with yours.

For multi-label classification there is no need to apply np.exp() to the probabilities obtained from learn.TTA(); this is true in any situation where the output activations are sigmoids.
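To illustrate the distinction (a generic numpy sketch, not the fastai internals): a log-softmax head returns log-probabilities, which need np.exp() to become probabilities, while a sigmoid head already returns values in [0, 1], one per class:

```python
import numpy as np

logits = np.array([1.2, -0.3, 0.5, 2.0])

# single-label head: log-softmax outputs are log-probabilities (all <= 0),
# so np.exp() is needed to recover probabilities that sum to 1
log_probs = logits - np.log(np.sum(np.exp(logits)))
probs = np.exp(log_probs)
assert np.all(log_probs <= 0) and np.isclose(probs.sum(), 1.0)

# multi-label head: sigmoids are already probabilities, one per class;
# exponentiating them would push every value above 1
sigmoid = 1 / (1 + np.exp(-logits))
assert np.all((sigmoid >= 0) & (sigmoid <= 1))
assert np.all(np.exp(sigmoid) > 1)  # exp of a probability in (0, 1) exceeds 1
```

So if the values coming out of TTA are already in [0, 1] and do not sum to 1 across classes, they are sigmoid probabilities and should be thresholded directly.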


Strange, that happened to me too. I got a score of 0.84 from Kaggle the first time I tried, but then I repeated everything, trying to isolate a possible mistake I might have made, and got a score of 0.91777 from Kaggle.


Interesting, but I get the same 0.84 score no matter how many times I repeat. When you say you tried to isolate a possible mistake, what do you mean? Did you look at individual wrong predictions?

I thought I might have messed up something when I repeated the same steps three times using three test folders to produce three result files, which I then combined manually (to avoid my kernel dying). The second time I tried, I also changed one thing: I did not apply data = data.resize(int(sz*1.3), 'tmp'). I left this step out mainly because I wanted to simplify things a little bit (no need to copy the weights file from "data/planet/tmp/83/models" to "data/planet/models") and also because I didn't really understand this step. How could it be beneficial to first resize the files to size 64x64 and then immediately resize them to 83x83?

I don’t know if this is what made the difference. This is just what I did and what my thoughts were when trying this.

I don’t think that made the difference. Jeremy said in the lecture that the resize step is just for speeding up the run time. I did not do that step and still get 0.85. I suspect it is due to my test files. Thanks anyway.

Hey Amrit, did you end up resolving the “ValueError: Length of values does not match length of index” error? Debugging the same one at the moment

Hey @fero, it got fixed after the update mentioned in this thread, Kaggle Comp: Plant Seedlings Classification, and then it worked fine. I am assuming you are using the latest fastai libraries?

Yes, I'm using the latest libraries and notebook, but I still get the "list index out of range" error when using the following code to export the predictions, and "Length of values does not match length of index" after removing the for loop (lines 4 and 5):

tta = learn.TTA(is_test=True)
test_fnames = data.test_ds.fnames

for i in range(len(test_fnames)):
    test_fnames[i] = test_fnames[i].split("/")[1].split(".")[0]

classes = np.array(data.classes, dtype=str)
res = [[" ".join(classes[np.where(p > 0.5)])] for pp in tta[0] for p in pp]
submission = pd.DataFrame(data=res)

submission.columns = ["tags"]
submission.insert(0, 'image_name', test_fnames)
submission.to_csv(PATH+"Planet_Sub.csv", index=False)
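One likely cause of the 5000-vs-1000 mismatch: if learn.TTA(is_test=True) returns one prediction array per augmentation, the nested comprehension `for pp in tta[0] for p in pp` flattens, say, 5 augmentations x 1000 images into 5000 rows. Averaging over augmentations before thresholding gives one row per image. A numpy-only sketch (the shapes, class names, and 0.5 threshold are stand-ins for illustration):

```python
import numpy as np

n_augs, n_images, n_classes = 5, 1000, 3  # shapes for illustration only
rng = np.random.default_rng(0)
tta_preds = rng.random((n_augs, n_images, n_classes))  # stand-in for np.array(tta[0])
classes = np.array(["clear", "primary", "water"])

# average the per-augmentation predictions first, THEN threshold:
# one tag string per test image instead of n_augs rows per image
mean_preds = np.mean(tta_preds, axis=0)          # shape (n_images, n_classes)
res = [" ".join(classes[p > 0.5]) for p in mean_preds]

assert len(res) == n_images  # now matches len(test_fnames), so insert() succeeds
```

This is a sketch under the assumption that tta[0] stacks per-augmentation predictions; check the actual shape of tta[0] in your version of the library first.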

I am following the lesson2 notebook for this, as shown in the lecture.
I want to understand the meaning of

from planet import f2

metrics=[f2]

from cell 5.
Is planet a package, since we're importing f2 from it?
What is f2?
I have looked at the competition files, but f2 isn't there.

Hi all,

Like @layla.tadjpour, I am running out of memory when running learn.predict(is_test=True) on the full test set.
Is there a way of splitting the test data and running learn.predict on batches of images without splitting them into separate folders?

I’ve been trying to read through the library and figure it out, but my python is not there yet.

The best I get is:

predictions = []
for i in range(len(data.test_dl.dataset)):  # note: range(0, n-1) would skip the last image
    p = learn.predict_array(data.test_ds[i])
    predictions.append(p)

Which yields a NotImplementedError from T(a) in core.py

Alternatively, if I were to give up and go down the route of splitting it into separate folders, would I create another instance of ImageClassifierData, use ImageData, or use the function set_data? (Or something else?)

EDIT:
I worked around the issue by reducing my batch size to 64. It would be nice to be able to set the batch size for training vs. prediction independently. If anyone knows of a way to do this, I'm all ears!
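A generic way to do the chunking in memory, without touching folders: run the prediction call over fixed-size slices and stack the results. A sketch, where `predict_fn` is a hypothetical stand-in for whatever per-batch prediction call your library exposes:

```python
import numpy as np

def predict_in_chunks(predict_fn, items, chunk_size=64):
    """Run predict_fn over items in fixed-size chunks and stack the results,
    keeping peak memory proportional to chunk_size rather than len(items)."""
    outputs = []
    for start in range(0, len(items), chunk_size):
        batch = items[start:start + chunk_size]
        outputs.append(predict_fn(batch))
    return np.concatenate(outputs, axis=0)

# usage with a dummy "model" that just doubles its input
data = np.arange(10, dtype=float).reshape(10, 1)
preds = predict_in_chunks(lambda b: b * 2, data, chunk_size=4)
assert preds.shape == (10, 1) and preds[-1, 0] == 18.0
```

The design point is simply that only one chunk's worth of inputs and activations is alive at a time, which is the same effect as shrinking the batch size.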

f2 is a function located in the file planet.py.
You can find it in fastai/courses/dl1/planet.py

It is the metric used to assess performance in the competition.
see: https://www.kaggle.com/c/planet-understanding-the-amazon-from-space#evaluation
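Concretely, F2 is the F-beta score with beta = 2, which weights recall twice as heavily as precision. A minimal numpy sketch of the formula for one pair of binary label vectors (not the fastai implementation, which also searches over thresholds):

```python
import numpy as np

def f_beta(y_true, y_pred, beta=2.0):
    """F-beta score for binary label vectors; beta > 1 favours recall."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    precision = tp / max(np.sum(y_pred), 1)
    recall = tp / max(np.sum(y_true), 1)
    if precision + recall == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

# a perfect prediction scores 1.0; with beta=2, missing a true label
# hurts the score more than predicting an extra one
assert f_beta([1, 1, 0, 1], [1, 1, 0, 1]) == 1.0
assert f_beta([1, 1, 0, 1], [1, 1, 0, 0]) < f_beta([1, 1, 0, 1], [1, 1, 1, 1])
```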

If you’re confused about the difference between a loss function and a metric (I was), see:
