I am also hitting this issue (the 0.48 result, and a reasonable result when treating the TTA() return value as probabilities). Have you resolved this problem?
No, Jeremy told me to look at the code and I could not figure out what it does. I still think that TTA yields probability. If you exponentiate its output and look at the results, the values are not between 0 and 1.
Sorry, I am a little confused. Did you mean that the TTA has been fixed to return probs instead of log_of_prob particularly in multi-label classification?
Read till end…
Kaggle Planet Competition: How to land in top 4%
I was able to land in the top 4% in this competition, and I have written a blog post about it.
It details all of the steps that can help you land in the top 4% of the Kaggle Planet Competition. Sharing it here, as it may be helpful to others.
Let me know what you think about it.
I think you’re right. I managed to submit my results to Kaggle with TTA preds without applying np.exp() to my results. I got a score of 0.91777 from Kaggle which was slightly better than the 0.9165 score I got from running f2 on the validation data.
My Kernel also died when I tried to run TTA(is_test=True) with all of the test data, but I circumvented this performance issue by dividing the test data into three folders, each holding around 20,000 test files. BTW, I’m using my own machine with an Nvidia GTX 1060.
Thanks for your comment. I (and some other people) have another problem: we get a score of 0.84 on the test data. I wonder if there is an issue with my test data! It seems to be working fine with yours.
For multi-label classification, there is no need to apply np.exp() to the probabilities obtained from learn.TTA(). This is true in any situation where the output activations are sigmoids.
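One quick way to tell which kind of output you have is to look at the value range: sigmoid probabilities lie in [0, 1], while log-probabilities are never positive. The sketch below uses made-up arrays rather than real learn.TTA() output, and the helper name looks_like_probabilities is hypothetical.

```python
import numpy as np

def looks_like_probabilities(preds):
    """Return True if every value lies in [0, 1] (hypothetical helper)."""
    preds = np.asarray(preds, dtype=float)
    return bool(np.all((preds >= 0.0) & (preds <= 1.0)))

# Made-up sigmoid outputs for 2 images and 3 labels: already probabilities.
sigmoid_out = np.array([[0.91, 0.03, 0.55],
                        [0.10, 0.80, 0.20]])

# Made-up log-probabilities: every value is <= 0, so np.exp() is needed.
log_probs = np.log(np.array([[0.7, 0.2, 0.1],
                             [0.1, 0.1, 0.8]]))

is_prob_sigmoid = looks_like_probabilities(sigmoid_out)
is_prob_log = looks_like_probabilities(log_probs)

# Exponentiating log-probabilities recovers values that sum to 1 per row.
recovered = np.exp(log_probs)

# Exponentiating values that are already probabilities pushes them to 1 or above.
exp_of_probs = np.exp(sigmoid_out)
```

If exponentiating your predictions gives values of 1 or more, as reported earlier in this thread, that is consistent with the predictions already being probabilities.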
Strange, that happened to me also. I got a score of 0.84 from Kaggle the first time I tried, but then I just repeated everything trying to isolate a possible mistake I might have made, and got a score of 0.91777 from Kaggle.
Interesting, but I get the same 0.84 score no matter how many times I repeat. When you say you tried to isolate a possible mistake, what do you mean? Did you look at individual wrong predictions?
I thought I might have messed up something when I repeated the same steps three times using three test folders to produce three result files which I then combined manually (to avoid my Kernel dying). The second time I tried I also ended up changing one thing: I did not apply data = data.resize(int(sz*1.3), 'tmp'). I left this step out mainly because I wanted to simplify things a little bit (no need to copy the weights file from “data/planet/tmp/83/models” to “data/planet/models”) and also because I didn’t really understand this step. How could it be beneficial to first resize the files to size 64x64 and then immediately resize them to 83x83?
I don’t know if this is what made the difference. This is just what I did and what my thoughts were when trying this.
I don’t think that made the difference. Jeremy said in the lecture that the resize step is for speeding up the run time. I did not do that step and still get 0.85. I suspect it is due to my test files. Thanks anyway.
Hey Amrit, did you end up resolving the “ValueError: Length of values does not match length of index” error? Debugging the same one at the moment
Yes, using the latest libraries and notebook but still getting the “list index out of range” error when using the following code to export the predictions, and “Length of values does not match length of index” after removing the ‘for’ loop (lines 4 and 5):
tta = learn.TTA(is_test=True)
test_fnames = data.test_ds.fnames
for i in range(len(test_fnames)):
    test_fnames[i] = test_fnames[i].split("/").split(".")
classes = np.array(data.classes, dtype=str)
res = [[" ".join(classes[np.where(p > 0.5)])] for pp in tta for p in pp]
submission = pd.DataFrame(data=res)
submission.columns = ["tags"]
submission.insert(0, 'image_name', test_fnames)
submission.to_csv(PATH+"Planet_Sub.csv", index=False)
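Both errors are easier to track down if the export step checks shapes up front. The following is a minimal sketch under stated assumptions, not the notebook's code: it assumes the predictions have already been reduced to a single 2-D array with one row per test file, and build_rows, fnames, classes and probs are made-up stand-ins for the real objects.

```python
import os
import numpy as np

def build_rows(probs, fnames, classes, threshold=0.5):
    """Pair each image name with its space-separated predicted tags."""
    probs = np.asarray(probs)
    classes = np.asarray(classes, dtype=str)
    expected = (len(fnames), len(classes))
    if probs.shape != expected:
        # Fail early with a readable message instead of a pandas length error.
        raise ValueError("expected probs of shape %s, got %s" % (expected, probs.shape))
    rows = []
    for fname, p in zip(fnames, probs):
        # basename + splitext avoids indexing into split() results,
        # which breaks when a path has fewer parts than expected.
        image_name = os.path.splitext(os.path.basename(fname))[0]
        tags = " ".join(classes[p > threshold])
        rows.append((image_name, tags))
    return rows

# Made-up stand-ins for the test file names, class names and predictions.
fnames = ["test-jpg/test_0.jpg", "test-jpg/test_1.jpg"]
classes = ["clear", "haze", "primary"]
probs = np.array([[0.9, 0.1, 0.8],
                  [0.2, 0.7, 0.6]])

rows = build_rows(probs, fnames, classes)

# If predictions carry an extra leading axis (assumed here: one slice per
# augmentation), average over it first so there is one row per file.
stacked = np.stack([probs, probs])
rows_from_stacked = build_rows(stacked.mean(axis=0), fnames, classes)
```

The resulting rows can be passed to pd.DataFrame(rows, columns=["image_name", "tags"]) and written with to_csv(..., index=False). Whether an extra axis is actually the cause of the length mismatch depends on what learn.TTA() returns in your version of the library, so it is worth printing the shape of its output first.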
I am following the lesson2 notebook for this as shown in lecture.
I want to understand the meaning of
from planet import f2
from cell 5.
Is planet a package, as we’re importing f2?
What is f2?
I have seen the files on the contest but f2 isn’t there.
Like @layla.tadjpour, I am running out of memory when running learn.predict(is_test=True) on the full test set.
Is there a way of splitting the test data and running learn.predict on batches of images without splitting them into separate folders?
I’ve been trying to read through the library and figure it out, but my Python is not there yet.
The best I get is:
predictions = []
for i in range(0,len(data.test_dl.dataset)-1):
    p = learn.predict_array(data.test_ds[i])
    predictions.append(p)
Which yields a NotImplementedError from T(a) in core.py
Alternatively, if I were to give up and go down the route of splitting it into separate folders, would I be creating another instance of ImageClassifierData, using ImageData, or the function set_data? (Or something else?)
I worked around the issue by reducing my batch size to 64. It would be nice to be able to set the batch size for training v. prediction independently. If anyone knows of a way to do this, I’m all ears!
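The general idea of predicting in pieces, without separate folders, can be sketched independently of the library. This is only an illustration of the pattern: predict_in_chunks and fake_predict are hypothetical names, and predict_fn stands in for whatever call produces predictions for a list of inputs in your setup. It is not a fastai API.

```python
import numpy as np

def predict_in_chunks(items, predict_fn, chunk_size):
    """Call predict_fn on consecutive slices of items and stack the results."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    outputs = []
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]
        # Only one chunk's worth of inputs is handed to the model at a time.
        outputs.append(np.asarray(predict_fn(chunk)))
    return np.concatenate(outputs, axis=0)

# Placeholder model: returns one row of 3 fake scores per input item.
def fake_predict(chunk):
    return np.array([[float(len(name)), 0.0, 1.0] for name in chunk])

names = ["a.jpg", "bb.jpg", "ccc.jpg", "dddd.jpg", "eeeee.jpg"]
preds = predict_in_chunks(names, fake_predict, chunk_size=2)
```

Because the chunks are processed in order and concatenated along the first axis, row i of the result still corresponds to item i of the input list, which keeps the predictions aligned with the file names.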
f2 is a function located in the file planet.py.
You can find it in fastai/courses/dl1/planet.py
It is the metric used to assess performance in the competition.
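F2 is the F-beta score with beta = 2, which weights recall more heavily than precision. The sketch below is not the code from planet.py; it is a minimal NumPy version that assumes the score is computed per image and then averaged, and the function name f2_samples and the toy labels are made up for illustration.

```python
import numpy as np

def f2_samples(y_true, y_pred):
    """Mean per-image F2 for binary label matrices of shape (n_images, n_labels)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = (y_true & y_pred).sum(axis=1)    # labels correctly predicted
    fp = (~y_true & y_pred).sum(axis=1)   # labels predicted but not present
    fn = (y_true & ~y_pred).sum(axis=1)   # labels present but missed
    # F_beta = (1 + b^2) * tp / ((1 + b^2) * tp + b^2 * fn + fp), with b = 2.
    denom = 5 * tp + 4 * fn + fp
    # Images with no true and no predicted labels score 0 here (a choice made
    # for this sketch).
    scores = np.where(denom > 0, 5 * tp / np.maximum(denom, 1), 0.0)
    return float(scores.mean())

# Toy example: 2 images, 3 labels.
y_true = [[1, 0, 1],
          [0, 1, 0]]
y_pred = [[1, 0, 0],
          [0, 1, 1]]

score = f2_samples(y_true, y_pred)
perfect = f2_samples(y_true, y_true)
```

In the toy example the first image misses one label (F2 = 5/9) and the second predicts one extra label (F2 = 5/6), so the missed label costs more than the spurious one.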
If you’re confused about the difference between a loss function and a metric (I was) see:
Has anyone run the learning rate finder on the Kaggle Planet dataset following the code in the lesson 2 notebook?
It gives me the following error, and I was wondering if anyone can suggest what I might be doing wrong.
ValueError: Expected array-like (array or non-string sequence), got
1 0 0 … 0 0 1
0 0 0 … 0 0 0
1 0 0 … 0 0 0
… ⋱ …
1 0 0 … 0 0 1
0 0 0 … 0 0 0
0 0 0 … 0 0 1
[torch.cuda.FloatTensor of size 64x17 (GPU 0)]