Kaggle Comp: Plant Seedlings Classification

NathanYee · January 11, 2018, 12:11am

Had a great time with this competition. I ended up just beating Jeremy after thinking a little more about how to fine-tune the network before unfreezing it and enabling differential learning rates. My score ultimately tied for 22nd on the public leaderboard!
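For anyone new to the idea, here is a minimal sketch of how differential learning rates are usually laid out: one rate per layer group, with earlier (more generic) groups getting smaller rates than the newly added head. The helper `differential_lrs` and its `factor=3` default are hypothetical illustrations, not fastai code; the commented fastai 0.7-style calls show the common "train the head, unfreeze, then fit with an array of rates" sequence from the course, not Nathan's exact notebook.

```python
# Hypothetical helper: one learning rate per layer group, smallest first,
# so earlier (more generic) layers are updated more gently than the head.
def differential_lrs(base_lr, n_groups=3, factor=3.0):
    """Return [base_lr / factor**(n_groups-1), ..., base_lr / factor, base_lr]."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    return [base_lr / factor ** (n_groups - 1 - i) for i in range(n_groups)]

# Typical fastai 0.7-style sequence (comments only, not executed here):
#   learn.fit(lr, 3)     # pretrained body is frozen: only the head trains
#   learn.unfreeze()     # make every layer group trainable
#   learn.fit(np.array(differential_lrs(lr)), 3, cycle_len=1, cycle_mult=2)

print(differential_lrs(0.01))
```

With the defaults this gives `[lr/9, lr/3, lr]`, the spacing often suggested in the course; the ratio is a judgment call rather than a fixed rule.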

Good luck to everyone!

ecdrid · January 11, 2018, 2:38am

Congratulations

Renga · January 20, 2018, 7:30am

I created a small Python script that creates validation directories and moves 20% of the files into the validation set. Feel free to modify the PATH variable and the percentage as you like.
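A minimal sketch of what such a script can look like, assuming the Kaggle layout `train/<species>/<image>` and a `valid/` folder next to `train/`. This is not Renga's notebook (linked below); the function name, the fixed seed, and the per-class sampling are assumptions made for illustration.

```python
import random
import shutil
from pathlib import Path

def make_validation_split(path, val_pct=0.2, seed=42):
    """Move about val_pct of each class's images from train/ to valid/.

    Assumes path/train/<class_name>/<files>; creates path/valid/<class_name>/.
    Returns a dict {class_name: number_of_files_moved}.
    """
    path = Path(path)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    moved = {}
    for class_dir in sorted(p for p in (path / "train").iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        n_val = int(len(files) * val_pct)  # per class, so proportions are kept
        val_dir = path / "valid" / class_dir.name
        val_dir.mkdir(parents=True, exist_ok=True)
        for f in rng.sample(files, n_val):
            shutil.move(str(f), str(val_dir / f.name))
        moved[class_dir.name] = n_val
    return moved

# Usage (hypothetical location, adjust to your setup):
#   make_validation_split("data/plant-seedlings/", val_pct=0.2)
```

Sampling per class keeps the class balance of the validation set close to that of the training set. Note that the move is destructive, so run it once on a fresh copy of the data.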

ecdrid · January 20, 2018, 7:48am

Could you attach the source code here?

divyansh · January 20, 2018, 8:31am

Did you find a solution for adding it to learn.fit()?

Renga · January 20, 2018, 9:04am

https://github.com/Renga411/dl1.fastai/blob/master/Validation-set-creator.ipynb

ianianian · February 4, 2018, 5:32pm

Hi @shubham24 ,

Apologies if this question is slightly off topic, but for:

glob("train/**/*.png")

in the script you wrote, can I use a token like {PATH} with glob, in case I don't want to run the script in the same directory?

I tried:

for image in glob("{PATH}train/**/*.png"):

and

for image in glob("f'{PATH}train/**/*.png"):

but neither worked.

In other words, how can I change your script so that I can run it from any directory? That way I can keep it in a single directory and reuse it for different Kaggle datasets.

Ian

shubham24 · February 5, 2018, 1:18pm

@ianianian,

I think you’re using the wrong syntax.
Try:
for image in glob(f'{PATH}train/**/*.png'):
You may want to read this post: https://cito.github.io/blog/f-strings/
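To make the difference concrete, here is a small sketch (the function name `find_train_images` is hypothetical). The `f` prefix must sit outside the quotes, directly before the opening quote; `"{PATH}..."` without it is just a literal string, and `"f'{PATH}..."` puts the `f` inside the pattern. It also assumes PATH ends with a slash, as in the fastai notebooks.

```python
from glob import glob

def find_train_images(path):
    # f-string: {path} is replaced by the variable's value. The f goes
    # directly before the opening quote, not inside the string.
    pattern = f"{path}train/**/*.png"
    # recursive=True lets ** match any number of nested directories; without
    # it, ** acts like a single *, which still matches train/<class>/x.png.
    return sorted(glob(pattern, recursive=True))

# Usage: PATH can point anywhere, so the script no longer has to live
# next to the data, e.g. find_train_images("data/plant-seedlings/")
```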

ianianian · February 5, 2018, 4:28pm

@shubham24

Thank you for the fix and for pointing me in the right direction to learn more.

Appreciate it!

raja4net · February 6, 2018, 12:41pm

Hi! I get an error whenever I run the learning rate finder (lr_find).

learn = ConvLearner.pretrained(arch, data)
lrf=learn.lr_find()
learn.sched.plot()
Please help me resolve this error. The error log is below:

TypeError Traceback (most recent call last)
in ()
----> 1 lrf=learn.lr_find()
2 learn.sched.plot()

~/fastai/fastai/learner.py in lr_find(self, start_lr, end_lr, wds, linear)
250 layer_opt = self.get_layer_opt(start_lr, wds)
251 self.sched = LR_Finder(layer_opt, len(self.data.trn_dl), end_lr, linear=linear)
--> 252 self.fit_gen(self.model, self.data, layer_opt, 1)
253 self.load('tmp')
254

~/fastai/fastai/learner.py in fit_gen(self, model, data, layer_opt, n_cycle, cycle_len, cycle_mult, cycle_save_name, use_clr, metrics, callbacks, use_wd_sched, norm_wds, wds_sched_mult, **kwargs)
154 n_epoch = sum_geom(cycle_len if cycle_len else 1, cycle_mult, n_cycle)
155 return fit(model, data, n_epoch, layer_opt.opt, self.crit,
--> 156 metrics=metrics, callbacks=callbacks, reg_fn=self.reg_fn, clip=self.clip, **kwargs)
157
158 def get_layer_groups(self): return self.models.get_layer_groups()

~/fastai/fastai/model.py in fit(model, data, epochs, opt, crit, metrics, callbacks, **kwargs)
104 i += 1
105
--> 106 vals = validate(stepper, data.val_dl, metrics)
107 if epoch == 0: print(layout.format(*names))
108 print_stats(epoch, [debias_loss] + vals)

~/fastai/fastai/model.py in validate(stepper, dl, metrics)
126 preds,l = stepper.evaluate(VV(x), VV(y))
127 loss.append(to_np(l))
--> 128 res.append([f(preds.data,y) for f in metrics])
129 return [np.mean(loss)] + list(np.mean(np.stack(res),0))
130

~/fastai/fastai/model.py in (.0)
126 preds,l = stepper.evaluate(VV(x), VV(y))
127 loss.append(to_np(l))
--> 128 res.append([f(preds.data,y) for f in metrics])
129 return [np.mean(loss)] + list(np.mean(np.stack(res),0))
130

~/fastai/fastai/metrics.py in (preds, targs)
11
12 def accuracy_thresh(thresh):
--> 13 return lambda preds,targs: accuracy_multi(preds, targs, thresh)
14
15 def accuracy_multi(preds, targs, thresh):

~/fastai/fastai/metrics.py in accuracy_multi(preds, targs, thresh)
14
15 def accuracy_multi(preds, targs, thresh):
--> 16 return ((preds>thresh)==targs).float().mean()
17

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/tensor.py in __eq__(self, other)
346
347 def __eq__(self, other):
--> 348 return self.eq(other)
349
350 def ne(self, other):

TypeError: eq received an invalid combination of arguments - got (torch.cuda.FloatTensor), but expected one of:

(int value)
didn’t match because some of the arguments have invalid types: (torch.cuda.FloatTensor)
(torch.cuda.ByteTensor other)
didn’t match because some of the arguments have invalid types: (torch.cuda.FloatTensor)

ecdrid · February 6, 2018, 1:02pm

It seems (though I'm not at all sure) that the error comes from the library: this error arises when the Variables we pass into the function aren't of the same type.

Also which OS are you running?

raja4net · February 6, 2018, 1:12pm

I am using the fastai machine on paperspace.com, but I am running it from a Windows 7 PC in Cygwin. Will this make a difference?
EDIT: I have run notebook on Macbook too and the result is same. Any thoughts?

rsevs3 · February 9, 2018, 6:53am

I was having this problem as well.

It turned out the issue was that I wasn't replacing the spaces (' ') in the species names.

See here for the solution:

raja4net · February 9, 2018, 7:30am

Edit: After removing the spaces from the species names, it worked like a charm. Thanks!
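A sketch of that fix, assuming you build a labels CSV from the `train/<species>/` folders for use with `from_csv`. fastai 0.7 appears to split the label column on spaces, so a name like "Common Chickweed" becomes two labels, the data is treated as multi-label, and the multi-label `accuracy_thresh` metric in the traceback above gets used. The function `write_labels_csv`, the column names, and the `_` replacement are illustrative assumptions.

```python
import csv
from pathlib import Path

def write_labels_csv(train_dir, csv_path, sep="_"):
    """Write a file,species CSV from train/<species>/<image> folders.

    Spaces in species names are replaced by `sep`, so "Common Chickweed"
    stays a single label. Returns the sorted list of labels written.
    """
    train_dir = Path(train_dir)
    labels = set()
    with open(csv_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file", "species"])
        for img in sorted(train_dir.glob("*/*")):
            if not img.is_file():
                continue
            label = img.parent.name.replace(" ", sep)
            labels.add(label)
            # file column is relative to train_dir, e.g. "Charlock/abc.png"
            writer.writerow([img.relative_to(train_dir).as_posix(), label])
    return sorted(labels)
```

Only the label column is changed; the folder names on disk keep their spaces, so the file paths in the CSV still resolve.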

alessa · February 22, 2018, 3:14pm

Same question here; I was wondering the same thing.

Maybe it's easier to work with the validation set that way: with from_csv you just need to give the indexes of the validation images, while with from_paths you need to move files from the training folders into the validation folders. And if you want to do cross-validation, moving files between folders can be a tough job.
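To illustrate why index-based splits make cross-validation easy, here is a minimal k-fold sketch. `kfold_val_idxs` is a hypothetical helper, not a fastai function. The idea is that each fold's list could be passed as the validation indexes when building the data with `from_csv` (fastai 0.7's `from_csv` takes a `val_idxs` argument, possibly as a NumPy array), with one model trained per fold and no files moved.

```python
import random

def kfold_val_idxs(n, k=5, seed=42):
    """Split range(n) into k disjoint validation index lists, one per fold.

    Every index lands in exactly one fold, so k models (each validated on a
    different fold) together cover the whole training set.
    """
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    idxs = list(range(n))
    random.Random(seed).shuffle(idxs)  # fixed seed: same folds every run
    # fold i takes every k-th shuffled index, starting at position i
    return [sorted(idxs[i::k]) for i in range(k)]

# Usage sketch: for fold, val_idxs in enumerate(kfold_val_idxs(n_images)):
#     build the data with these val_idxs, train, keep the OOF predictions
```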

gerardo · March 13, 2018, 5:46pm

@sermakarevich Now that the competition has finished, can you share some highlights of your solution, or maybe the notebook?

sermakarevich · March 13, 2018, 6:06pm

I really don't have much to share, as I used only one notebook and it looks like a piece of … cake. All I changed were the image size and the model. If you would like to look at my messy notes, please send me your email.

alessa · March 13, 2018, 6:42pm

Could you explain what's in the first three graphs? Is it the loss with respect to width and height?
How did you gather the information to plot the graphs? Did you train the model with different sizes, save the results, and plot them at a later step?
For the second-row graph, what exactly do "less than" and "more than" mean?
Thanks!

sermakarevich · March 13, 2018, 6:46pm

Sorry @alessa, which graphs are you referring to? Update: found them.

The first chart is just a plt.hist of image widths, the second one of image heights, and the third a plt.scatter of width vs. height. To build the fourth, I used out-of-fold (OOF) predictions on the training set and calculated the score using only the images that satisfy a criterion: smaller than some size, or larger than some size.
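A minimal sketch of that fourth chart's computation, assuming you already have one size value per training image (for example the width from PIL's `Image.open(fn).size`) and the true and OOF-predicted class per image. `score_by_size` is a hypothetical helper, and it uses plain accuracy as the "score"; swap in whichever metric you are tracking.

```python
def score_by_size(sizes, y_true, y_pred, thresholds):
    """Score OOF predictions on images below / above each size threshold.

    Returns a list of (threshold, acc_below, acc_above); an accuracy is None
    when no image falls on that side of the threshold.
    """
    def acc(mask):
        hits = [t == p for t, p, m in zip(y_true, y_pred, mask) if m]
        return sum(hits) / len(hits) if hits else None

    results = []
    for th in thresholds:
        below = [s < th for s in sizes]  # "< than some size"
        above = [s > th for s in sizes]  # "> than some size"
        results.append((th, acc(below), acc(above)))
    return results
```

Plotting `acc_below` and `acc_above` against the thresholds then shows whether small (or large) images are where the model loses points, which can inform the choice of training image size.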

alessa · March 13, 2018, 6:47pm

these ones