How is the Paddy Doctor Kaggle Competition Going?

I wanted to compile videos, notebooks and information about the Paddy Doctor Community Competition.

Please add comments or anything you think is missing.
It would also be great if you shared your experience and the things you have tried.

How it started

The Kaggle community competition Paddy Doctor: Paddy Disease Classification was introduced to many of us through this forum post: Practice walk-thru 6 / chp1 on Kaggle!. There, @radek put together a repository, instructions on how to submit to Kaggle from Paperspace, and a list of questions and things to try on that dataset.

In the same post Jeremy replied: “Might be fun to look at this tomorrow…”

How it became part of the fastai way of learning

The next day, in the Live coding 7 session, Jeremy asked: “ok, Radek, you want to tell us about what this thing is?”. “This is the fastai way of learning”, Radek replied before starting to explain his post.

From that point on, the competition became an increasingly important part of the course and live coding sessions, and I think many of us learned the fastai way by trying the competition with our own hands.

The road to the top

In the live coding sessions Jeremy showed us how to explore and solve many things, from setting up and automating the creation of a work environment to techniques such as progressive resizing. Without always being sure of what I was doing, we were looking inside a model, digging through the code base, tutorials, and documentation, creating custom loss functions… Jeremy also shared fastkaggle with us for automating Kaggle tasks.

Most of what was covered was translated into these 5 notebooks:

Kaggle Notebooks shared by Jeremy during the Lessons and Live Code Sessions

  1. The best vision models for fine-tuning
  2. First Steps: Road to the Top, Part 1
  3. Small models: Road to the Top, Part 2
  4. Scaling Up: Road to the Top, Part 3
  5. Multi-target: Road to the Top, Part 4

The Part 3 notebook got Jeremy to the top of the leaderboard, but not for long.

Students reaching the top of the leaderboard

At some point, two students passed Jeremy on the leaderboard.
@kurian posted the topic Tips on how to improve your accuracy on the Kaggle discussions.

A few days later, another person who reached the top posted Thanks Fast AI!

How is the Paddy Doctor Kaggle Competition Going?

This community competition ends in a month.

Competitions are a great way to learn data science, but I suppose I’m not alone in finding that in your first few competitions you are mostly learning how to compete: building the base of code and understanding that lets you experiment a little more in your i-th competition. Whichever iteration you are on, you can start now with Jeremy’s Kaggle notebooks. They are a synthesis of the video lessons and live coding sessions.

There are lots of students on the leaderboard.

I hope there are going to be wonderful notebooks and discussions about techniques and approaches at the end of the competition.


Lessons with mentions about Paddy Competition
Live coding sessions with mentions about Paddy Competition
Intermediate/Advanced things to try (or to get familiar with)

I haven’t tried many of the techniques and tricks referenced here (yet), but trying to understand how some of them work makes you think about the internals of the model, the functions involved, and the fitting process. I personally ended up tweaking the Excel/Sheets files again, simulating “target encoding” and “label smoothing”.
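For anyone curious, here is what those two spreadsheet experiments compute, as a minimal pure-Python sketch (the function names and numbers are mine, just for illustration):

```python
def label_smooth(one_hot, eps=0.1):
    """Label smoothing: move eps of the probability mass away from the
    true class and spread it uniformly across all classes."""
    k = len(one_hot)
    return [(1 - eps) * p + eps / k for p in one_hot]

def target_encode(categories, targets):
    """Target encoding: replace each category value with the mean of
    the target observed for that category."""
    sums, counts = {}, {}
    for c, t in zip(categories, targets):
        sums[c] = sums.get(c, 0) + t
        counts[c] = counts.get(c, 0) + 1
    means = {c: sums[c] / counts[c] for c in sums}
    return [means[c] for c in categories]

# A 4-class one-hot target, smoothed with eps=0.1
smoothed = label_smooth([0, 0, 1, 0], eps=0.1)

# Encode a toy categorical column against a binary target
encoded = target_encode(["a", "a", "b"], [1, 0, 1])
```

The smoothed vector still sums to 1, and the encoded column is just the per-category target mean, which is exactly what the spreadsheet versions end up doing cell by cell.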

Some of these resources come from @kurian’s topic Tips on how to improve your accuracy.



I am trying to apply MixUp to the Multi-Target implementation from Multi-target: Road to the Top, Part 4.

learn = vision_learner(dls, arch, metrics=metrics, loss_func=loss_func, cbs=MixUp(), n_out=n_out).to_fp16()

When using only the disease as the target it trains fine, but it seems MixUp does not work with the custom loss function:

AttributeError: Exception occured in `MixUp` when calling event `before_batch`: 'tuple' object has no attribute 'size'
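For context, this is my mental model of what MixUp does to a pair of samples (a pure-Python sketch with made-up values; the real fastai callback works on whole tensor batches). I assume the error above comes from the callback expecting a single target tensor, while the multi-target DataLoaders yield a tuple of two:

```python
import random

def mixup_pair(x1, y1, x2, y2, alpha=0.4):
    """MixUp on one pair of samples: blend the inputs and the (one-hot)
    targets with the same weight lam drawn from Beta(alpha, alpha)."""
    lam = random.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam

# Two toy samples with 2-class one-hot targets
x, y, lam = mixup_pair([1.0, 0.0], [1, 0], [0.0, 1.0], [0, 1])
```

With two targets per item (disease and variety), both one-hot vectors would presumably need to be mixed with the same lam, which the stock callback does not seem to handle here.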

Could anyone point me in the right direction?