Quora insincere questions challenge: high accuracy but not working

nikhil_no_1 · December 25, 2018, 7:35am

Hi, using the lesson3-IMDB notebook as a reference, I am working on the Quora insincere questions challenge on Kaggle. For now I just want to solve the problem without worrying too much about the competition rules.
As far as I can tell, I have done everything right, and I do get a high accuracy (~96%). However, when I use the model to make a single prediction on an example taken from the training data, it fails miserably.
For example:

In: learn.predict("Why are men selective?")
Out: (Category 0, tensor(0), tensor([0.6251, 0.3749]))

The complete notebook can be seen here
I haven’t done any feature engineering yet.

What could be the issue here? Please share your thoughts/ideas.
While writing this, I realized that one thing I should check is whether the validation set contains any insincere questions at all.

Also, can we use ClassificationInterpretation for this (text) problem? It works, but the docs say it should only be used for vision.

nikhil_no_1 · December 25, 2018, 8:16am

I forgot to mention one thing: I also tried using the Fbeta_binary metric, but surprisingly the predictions for the same question became worse.
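For reference, F-beta for the positive (insincere) class combines precision and recall, weighting recall beta times as much as precision; beta=1 gives F1. The sketch below is a minimal pure-Python version assuming 0/1 label lists with 1 meaning insincere; `fbeta_binary` is a hypothetical helper, not fastai's implementation. Note that a metric passed to a learner is only reported during training, so changing it alone does not change what the model optimizes.

```python
def fbeta_binary(y_true, y_pred, beta=1.0):
    """F-beta score for the positive class (label 1) of a binary task."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so score 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up labels and predictions: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
print(fbeta_binary(y_true, y_pred, beta=1.0))
print(fbeta_binary(y_true, [0] * 5))  # predicting "sincere" everywhere scores 0.0
```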

tinhb · December 26, 2018, 7:55am

I’m also working on this. Do you know what the default loss function for text_classifier_learner is?

nikhil_no_1 · December 26, 2018, 8:46am

Accuracy. Because the dataset in this competition is imbalanced, you will get 94% accuracy even if you predict every question as sincere.
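A tiny sketch of why accuracy is misleading here, using made-up counts chosen to match the ~94% figure (not the real Quora label counts): a "model" that calls every question sincere scores 94% accuracy while catching no insincere questions at all. This is why the competition is scored with F1 on the insincere class rather than accuracy.

```python
# Hypothetical labels: 1000 questions, 60 of them insincere (label 1), i.e. ~6%.
labels = [1] * 60 + [0] * 940
preds = [0] * len(labels)  # predict "sincere" for every question

accuracy = sum(p == t for p, t in zip(preds, labels)) / len(labels)
true_positives = sum(p == 1 and t == 1 for p, t in zip(preds, labels))

print(f"accuracy = {accuracy:.2f}")            # accuracy = 0.94
print(f"insincere caught = {true_positives}")  # insincere caught = 0
```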

tank13 · December 26, 2018, 2:46pm

One thing I noticed is that the language model isn’t working well: the generated “questions” appear to be random words. Unfortunately, I’m having the same problem (on a different task, and also when I run the IMDB notebook) and don’t know how to fix it. (My own post on the problem: https://forums.fast.ai/t/troubleshooting-word-salad-output-of-text-generator/33245)

@tinhb are you getting sensible predictions from the language model?

tinhb · December 26, 2018, 6:35pm

I haven’t used the language model yet; I only use TextClasDataBunch and plug in my own model.

nikhil_no_1 · December 27, 2018, 3:59am

Yes, I also observed the same thing but didn’t pay much attention to it. Now that you mention it, it is definitely an issue.

tank13 · December 27, 2018, 4:07am

I dealt with this by using fastai 1.0.37 – obviously not an ideal solution, but workable as a stopgap.

tinhb · January 1, 2019, 3:41am

Given the compute constraint, the usual workflow of training a language model and then a text classifier might not work here. If we want to use language modeling, it will need a high learning rate and a short training run, imo.

chans.best · January 2, 2019, 6:34am

I don’t think you need to check for the presence of sincere samples in the validation set; the split step in the DataBunch should handle it.
Most probably we are overfitting our dataset, hence always predicting 1. Maybe we can increase the dropout to reduce overfitting.
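With hundreds of thousands of questions, a random split will almost surely contain insincere examples, but a stratified split guarantees that the validation set has the same class ratio as the training data. A minimal pure-Python sketch; `stratified_split` is a hypothetical helper, and the returned indices would be used to split the DataFrame before building the DataBunch.

```python
import random


def stratified_split(labels, valid_frac=0.1, seed=42):
    """Return (train_idx, valid_idx), splitting each class in the same proportion."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, valid_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Keep at least one example of every class in the validation set.
        n_valid = max(1, round(len(idxs) * valid_frac))
        valid_idx.extend(idxs[:n_valid])
        train_idx.extend(idxs[n_valid:])
    return sorted(train_idx), sorted(valid_idx)


labels = [1] * 60 + [0] * 940  # made-up: 6% insincere
train_idx, valid_idx = stratified_split(labels, valid_frac=0.1)
print(sum(labels[i] for i in valid_idx), "insincere of", len(valid_idx))  # 6 insincere of 100
```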

By the way, I was trying the same thing, but I kept getting this issue and never managed to fit the model.

GiantSquid · January 24, 2019, 7:40pm

I’ve been working on this competition too. For me, the bigger problem is that it’s not necessarily possible to run the IMDB notebook’s pipeline within the time and compute allotted to a kernel; at best it’s very slow. Additionally, the competition rules prohibit outside data, so it’s actually against the rules to use the wikitext-103 pretrained model. There are some whitelisted pretrained models, but I don’t know how easily they plug into our model; I haven’t gotten to that yet. I’m not sure whether there are workarounds for these problems or whether the rules of this competition just make using fastai impractical. I ran a kernel with 5% of the data and no pretrained model, just to see if it would work at all, and it ran to completion, but with predictably terrible results. If someone has a fastai-based kernel that’s working well, I would love to see it.

fabsta · January 30, 2019, 2:01pm

Hey @martijnd!

I am also trying to use fastai here.
I was looking at your first setup, but it seems empty. Is that right?

fabsta · January 30, 2019, 2:20pm

Hey @nikhil_no_1,
thanks for sharing your notebook. I just started looking into this competition, so it’s helpful, but I can’t offer any help yet.
Have you used the embeddings that come with the Kaggle challenge?
Thanks!

nikhil_no_1 · January 30, 2019, 3:34pm

Not yet. I wanted to make it work with fastai first.
I got sidetracked with other things, so I wasn’t able to spend any time on it for a month.
I hope to get back to it in a few days.

Mirodil · February 4, 2019, 3:05pm

Hello,

I have implemented a solution for this competition using fastai and also loaded the provided pretrained weights. Here is the notebook:

https://www.kaggle.com/mirodil/quora-insincere-qc-with-fastai?scriptVersionId=10156558

but the F1 score isn’t very good.
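Since several people ask about the embeddings provided with the competition, here is a minimal sketch of turning a GloVe-style text file (one token followed by its vector on each line) into a matrix aligned with a vocabulary list such as fastai's `vocab.itos`. The function name, the random initialization for missing words and the 300-d default are assumptions for illustration; copying the matrix into a fastai model's embedding layer is left out because the attribute path depends on the fastai version.

```python
import numpy as np


def build_embedding_matrix(lines, itos, dim=300, seed=0):
    """Map each token in `itos` to its pretrained vector; tokens not found keep small random vectors."""
    rng = np.random.default_rng(seed)
    wanted = {tok: i for i, tok in enumerate(itos)}
    emb = rng.normal(0.0, 0.1, size=(len(itos), dim)).astype(np.float32)
    hits = 0  # number of file lines that matched a vocabulary token
    for line in lines:
        # Split from the right so that tokens which themselves contain spaces survive.
        parts = line.rstrip().rsplit(" ", dim)
        if len(parts) != dim + 1:
            continue  # skip header lines or malformed rows
        word, values = parts[0], parts[1:]
        idx = wanted.get(word)
        if idx is not None:
            emb[idx] = np.asarray(values, dtype=np.float32)
            hits += 1
    return emb, hits


# Toy 3-d example; for a real file, pass an open handle instead, e.g.
# with open(path, encoding="utf8", errors="ignore") as f: build_embedding_matrix(f, itos)
emb, hits = build_embedding_matrix(["the 0.1 0.2 0.3", "cat 0.4 0.5 0.6"], ["xxunk", "the", "dog"], dim=3)
print(hits, emb.shape)
```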

mnpinto · February 4, 2019, 4:34pm

I’ve also tried to use fastai on this competition, but the time limit and the no-external-data rule don’t let you get very far. Nevertheless, I reached a public LB score of 0.607 with fastai without the provided word embeddings, within the kernel time constraints. I can share the notebook if anyone is interested.

Mirodil · February 4, 2019, 6:41pm

I would love to see the kernel.

mnpinto · February 5, 2019, 10:43pm

Here is the kernel with LB 0.607 score: https://www.kaggle.com/mnpinto/quora-fastai-v1-0-baseline

I was trying to clean up the code and add some comments, but it turns out the fastai version on Kaggle was updated in the meantime. I think the version at the time it was working was 1.0.36.post1.

Basically, the solution consists of:

1. Training a language model (not pretrained, due to the competition constraints). I trained it on all the training data (leaving 10% for validation) for only one epoch (due to the time constraint).
2. Then, for the classification task, loading the encoder from the language model as usual, training for one epoch, unfreezing and training for 2 more epochs. This step used only 30% of the training data (leaving 10% for validation, again due to the time constraint).
3. Finally, finding the best threshold based on the validation set and creating the submission CSV.
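The threshold step matters because taking the argmax of two class probabilities is the same as using a 0.5 cutoff on the insincere probability, which is rarely F1-optimal on imbalanced data. A minimal sketch, assuming you already have the validation-set probabilities for the insincere class (e.g. from `learn.get_preds()`) and the 0/1 labels as plain lists; `f1_at` and `best_threshold` are hypothetical helpers.

```python
def f1_at(probs, labels, threshold):
    """F1 of the positive class when predicting 1 for prob >= threshold."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def best_threshold(probs, labels, candidates=None):
    """Scan candidate thresholds and return (threshold, f1) with the highest F1."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]  # 0.01 .. 0.99
    # max() keeps the first (lowest) threshold among ties.
    return max(((t, f1_at(probs, labels, t)) for t in candidates), key=lambda tf: tf[1])


probs = [0.9, 0.4, 0.35, 0.2, 0.1, 0.05]  # made-up validation probabilities
labels = [1, 1, 0, 0, 0, 0]
print(best_threshold(probs, labels))  # (0.36, 1.0)
```

The chosen threshold is then applied to the test-set probabilities before writing the submission.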

All this needs to run in 2h at most.

There is clearly a lot of room for improvement if we remove the time and external-data constraints!

jls · February 22, 2019, 7:47am

I think the loss function is cross_entropy. @nikhil_no_1
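Cross-entropy is the negative log of the probability the model assigns to the correct class, so unlike accuracy it is smooth and penalizes confident mistakes. A small illustration on probabilities (not fastai's code, which applies the loss to raw logits through a log-softmax), using the `learn.predict` output from the first post and assuming, hypothetically, that the question's true label is 1 (insincere):

```python
import math


def cross_entropy(probs, target):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[target])


probs = [0.6251, 0.3749]  # class probabilities from the first post's learn.predict output
loss = cross_entropy(probs, target=1)  # assumes the true label is 1
print(round(loss, 3))  # 0.981
```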

jls · February 22, 2019, 9:23am

I just wanted to experiment with Jeremy’s method, so I didn’t follow the rules. I trained the language model with pretrained weights turned on and got pretty good results.


Total time: 39:17
epoch	train_loss	valid_loss	accuracy
1	3.580200	3.459874	0.397804
2	3.384912	3.311979	0.412261

learn.predict('How to learn Chinese', n_words=30, temperature=0.75)

output: ‘How to learn Chinese from China ? xxbos How do you feel about your child having a friend who talks to you ? What can you tell him about him ?’
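The `temperature` argument controls how adventurous the sampling is: in temperature sampling, the model's next-word scores are divided by the temperature before the softmax, so values below 1 (like 0.75 here) concentrate probability on the most likely words. A minimal pure-Python sketch of the general technique with made-up logits; it is an illustration, not fastai's own sampling code.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; T < 1 sharpens, T > 1 flattens."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature=1.0, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]  # made-up scores for a 3-word vocabulary
print(softmax_with_temperature(logits, 1.0))
print(softmax_with_temperature(logits, 0.75))  # more mass on the top word
print(sample_next_token(logits, 0.75, random.Random(0)))
```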

@nikhil_no_1