Quora insincere questions challenge: high accuracy but not working

Hi, using the lesson3-IMDB notebook as a reference, I am working on the Quora insincere questions challenge on Kaggle. Right now I just want to solve the problem without really worrying about the rules.
As far as I can tell, I have done everything right, and I get a high accuracy of ~96%. However, when I use the model to make a single prediction on a question from the existing training data, it fails miserably.
For example:

In: learn.predict("Why are men selective?")
Out: (Category 0, tensor(0), tensor([0.6251, 0.3749]))

The complete notebook can be seen here
I haven’t done any feature engineering yet.

What could be the issue here? Please share your thoughts/ideas.
As I wrote this, it occurred to me that one thing I need to check is the presence of insincere questions in the validation set.

Also, can we use the ClassificationInterpretation for this (text) problem? It works but docs say it should only be used for vision.


Forgot to mention one thing: I also tried using the Fbeta_binary metric, but surprisingly the predictions for the same question became worse. :frowning:

I’m also working on this. Do you know what the default loss function is for text_classifier_learner?

Accuracy. For this competition, because the dataset is imbalanced, you will get 94% accuracy even if you predict everything as sincere.
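To make the point about imbalance concrete, here is a minimal sketch in plain Python. The labels are synthetic (6 insincere out of 100, chosen only to match the 94% figure above), and `accuracy` and `f1_score` are small helper functions written for this illustration, not fastai or Kaggle code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1); returns 0.0 if there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Synthetic, imbalanced labels: 6 insincere (1) and 94 sincere (0).
y_true = [1] * 6 + [0] * 94
# A "model" that predicts sincere for every question.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
print(f"accuracy={acc:.2f}  f1={f1:.2f}")
```

The always-sincere predictor scores high on accuracy but gets an F1 of zero, so accuracy alone says little about whether insincere questions are being caught.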

One thing I notice is that the language model isn’t working well – the generated “questions” appear to be random words. Unfortunately I’m having the same problem (for a different task and when I run the imdb notebook) and don’t know how to fix it. (My own post on the problem: https://forums.fast.ai/t/troubleshooting-word-salad-output-of-text-generator/33245)

@tinhb are you getting sensible predictions from the language model?

I haven’t used the language model yet; I only use TextClasDataBunch and plug in my own model.

Yes, I also observed the same but didn’t pay much attention. But now that you mention it, it definitely is an issue.

I dealt with this by using fastai 1.0.37 – obviously not an ideal solution, but workable as a stopgap.

With the constraint on compute, the usual workflow of language model then text classifier might not work here. If we want to use language modeling then it will need to be high learning rate and short, imo.

I don’t think you need to check for the presence of sincere samples in the validation set; the split step in the databunch should handle it.
Most probably we are overfitting our dataset, hence always predicting 1. Maybe we can increase the dropout to reduce overfitting.

Btw, I was trying the same but somehow kept getting this issue and never managed to fit.

I’ve been working on this competition too. For me the bigger problem is that it’s not necessarily possible to run the pipeline from the IMDB notebook with the time and compute allotted in the kernel. At best it’s very slow. Additionally, the competition rules specify no outside data, so it’s actually against the rules to use the wikitext-103 pretrained model. There are some whitelisted pretrained models, but I don’t know how easily they plug into our model - haven’t gotten to that yet. I’m not sure if there are workarounds to these problems or if the rules for this competition just make using fastai impractical. I ran a kernel with 5% of the data and no pretrained model, just to see if it would work at all, and I was able to get through it, but with predictably terrible results. If someone has a fastai-based kernel that’s working well I would love to see it.

I tried to implement the IMDB notebook, but the restrictions of this competition make it really hard. You don’t have the time to train a language model and you are not allowed to download the pretrained Wiki.

So we need to find another way and make use of the embeddings which are given. :slight_smile:

I am trying to make use of one of the kernels (https://www.kaggle.com/hung96ad/pytorch-starter) and see if I can convert this model and use the embeddings in the FastAI framework.
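As a starting point for reusing the provided embeddings, here is a rough sketch of building an embedding matrix for a vocabulary. It assumes the embedding file is plain text with one token per line followed by its vector components, separated by spaces (GloVe-style). The names `load_vectors`, `build_embedding_matrix`, the toy 3-dimensional vectors and the `itos` list are all made up for illustration; copying the matrix into a fastai model is not shown here.

```python
import io


def load_vectors(handle, dim):
    """Parse 'token v1 v2 ... v_dim' lines into a dict of token -> list of floats."""
    vectors = {}
    for line in handle:
        parts = line.rstrip("\n").split(" ")
        if len(parts) <= dim:
            continue  # skip header lines (e.g. "count dim") or malformed lines
        # Take the last `dim` fields as the vector so tokens containing spaces still parse.
        word = " ".join(parts[:-dim])
        vectors[word] = [float(x) for x in parts[-dim:]]
    return vectors


def build_embedding_matrix(itos, vectors, dim):
    """Return one row per vocabulary token, plus the tokens that had no pretrained vector."""
    matrix, missing = [], []
    for token in itos:
        vec = vectors.get(token)
        if vec is None:
            missing.append(token)
            vec = [0.0] * dim  # placeholder row; random init is another common choice
        matrix.append(vec)
    return matrix, missing


# Toy stand-in for an embedding file (real files have hundreds of dimensions).
toy_file = io.StringIO(
    "why 0.1 0.2 0.3\n"
    "men -0.4 0.5 0.6\n"
    "selective 0.7 -0.8 0.9\n"
)
# Hypothetical vocabulary in index order, as a tokenizer/numericalizer would produce it.
itos = ["xxunk", "why", "are", "men", "selective"]

vectors = load_vectors(toy_file, dim=3)
matrix, missing = build_embedding_matrix(itos, vectors, dim=3)
print(len(matrix), "rows;", "no pretrained vector for:", missing)
```

Row i of the matrix corresponds to token i of the vocabulary, which is the layout an embedding layer's weight expects. Checking the `missing` list is a quick way to see how well the tokenization matches the embedding file.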

In lesson 7 Jeremy gives a short introduction in writing your custom models, but this is above my head… Maybe it is not even possible to implement such a custom model.

Here is my first setup.

Hey @martijnd!

I am also trying to use fastai here.
I was looking at your first setup, but it seems empty. Is that right?

Hey @nikhil_no_1
thanks for sharing your notebook. I just started looking into this competition, so it’s helpful, but I can’t provide help yet.
I was wondering if you have used the embeddings that come with the Kaggle challenge?

Not yet. I wanted to make it work with fastai first.
I got side-tracked with other things so wasn’t able to spend any time for a month. :frowning:
Hope to get back to it in a few days.


Yes. It is not working. I got stuck with the implementation of the custom model as presented in the other notebook. If anyone has an idea how to proceed from this point, I will be grateful :slight_smile:

class NeuralNet(nn.Module):

Probably Jeremy will explain further how to work with custom models in the second part.


I have implemented this competition using fastai and also loaded the provided pretrained weights. Here is the notebook:


But the F1 score isn’t so good.

I’ve also tried to use fastai on this competition, but the time and no-external-data constraints don’t let you get very far. Nevertheless, I reached a public LB score of 0.607 with fastai without the provided word embeddings, under the kernel time constraints. I can share the notebook if anyone is interested.

I would love to see the kernel.

Here is the kernel with LB 0.607 score: https://www.kaggle.com/mnpinto/quora-fastai-v1-0-baseline

I was trying to clean up the code and add some comments, but it turns out the fastai version on Kaggle was updated in the meantime. I think the version at the time it was working was 1.0.36.post1.

Basically the solution consists of:

  • Training a language model (not pretrained, due to the competition constraints). I trained on all the training data (leaving 10% for validation) for only one epoch (due to the time constraint).
  • Then, for the classification task, I loaded the encoder from the language model as usual, trained for one epoch, unfroze, and trained for 2 more epochs. This step was done with only 30% of the train data (leaving 10% for validation, again due to the time constraint).
  • Finally, find the best threshold on the validation set and create the submission CSV.
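The threshold step in the last bullet can be sketched as follows. This is not the code from the kernel, just a minimal plain-Python illustration: `f1_score` and `best_threshold` are helpers written for this example, and the labels and probabilities are made up. It sweeps cutoffs on the predicted probability of the insincere class and keeps the one with the highest F1 on the validation set.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1); returns 0.0 if there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, probs, candidates=None):
    """Return (threshold, f1) for the candidate cutoff with the highest F1.

    Ties are resolved in favour of the lowest threshold tried first.
    """
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]  # 0.01, 0.02, ..., 0.99
    best_t, best_f1 = 0.5, -1.0
    for t in candidates:
        preds = [1 if p >= t else 0 for p in probs]
        score = f1_score(y_true, preds)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1


# Made-up validation labels and predicted probabilities of the insincere class.
val_labels = [0, 0, 0, 0, 1, 0, 1, 0]
val_probs = [0.05, 0.10, 0.20, 0.30, 0.37, 0.15, 0.45, 0.40]

f1_at_half = f1_score(val_labels, [1 if p >= 0.5 else 0 for p in val_probs])
best_t, best_f1 = best_threshold(val_labels, val_probs)
print(f"F1 at 0.5: {f1_at_half:.2f}  best threshold: {best_t:.2f}  F1: {best_f1:.2f}")
```

With a 0.5 cutoff, a question scored at 0.3749 for the insincere class (as in the first post) is labelled sincere, while a lower cutoff chosen on validation data may flag it. The chosen threshold is then applied to the test-set probabilities when writing the submission file.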

All this needs to run in 2h at most.

There is clearly a lot of room for improvement if we remove the constraints of time and external data!