I am trying to classify whether a sentence is a question or not.
statement : do you like food?
predicted_question : 1
statement : The boy who sat beside him was his son.
predicted_question : 0
How can I approach this problem? And how can I prepare a dataset for it?
Check if there is a question mark in the string.
On a more serious note, I think the classifier would be challenged by labeled sentences where the question marks are removed.
I haven’t gotten to NLP myself yet, as you can see.
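For reference, the question-mark check can serve as a simple baseline to compare a trained classifier against. This is only a rough sketch; the function name `has_question_mark` is made up for illustration, and the two sentences are the ones from the original post.

```python
def has_question_mark(sentence: str) -> int:
    """Return 1 if the sentence contains a question mark, else 0."""
    return int("?" in sentence)


# The two example sentences from the original post.
examples = [
    "do you like food?",
    "The boy who sat beside him was his son.",
]

predictions = [has_question_mark(s) for s in examples]
print(predictions)  # [1, 0]
```

This baseline fails as soon as the punctuation is missing, which is exactly the case where a learned model would have to earn its keep.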
The first step is to create the dataset. You can create a DataFrame with one column for the sentences (questions and non-questions) and another for the labels. Then you can use the data block API to create a DataBunch. You can follow the lesson 3 notebook; the problem is very similar, except that instead of positive/negative you have question/statement.
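A minimal sketch of that two-column layout, using only the standard library so it runs anywhere. The column names `text` and `label` and the helper `to_csv` are assumptions made for illustration, and the two rows are just the examples from the original post; a real dataset would need many sentences of each kind. The same list of dicts can also be passed to `pandas.DataFrame(rows)` if you prefer to work with a DataFrame directly.

```python
import csv
import io

# Toy rows: 1 = question, 0 = statement (column names are an assumption).
rows = [
    {"text": "do you like food?", "label": 1},
    {"text": "The boy who sat beside him was his son.", "label": 0},
]


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["text", "label"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(rows)
print(csv_text)
```

Saving this text to a file gives you something you can load back as a DataFrame and feed into the rest of the pipeline.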
You can try to remove the question marks from the questions to see how well the model can do.
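Removing the question marks could look roughly like the sketch below. The helper name `strip_question_marks` is hypothetical, and here it is applied to every sentence rather than only to the questions, which is harmless for statements that contain no question mark.

```python
def strip_question_marks(sentence: str) -> str:
    """Remove every '?' and trim any whitespace left at the ends."""
    return sentence.replace("?", "").strip()


sentences = [
    "do you like food?",
    "The boy who sat beside him was his son.",
]

stripped = [strip_question_marks(s) for s in sentences]
print(stripped)
# ['do you like food', 'The boy who sat beside him was his son.']
```

Training once with and once without the question marks shows how much the model relies on punctuation rather than on word order and wording.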
I hope this helps