I’m very interested in using fastai/NLP to identify fake news. I’ve been looking for a good dataset to experiment with, but haven’t yet seen anything that’s very comprehensive and current. I wonder though if it really needs to be very current (recent)?
But before I spend much time on this I wanted to check with the fastai community. Any suggestions? Does anyone know of a good dataset with both positive and negative examples?
Kaggle has a live fake news competition in the ‘in class’ section with a 400k-sized dataset. Of course, one must abide by the data usage rules, and it appears these are news headlines rather than full article content. https://www.kaggle.com/c/fake-news-pair-classification-challenge
@bachir Very interesting! See this thread for problems I ran into when I initially tried using TextList.from_csv:
If you read down to today’s latest posts, it appears that some of the problems I had were caused by running on Gradient. What platform did you run on?
@ricknta I was running into a similar problem with the data block API. What I usually do to locate the problem is split the chain of commands into separate lines, like:
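A minimal sketch of that idea, not the exact code from my notebook: `run_steps` is a hypothetical helper (not part of fastai) that applies one step at a time and names the step that raises. The toy list of rows only stands in for a real chain such as `TextList.from_csv(...)` followed by the split, label and databunch calls; with fastai you would put each of those calls in its own step instead.

```python
from typing import Any, Callable, List, Tuple

# A step is a (name, function) pair; the function takes the previous result.
Step = Tuple[str, Callable[[Any], Any]]


def run_steps(start: Any, steps: List[Step]) -> Any:
    """Apply each step in order and report which one fails (hypothetical helper)."""
    obj = start
    for name, step in steps:
        try:
            obj = step(obj)
        except Exception as exc:
            # Re-raise with the step name so the failing link in the chain is obvious.
            raise RuntimeError(f"step '{name}' failed: {exc!r}") from exc
        print(f"{name}: ok -> {type(obj).__name__}")
    return obj


# Toy data standing in for a CSV of headlines; the column names are made up.
rows = ["title,label", "Some headline,fake", "Another headline,real"]

# Each link of the chain is written as its own named step.
steps = [
    ("drop_header", lambda r: r[1:]),
    ("split_cols", lambda r: [line.split(",") for line in r]),
    ("to_pairs", lambda r: [(text, label) for text, label in r]),
]

pairs = run_steps(rows, steps)
```

If one of the steps throws, the error message carries its name, so you know which call to look at without reading the whole traceback of the chained expression.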
I’ve been using Colab since the beginning of the course. It’s not a serious solution, but it’s OK for most use cases. However, I realized that, surprisingly, NLP is greedier for GPU than image stuff, and as a result the Colab kernel is always on the edge of crashing!