I thought I'd share links to a couple of projects I've been working on that use ULMFiT and the fastai library.
The first is YouToxic: 22.214.171.124 (github: https://github.com/jhochmuth/YouToxic)
It is a Python web application that provides analytics for text toxicity. You can specify a Twitter user or a YouTube video URL, and it will collect the tweets or YouTube comments and make a prediction for each one. You can also enter text manually or upload a CSV/XLS file.
The models were trained on the dataset from the Kaggle Toxic Comment Competition using the ULMFiT method. There are four separate models, one for each type of toxicity I chose to include.
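Because the Kaggle data is multi-label (each comment has six 0/1 columns: toxic, severe_toxic, obscene, threat, insult, identity_hate), training one model per toxicity type means first turning it into one binary dataset per label. Here is a minimal sketch of that split using only the standard library. The four labels in `CHOSEN_LABELS` are a hypothetical choice, since the post doesn't say which four YouToxic uses, and this is not the project's actual preprocessing code.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical choice of four labels; the post does not name the four it uses.
CHOSEN_LABELS = ["toxic", "obscene", "insult", "threat"]


def per_label_datasets(csv_text: str, labels: List[str]) -> Dict[str, List[Tuple[str, int]]]:
    """Split a Kaggle-style multi-label CSV into one (text, 0/1) dataset per label."""
    reader = csv.DictReader(io.StringIO(csv_text))
    datasets: Dict[str, List[Tuple[str, int]]] = {label: [] for label in labels}
    for row in reader:
        for label in labels:
            # Each label column holds 0 or 1; each binary classifier only sees its own column.
            datasets[label].append((row["comment_text"], int(row[label])))
    return datasets


sample = (
    "id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate\n"
    "a1,you are great,0,0,0,0,0,0\n"
    "a2,you idiot,1,0,0,0,1,0\n"
)
data = per_label_datasets(sample, CHOSEN_LABELS)
print(data["insult"])  # [('you are great', 0), ('you idiot', 1)]
```

Each of the resulting datasets could then be used to fine-tune its own ULMFiT classifier.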
I built the application with Plotly's Dash framework, then dockerized it and deployed it to a Google Kubernetes Engine cluster.
The other project is Antidote: https://github.com/jhochmuth/Antidote
It applies some of the same ideas to a Chrome extension. When you visit the comments section of a YouTube video, it automatically hides toxic comments. You can control the extension's sensitivity (this setting just changes the threshold at which a comment is considered toxic).
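The sensitivity setting amounts to a single threshold on the model's toxicity score. A minimal sketch of that filtering step, in Python rather than the extension's JavaScript, might look like this. `predict` is a hypothetical stand-in for the model, and the sketch assumes a comment is hidden when its score is at or above the threshold.

```python
from typing import Callable, List, Tuple


def split_comments(
    comments: List[str], predict: Callable[[str], float], threshold: float
) -> Tuple[List[str], List[str]]:
    """Return (shown, hidden): a comment is hidden when its toxicity score >= threshold."""
    shown: List[str] = []
    hidden: List[str] = []
    for comment in comments:
        (hidden if predict(comment) >= threshold else shown).append(comment)
    return shown, hidden


# Hypothetical fixed scores standing in for real model predictions.
fake_scores = {"nice video": 0.05, "you idiot": 0.92, "meh": 0.55}
comments = list(fake_scores)

shown_lo, hidden_lo = split_comments(comments, fake_scores.get, threshold=0.5)
shown_hi, hidden_hi = split_comments(comments, fake_scores.get, threshold=0.9)
print(hidden_lo)  # ['you idiot', 'meh']
print(hidden_hi)  # ['you idiot']
```

Lowering the threshold makes the filter more aggressive. The model itself doesn't change; only the cut-off applied to its output does.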
Even though YouTube already checks each comment for toxicity, I find its filter very lenient. Everybody reading this probably knows from personal experience that plenty of toxic content makes it through. I suspect they only try to block comments that are “severely toxic” (a separate category from “toxic” in the dataset). My extension has the additional advantage that you can set the sensitivity yourself.
This project uses TensorFlow.js. It is a really cool technology because it lets the extension do all the computation on the client side, so I don't need to run a server to make predictions.
Unfortunately, TensorFlow.js only works with Keras and TensorFlow models, so I haven't trained the model with ULMFiT yet (for now I took a basic LSTM Keras model I found on Kaggle and swapped in the dataset). I plan to add a model trained with ULMFiT soon.
I haven't published it to the Chrome Web Store yet because I still need to pay Google the $5 developer fee, but you can use it yourself by cloning the repo.
If you guys have any ideas, advice, or criticism for me, I would greatly appreciate it.