In the course examples of sentiment classification, it was always binary classification (positive vs. negative), which usually gave quite good results of 90% accuracy or higher.
I wonder how much the accuracy decreases if we need to classify texts into multiple classes, say 5 or more, instead of just 2.
Are there any such examples using ULMFiT transfer learning?
Another question: what is the order-of-magnitude estimate for the minimal number of examples needed to train a classifier? The movie sentiment dataset uses 50K examples (training and testing combined); can we succeed with much less?
I did a news bias detector that involved 11 classes and got about 93% accuracy. It really depends on the dataset, and on what you want to do with it.
If I remember correctly, my dataset was about 5k rows, limited by how much I could load on the GPU/CPU at the time.
Thanks for the reply.
Do you mean that you succeeded in classifying 11 different types of news with 93% accuracy?
I think that is a great achievement, way beyond the current state of the art.
What do you mean by 5k rows? Did you have 5,000 annotated news articles across all 11 classes?
Thank you! Yes, it was a toy problem I built in 48 hours, so there is plenty of room for improvement. I had the raw text of 5,000 news articles, each with one label.
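For anyone following along: going from 2 classes to many classes doesn't change the training code much, since the classifier head just gets more output categories. This isn't the ULMFiT model discussed above, only a stdlib toy sketch of a multinomial naive Bayes baseline over a made-up 3-class dataset (labels and texts are invented for illustration), to show how multi-class text classification works mechanically:

```python
import math
from collections import Counter, defaultdict

def train_nb(examples):
    """Train a multinomial naive Bayes model from (text, label) pairs."""
    class_counts = Counter()            # documents per class
    word_counts = defaultdict(Counter)  # word frequencies per class
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def predict_nb(model, text):
    """Return the class with the highest smoothed log-probability."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Laplace-smoothed log likelihood
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Tiny invented 3-class dataset, purely illustrative.
examples = [
    ("the team won the match", "sports"),
    ("great goal in the final game", "sports"),
    ("new phone chip released", "tech"),
    ("the laptop has a fast processor", "tech"),
    ("the senate passed the bill", "politics"),
    ("election results announced today", "politics"),
]
model = train_nb(examples)
print(predict_nb(model, "the processor in this phone is fast"))  # → tech
```

Nothing in the loop over classes cares whether there are 2 labels or 11; a pretrained language model like ULMFiT mainly helps because its encoder features make each class easier to separate with few labeled examples.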
Interesting! A different dataset than I wound up using, but cool!