Training a language model

msp · May 7, 2018, 8:03am

Has anyone been able to train a decent language model with the Lesson 4 notebook?

I get to a validation loss of about 4.19, but the generated sentences are really awful. I’ve been tweaking the learning rate and weight decay (wds), but the quality hasn’t improved, so I was wondering whether someone had found a good training schedule.
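For a sense of scale, a loss figure like this can be converted to perplexity. The sketch below assumes the reported 4.19 is a mean per-token cross-entropy in nats (natural log), which is what PyTorch's cross-entropy loss produces; if the notebook reports something else, the conversion does not apply.

```python
import math

# Validation loss quoted in the post; assumed to be mean cross-entropy in nats.
val_loss = 4.19

# Perplexity is the exponential of the mean per-token cross-entropy.
perplexity = math.exp(val_loss)

print(f"perplexity is about {perplexity:.0f}")
```

Under that assumption the model is, on average, about as uncertain as if it were choosing uniformly among roughly 66 words at each step.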

Vishucyrus · May 7, 2018, 8:29am

One of the students, Charin, has done it for the Thai language… here is the link…

For better results, you can further improve the model by using cache pointers, as discussed in this article here…

github.com

sgugger/Deep-Learning/blob/master/Cache pointer.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook goes with [this blog post](https://sgugger.github.io/pointer-cache-for-language-model.html#pointer-cache-for-language-model) that explains what the continuous cache pointer is. This technique was introduced by Grave et al. in [this article](https://arxiv.org/pdf/1612.04426.pdf)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "%reload_ext autoreload\n",
    "%autoreload 2"
   ]
  },

This file has been truncated.

While you can build a good language model, you won’t necessarily get decent predictions for every sequence of words. It can depend on the corpus you used. Remember, it’s just a basic language model. But such models could be coupled with GANs to build better chatbots, or just with fully connected (FC) layers to build a world-class classifier (like Jeremy)…

msp · May 7, 2018, 8:48am

Thanks @Vishucyrus. Yes, agreed: depending on your goal, it isn’t necessary to get good predictions. But I was still surprised that it did not perform a bit better than this, especially compared to the output from the char-rnn, which is quite amazing.

Vishucyrus · May 7, 2018, 9:39am

Oh… in that case I strongly recommend incorporating the cache pointer technique into your model. I guess that should get you the desired results…
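For readers who have not met the technique, here is a minimal NumPy sketch of the continuous cache idea from Grave et al.: the model's next-word distribution is mixed with a distribution over recently seen words, weighted by how similar the current hidden state is to the hidden states stored when those words appeared. This is an illustration only, not the code from the linked notebook; the function name, the argument names and the values of `theta` and `lam` are all hypothetical.

```python
import numpy as np

def cache_pointer_probs(p_vocab, h_t, cache_hiddens, cache_next_ids,
                        theta=0.3, lam=0.1):
    """Mix the model's softmax with a continuous-cache distribution.

    p_vocab        : (V,) model probabilities for the next word
    h_t            : (d,) current hidden state
    cache_hiddens  : (n, d) hidden states stored at earlier positions
    cache_next_ids : (n,) id of the word that followed each stored state
    theta, lam     : illustrative hyperparameter values (assumptions)
    """
    if len(cache_next_ids) == 0:
        return p_vocab  # nothing cached yet, fall back to the model

    # Similarity of the current state to each cached state, scaled by theta.
    scores = theta * (cache_hiddens @ h_t)
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()

    # Each cached position votes for the word that followed it.
    p_cache = np.zeros_like(p_vocab)
    np.add.at(p_cache, cache_next_ids, weights)

    # Linear interpolation between the model and the cache.
    return (1.0 - lam) * p_vocab + lam * p_cache


# Toy usage: 5-word vocabulary, 3-dimensional hidden states.
p_vocab = np.full(5, 0.2)
h_t = np.array([1.0, 0.0, 0.0])
cache_hiddens = np.array([[1.0, 0.0, 0.0],
                          [0.0, 1.0, 0.0]])
cache_next_ids = np.array([2, 4])

mixed = cache_pointer_probs(p_vocab, h_t, cache_hiddens, cache_next_ids)
print(mixed)
```

In this toy case, word 2 gets the largest boost because the hidden state stored just before it matches the current one most closely. Words that never appeared in the cache only lose a little probability mass to the interpolation.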

msp · May 7, 2018, 12:42pm

Thanks, the cache pointer paper does look interesting; I had never heard of it! (It’s kind of a poor name for an ML method; it sounds more like a C programming technique.)

Vishucyrus · May 7, 2018, 1:44pm

Hehehe… you’re right… I think you should invent a new name for it…