RNN Name Generator (learning project)

jkh · May 30, 2018, 8:46am

Hi all!

As I am trying to understand RNNs and LSTMs, I started a small project to create a model that learns to generate names or first names from a given corpus (tested on a Lord of the Rings character list and on real first names).

Being quite happy with the results, I would like to share it here to get some reviews. Every comment is welcome (coding style, clarity, project structure, ML approach, etc.)!

The project is a Jupyter Notebook based on TensorFlow/Keras.

As a wrap up:

I feel that the task to learn is quite easy (compared to text generation based on books, like here).
The main difficulty was generating short fixed-size input sequences. I add padding characters according to the chosen sequence size to do that easily and to generate names without seeds. But I’m not sure it is the right way to do it.
Very small RNNs (single layer, down to 16 hidden units) are able to do quite well.
LSTM cells are probably overkill for that.
But they allow me to keep a ‘large’ character dictionary (I didn’t convert upper case or unusual diacritics) and generate fun LOTR names.
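
To make the padding point concrete, here is a minimal sketch of how such fixed-size windows might be built. The pad character `PAD`, the end marker `END`, and the helper name `make_windows` are assumptions for illustration only and are not taken from the notebook.

```python
PAD = "_"   # hypothetical padding character, assumed absent from the corpus
END = "\n"  # hypothetical end-of-name marker


def make_windows(name, seq_len):
    """Return (input_window, next_char) pairs for one name.

    The name is left-padded with seq_len PAD characters, so the first
    window is all padding. That is what lets generation start without a seed.
    """
    padded = PAD * seq_len + name + END
    return [
        (padded[i:i + seq_len], padded[i + seq_len])
        for i in range(len(padded) - seq_len)
    ]


pairs = make_windows("Frodo", seq_len=3)
for window, target in pairs:
    print(repr(window), "->", repr(target))
```

With this layout, generating a new name would mean starting from an all-padding window and sampling characters until the end marker appears.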

What do you think of this project, and what could have been done differently?

msp · May 30, 2018, 10:56am

I think this is a really cool project!

What I would have done differently: show more results! I want to see tons of generated names of elves, dwarves, hobbits!

Another point is that I don’t follow why you do the padding. Check, for instance, Andrej Karpathy’s seminal blog post, where no padding of any kind is needed.
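
Roughly, the padding-free alternative treats the whole corpus as one long character stream, with a newline separating the names, and cuts it into fixed-length chunks whose targets are the inputs shifted by one character. A minimal sketch of that idea, where `stream_chunks` and the toy name list are my own illustrative assumptions rather than code from the blog post:

```python
def stream_chunks(names, seq_len):
    """Yield (inputs, targets) chunks from one continuous character stream.

    Names are joined with newlines, so the newline marks the name boundary
    and no padding character is needed. Targets are the inputs shifted by one.
    """
    text = "\n".join(names) + "\n"
    for p in range(0, len(text) - seq_len, seq_len):
        yield text[p:p + seq_len], text[p + 1:p + seq_len + 1]


chunks = list(stream_chunks(["Frodo", "Sam", "Bilbo"], seq_len=4))
for inputs, targets in chunks:
    print(repr(inputs), "->", repr(targets))
```

The trade-off is that generation then needs either a seed or a convention such as starting right after a newline.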

jkh · May 30, 2018, 2:44pm

Thanks for your kind words!

You’re right, it was missing some fun results:
Hobbits (64 hidden units, 400 epochs)
Isembran Took, Gordbea Grund, Mayrangl, Myrtle BurarwBlolc (wrong upper case)

Elves (256 hidden units, 350 epochs)
Aedneí, Thanaëa, Enros, Gilignor, Arnedei, Elwn, Calanor, Amros, Aradwen, Fanath, …

Men (64 hidden units, 30 epochs)
Agriel, Tar-Meneldil, Tarduin, Cemendi, Tarciryas, Elptar, Hirost, …

Real first names (64 hidden units, 100 epochs)
Irvia, Amadrie, Anjell, Kathenzita, Alberte, Kamely, Triny, Iacki, Sasham, Molette, Timory, Herrika, Tarletta, Mikas, …

I will check out Andrej’s work, because I hadn’t read it closely before and I just saw he did it for baby names too. Thanks!

msp · May 31, 2018, 9:02am

Those look pretty good!

What do you think about a “blind test”? You give us ~10 names from each category, all mixed up, and we have to guess the category for each name (Hobbit/Elf/Man/Real).