Music Language Modelling

Hi all,

I’m trying to generate music lyrics by building a language model on a dataset of 50 years of Billboard top songs (using the code from Lesson 4).

I’ve preprocessed the data to drop all columns except the lyrics, shuffled the lyrics, and stored them in a .txt file.

80% of the data for training and 20% for validation. I moved the two .txt files to their respective folders.
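
For reference, the shuffle-and-split step can be sketched like this (a simplified version, not the exact code; the names are made up):

```python
import random

def split_lyrics(lyrics, val_frac=0.2, seed=42):
    """Shuffle a list of lyric strings and split into train/validation sets."""
    lyrics = list(lyrics)
    random.Random(seed).shuffle(lyrics)
    n_val = int(len(lyrics) * val_frac)
    return lyrics[n_val:], lyrics[:n_val]  # train, validation

songs = [f"song {i} lyrics" for i in range(100)]  # placeholder data
train, valid = split_lyrics(songs)
print(len(train), len(valid))  # 80 20
```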

Here are my loss values from learner.fit() (epoch, training loss, validation loss):

[ 0.       5.33582  5.20024]                                
[ 1.       4.88672  4.79015]                                
[ 2.       4.71025  4.67154]                                
[ 3.      4.5872  4.5754]                                   
[ 4.       4.37113  4.44869]                                
[ 5.       4.22731  4.38909]                                
[ 6.       4.17049  4.36377]                                
[ 7.       4.23515  4.41535]                                
[ 8.       4.09354  4.37585]                                
[ 9.       3.97444  4.34553]                                
[ 10.        3.87125   4.31838]                             
[ 11.        3.81844   4.29439]                             
[ 12.        3.73292   4.27785]                             
[ 13.        3.74693   4.25425]                             
[ 14.        3.74357   4.2605 ] 
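
(For interpretation: the two loss columns from fit() are cross-entropy losses, so exponentiating gives perplexity, which is a bit easier to reason about:)

```python
import math

# final-epoch losses from the table above
train_loss, val_loss = 3.74357, 4.2605

print(math.exp(train_loss))  # training perplexity, about 42
print(math.exp(val_loss))    # validation perplexity, about 71
```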

However, the outputs generated are:

I've been reading books of old legends and myth.  

s gon na be a man who s gon na be a man who s gon na be a man who s gon na be a man who s gon na be a man who s gon na be a man who s gon na be a man who s ...

The words keep repeating themselves.

My repository:
https://github.com/init27/LyricGenerator
Can someone please help me find the fault in this model?
@Jeremy Is the problem because I’m using just two .txt files inside the test and train paths?

Regards,
Sanyam Bhutani.

3 Likes

Best guess…

Music repeats itself, and so the model is doing the same…

PS: I don’t listen to music… so I have very little knowledge here

This isn’t going to help, but maybe your model is partial to The Proclaimers.:grinning:

3 Likes

Hahaha! Couldn’t help laughing at this.

1 Like

Haha. Yes, but I hope to make the chorus with a few more words at least.

At first glance, it looks like you are not actually iterating in your loop. My guess is that you are feeding the same values into the model over and over instead of feeding in the previous output.

I think so. But I’ve copied those lines directly from Lesson 4, so that might not be the issue.
I’m not sure though…

Yeah, don’t think that’s it now that I look at your code.

Because of an ASCII encoding issue, I had moved all the lyrics into a single .txt file and then converted the encoding to ASCII via a text editor. So I’m using a single file as the path to the test and train sets.

Could that have caused the issue?
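
Side note: that encoding cleanup can also be done in Python instead of a text editor; a minimal sketch (it simply drops any character that isn’t ASCII):

```python
def to_ascii(text):
    """Strip characters that cannot be encoded as ASCII."""
    return text.encode('ascii', errors='ignore').decode('ascii')

print(to_ascii('don\u2019t stop believin\u2019'))  # dont stop believin
```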

Update:
I’ve stored the songs as separate .txt files.
Still the same results.

So I ran into the same issue with songs. Jeremy outlined a solution to this exact problem here: Configuring stateful lstm cell in the language model

This is a common problem with text-generation networks, and it’s compounded when songs have a lot of repeating chorus. One non-machine-learning way of addressing it is picking randomly from the top 5 most probable words instead of always taking the single most probable one, which keeps the output from looping.
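
A minimal sketch of that top-k trick, assuming `probs` is the model’s probability vector over the vocabulary (the names here are made up):

```python
import numpy as np

def sample_top_k(probs, k=5, rng=None):
    """Pick the next word index from the k most probable candidates,
    weighted by their renormalised probabilities."""
    rng = rng or np.random.default_rng()
    top = np.argsort(probs)[-k:]       # indices of the k highest probabilities
    p = probs[top] / probs[top].sum()  # renormalise over the top k
    return int(rng.choice(top, p=p))

probs = np.array([0.01, 0.4, 0.3, 0.2, 0.05, 0.04])
next_idx = sample_top_k(probs, k=3)  # one of indices 1, 2, 3
```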

The other solution I’ve heard of is using a pretrained embedding like https://github.com/facebookresearch/fastText

1 Like

I’m curious whether there are a lot of unique/different words (more than in prose)… Perhaps remove a larger fraction of them?

I saw similar ‘looping’ of characters while trying out Karpathy’s char-RNN model; he mentions it briefly. Some people have found better ways to break out of loops by:

  1. Not using the topmost choice for the next word, but randomly sampling from a top-N list (increase N to make the output more diverse).
  2. Using better regularisation and dropout choices.

I haven’t tried this lesson assignment fully, so maybe you’re already doing these :slight_smile:
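
On point 1, a temperature parameter is another common knob for diversity: dividing the logits by a temperature before the softmax flattens (T > 1) or sharpens (T < 1) the distribution you sample from. A standalone sketch (not tied to the lesson code):

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities; temperature > 1 flattens the
    distribution (more diverse samples), temperature < 1 sharpens it."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, temperature=0.5)  # low T: sharper
flat = softmax_with_temperature(logits, temperature=2.0)   # high T: flatter
```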

3 Likes

Funnily enough, the exact paper you all want has just been written:

7 Likes

Wouldn’t removing the unique words reinforce the problem by increasing the probability of the repetitive ones? Most songs have tremendous repetition (particularly the older songs, which I realised after going through the data).

I’m trying the 2nd approach by tweaking the parameters; still no luck yet :sweat_smile:

@init_27 Hi Sanyam, I am interested in NLP as well. I found a blog post (with a paper) below. I think the “Eliminating Repetition with Coverage” section may help with your problem conceptually. The interactive example is very cool, but the system is not perfect yet.

http://www.abigailsee.com/2017/04/16/taming-rnns-for-better-summarization.html#easier-copying-with-pointer-generator-networks

3 Likes

Here are a few ‘lyrics’ generated by the model.
The starting word was ‘all’ and the rest were generated by the model. I randomly picked each next word in the sequence from the top 3, as suggested by @lgvaz.

all i ve seen a little i m not gon you go and be my you know i ve never seen the way to be a you know you got ta make you feel i got you like a i do you want me and i m not the same i do you want a girl and i do i know you want me you need me baby i love me i want you baby baby do me baby i love my do nt want you to do i need to do i want to…

Sanyam Bhutani

3 Likes

Nice! How many lyrics are you using to train?

I used the US Top 100 songs from the past 50 years, minus a few that were not in English, which I removed manually.
So, about 4000 songs (80-20 train-test split).

1 Like

Now, you need music to go with it.

1 Like

I’m playing around with Google magenta :yum: