Music Language Modelling

Hi all,

I’m trying to generate music lyrics by building a language model on a dataset of 50 years of Billboard top songs (using the code from Lesson 4).

I’ve preprocessed the data to drop all columns except the lyrics, shuffled the lyrics, and stored them in a .txt file.

80% of the data for training and 20% for validation. I moved the two .txt files to their respective folders.
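
For reference, the shuffle-and-split step can be sketched like this (a simplified version, not the exact code; the names are made up):

```python
import random

def split_lyrics(lyrics, val_frac=0.2, seed=42):
    """Shuffle a list of lyric strings and split into train/validation sets."""
    lyrics = list(lyrics)
    random.Random(seed).shuffle(lyrics)
    n_val = int(len(lyrics) * val_frac)
    return lyrics[n_val:], lyrics[:n_val]  # train, validation

songs = [f"song {i} lyrics" for i in range(100)]  # placeholder data
train, valid = split_lyrics(songs)
print(len(train), len(valid))  # 80 20
```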

Here are my loss values from learner.fit() (epoch, training loss, validation loss):

[ 0.       5.33582  5.20024]                                
[ 1.       4.88672  4.79015]                                
[ 2.       4.71025  4.67154]                                
[ 3.      4.5872  4.5754]                                   
[ 4.       4.37113  4.44869]                                
[ 5.       4.22731  4.38909]                                
[ 6.       4.17049  4.36377]                                
[ 7.       4.23515  4.41535]                                
[ 8.       4.09354  4.37585]                                
[ 9.       3.97444  4.34553]                                
[ 10.        3.87125   4.31838]                             
[ 11.        3.81844   4.29439]                             
[ 12.        3.73292   4.27785]                             
[ 13.        3.74693   4.25425]                             
[ 14.        3.74357   4.2605 ] 
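
(For interpretation: the two loss columns from fit() are cross-entropy losses, so exponentiating gives perplexity, which is a bit easier to reason about:)

```python
import math

# final-epoch losses from the table above
train_loss, val_loss = 3.74357, 4.2605

print(math.exp(train_loss))  # training perplexity, about 42
print(math.exp(val_loss))    # validation perplexity, about 71
```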

However, the outputs generated are:

I've been reading books of old legends and myth.  

s gon na be a man who s gon na be a man who s gon na be a man who s gon na be a man who s gon na be a man who s gon na be a man who s gon na be a man who s ...

The words keep repeating themselves.

My repository:
https://github.com/init27/LyricGenerator
Can someone please help me find the fault in this model?
@Jeremy Is the problem because I’m using just two .txt files inside the test and train paths?

Regards,
Sanyam Bhutani.

3 Likes

Best guess…

Music repeats itself, and so the model is doing the same…

PS: I don’t listen to music… so I have very little knowledge here

This isn’t going to help, but maybe your model is partial to The Proclaimers.:grinning:

3 Likes

Hahaha! Couldn’t help laughing at this.

1 Like

Haha. Yes, but I hope to make the chorus with a few more words at least.

At first glance, it looks like you are not actually iterating in your loop. My guess is that you are feeding the same values into the model over and over instead of feeding in the previous output.

I think so. But I’ve copied those lines directly from Lesson 4, so that might not be the issue.
I’m not sure though…

Yeah, don’t think that’s it now that I look at your code.

Because of an ASCII encoding issue, I had moved all the lyrics into a single .txt file and then converted the encoding to ASCII via a text editor. So I’m using a single file as the path to the test and train sets.

Could that have caused the issue?
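
Side note: that encoding cleanup can also be done in Python instead of a text editor; a minimal sketch (it simply drops any character that isn’t ASCII):

```python
def to_ascii(text):
    """Strip characters that cannot be encoded as ASCII."""
    return text.encode('ascii', errors='ignore').decode('ascii')

print(to_ascii('don\u2019t stop believin\u2019'))  # dont stop believin
```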

Update:
I’ve stored the songs as separate .txt files.
Still the same results.

So I ran into the same issue with songs. Jeremy outlined a solution to this exact problem here: Configuring stateful lstm cell in the language model

This is a common problem with text-generation networks, and it’s compounded when songs have a lot of repeating chorus. One non-machine-learning way of addressing it is picking randomly from the top 5 most probable words instead of always taking the single most probable one, which keeps the output from looping.
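
A minimal sketch of that top-k trick, assuming `probs` is the model’s probability vector over the vocabulary (the names here are made up):

```python
import numpy as np

def sample_top_k(probs, k=5, rng=None):
    """Pick the next word index from the k most probable candidates,
    weighted by their renormalised probabilities."""
    rng = rng or np.random.default_rng()
    top = np.argsort(probs)[-k:]       # indices of the k highest probabilities
    p = probs[top] / probs[top].sum()  # renormalise over the top k
    return int(rng.choice(top, p=p))

probs = np.array([0.01, 0.4, 0.3, 0.2, 0.05, 0.04])
next_idx = sample_top_k(probs, k=3)  # one of indices 1, 2, 3
```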

The other solution I’ve heard of is using a pretrained embedding like https://github.com/facebookresearch/fastText

1 Like

I’m curious whether there are a lot of unique/different words (more than in prose)… Perhaps remove a larger fraction of them?

I saw similar ‘looping’ of characters while trying out Karpathy’s char-RNN model; he mentions it briefly. Some people have found better ways to break out of loops by:

  1. Not using the topmost choice for the next word, but randomly sampling from a top-N list (increase N to make the output more diverse).
  2. Using better regularisation and dropout choices.

I haven’t tried this lesson assignment fully, so maybe you’re already doing these :slight_smile:
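
On point 1, a temperature parameter is another common knob for diversity: dividing the logits by a temperature before the softmax flattens (T > 1) or sharpens (T < 1) the distribution you sample from. A standalone sketch (not tied to the lesson code):

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities; temperature > 1 flattens the
    distribution (more diverse samples), temperature < 1 sharpens it."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, temperature=0.5)  # low T: sharper
flat = softmax_with_temperature(logits, temperature=2.0)   # high T: flatter
```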

3 Likes

Funnily enough, the exact paper you all want has just been written:

7 Likes

Wouldn’t removing the unique words reinforce the problem by increasing the probability of the repetitive ones? Most songs have tremendous repetition (particularly the older songs, which I realised after going through the data).

I’m trying the 2nd approach by tweaking the parameters; still no luck yet :sweat_smile:

@init_27 Hi Sanyam, I am interested in NLP as well. I found a blog post (with a paper) below. I think the “Eliminating Repetition with Coverage” section may help with your problem conceptually. The interactive example is very cool, but the system is not perfect yet.

http://www.abigailsee.com/2017/04/16/taming-rnns-for-better-summarization.html#easier-copying-with-pointer-generator-networks

3 Likes

Here are a few ‘lyrics’ generated by the model.
The starting word was ‘all’ and the rest were generated by the model. I randomly picked each next word in the sequence from the top 3, as suggested by @lgvaz.

all i ve seen a little i m not gon you go and be my you know i ve never seen the way to be a you know you got ta make you feel i got you like a i do you want me and i m not the same i do you want a girl and i do i know you want me you need me baby i love me i want you baby baby do me baby i love my do nt want you to do i need to do i want to…

Sanyam Bhutani

3 Likes

Nice! How many lyrics are you using to train?

I used the US Top 100 songs from the past 50 years, minus a few that were not in English, which I removed manually.
So, about 4000 songs (80-20 train-test split).

1 Like

Now, you need music to go with it.

1 Like

I’m playing around with Google magenta :yum: