I am wondering if we will advance to a stage where deep learning will eventually enable machines to compose new music, paint a totally new painting, write a new script for a movie, or compose lyrics, the way humans do with their imagination?
Do we think this is possible?
I do not know much about imagination, either in general or in the context of computers, so I would not be able to speak to that.
But if you are talking about creating music or art: one of the students does amazing things with GANs and her artwork. There have been musical pieces ‘composed’ by algorithms and played by orchestras; you can look them up if you’d like.
This course covers techniques that can be used to such ends with great success.
Much of the best work is happening here: https://magenta.tensorflow.org/
As for music, I saw someone on YouTube create an algorithm that trains itself on MIDI files and composes new music with tones and rhythms similar to what it’s trained on. MIDI is a very primitive format, though; I’m wondering if the same methodology can extend to, say, WAVs or MP3s.
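One reason extending the MIDI approach to WAV/MP3 is hard is sequence length: a MIDI file encodes discrete note events, while WAV/MP3 decode to raw audio samples, so the sequence a model must predict per second is orders of magnitude longer. A back-of-the-envelope sketch (the event rate is my own illustrative assumption, not a figure from this thread):

```python
# Compare the rough sequence lengths a model must handle for a 3-minute tune.
SECONDS = 180                  # a three-minute tune
MIDI_EVENTS_PER_SEC = 20       # assumed event rate for a busy melody
WAV_SAMPLE_RATE = 44_100       # CD-quality mono audio, samples per second

midi_len = SECONDS * MIDI_EVENTS_PER_SEC   # note-on/note-off style events
wav_len = SECONDS * WAV_SAMPLE_RATE        # raw amplitude samples

print(midi_len, wav_len, wav_len // midi_len)
# 3600 7938000 2205  -> roughly three orders of magnitude more tokens
```

This is why raw-audio models (e.g. the WaveNet line of work behind Magenta’s audio experiments) need very different architectures than note-level sequence models.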
On the script/storytelling front there’s a very interesting field called ‘Natural Language Generation’ that I’ve just started touching upon. The idea is to generate text from text, images, or structured data. It seems like deep learning is only just starting to be applied there, and the results are quite interesting. It’s been very successfully applied to image captioning, but the field goes way beyond that.
I actually wanted to raise this topic area to @jeremy as I think it’s a very interesting one and would love to see it covered in part 2 of the course.
Here are a few papers/articles on the subject:
https://arxiv.org/pdf/1703.09902.pdf (A recent survey; 118 pages)
https://www.kdnuggets.com/2017/05/nlg-natural-language-generation-overview.html (An article focused on corporate solutions)
https://arxiv.org/abs/1707.02633 (Controlling Linguistic Style Aspects in Neural Language Generation)
http://www.emnlp2015.org/proceedings/EMNLP/pdf/EMNLP199.pdf (Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems)
At its core the idea is a really powerful one, especially when transforming structured data into text. In some ways it’s like a language model whose output you can influence or control.
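To make the data-to-text idea concrete, here’s a minimal template-based sketch, the classic non-neural baseline, not any of the systems in the papers above; the record fields and the `weather_report` helper are my own hypothetical example:

```python
# Minimal template-based NLG: map a structured record to a sentence.
def weather_report(record):
    """Turn a structured weather record into one English sentence."""
    return (f"{record['city']} will be {record['sky']} with a high of "
            f"{record['high_c']} degrees.")

data = {"city": "Lille", "sky": "partly cloudy", "high_c": 18}
print(weather_report(data))
# Lille will be partly cloudy with a high of 18 degrees.
```

A neural NLG model (like the semantically conditioned LSTM above) learns this data-to-text mapping from examples instead of hard-coding the template, which is what makes it feel like a controllable language model.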
Hey Even - that is a nice survey! It seems to be just a day old? I can’t look at it ATM, but I’m saving it for later.
Here are some artists/researchers using deep learning to recreate the sounds of musicians such as the Beatles and Battles: Dadabots
They link to the relevant research as well.
Wondering if the fast.ai framework can provide better results than Magenta?
[edit:] I’ve just noticed that we will cover this in part 2: generative models applied to images and sentences. So I guess it will be easy to transpose to music as well.
This weekend I am participating in a music generation hackathon, and I was trying to figure out how I could apply fast.ai and whether it’s appropriate for this.
I don’t have links to papers, but
And for music, I think this would probably be one of the first things that come to mind in this regard
It would be extremely interesting these days to build a software package (with deep learning) allowing people to produce a complete, high-budget-quality film without any budget. In fact, cinema is the least reproducible of the arts these days, because a technically good film requires a lot of money and people.
I participated in my first music hackathon, and I used lesson 4 to generate some nice music files.
Here are the repository and the results: https://github.com/alessaww/fastai_ws/tree/master/musichack
I will write a blog about it some day.
Not some day: write a blog about it today (or this week)…
Siraj also has a video on deep learning and music, and he even interviewed Taryn Southern, who creates music using AI, explaining what kind of software she uses.
Example of Taryn’s music:
Definitely interesting work! How did you annotate the key signature in each tune you process? I see the sample file being processed in G major, which means F is F#, but I see `K:maj` in the code, which is a bit confusing. Does the model learn sharps and flats from the structure of the tune, or does it learn the key names separately from the notes?
So, unfortunately, I know nothing about music. I got my inspiration from [this work](https://github.com/IraKorshunova/folk-rnn); you can find papers there describing how they cleaned the data and what tokens they used.
What I’ve done is not much: I took the data they provide and trained some models with PyTorch.
The data does indeed come as notes represented by letters, but I don’t know how to interpret them, so the most I can check is whether the result sounds good or bad.
I’ve also noticed that the model always starts with K:maj, no matter how long I train and no matter what the training data contains.
Another thing that is kind of confusing: if you give the model the first part of a song as input and let it predict the next sequences, it fails by repeating the same note over and over again.
So there’s more research to be done. But I encourage you to play with it, since it’s super easy to train.
Thanks! Yes, ABC is a common notation format for folk music. Each letter represents a musical note, so C is ‘do’, D is ‘re’, and so on; lower case means a higher octave. The Wikipedia article isn’t that good at explaining how it works, and I found this one better.
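The letter/octave convention described above can be sketched in a few lines. This is a toy decoder for just the basics; mapping to MIDI note numbers (middle C = 60) is my own choice for illustration, and real ABC also has accidentals, octave marks (' and ,), and durations that this skips:

```python
# Toy ABC pitch decoder: uppercase C..B is one octave, lowercase
# letters are the same notes one octave higher.
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def abc_pitch(letter, base=60):
    """Map a single ABC note letter to a MIDI-style note number."""
    octave_up = 12 if letter.islower() else 0
    return base + SEMITONES[letter.upper()] + octave_up

print(abc_pitch("C"), abc_pitch("c"))  # 60 72: 'c' is an octave above 'C'
```

This also hints at the key-signature question above: the note letters alone don’t encode sharps or flats, which is why the `K:` header matters in real ABC.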
I think some of the decisions made in cleaning are worth digging into for sure. This will be a side project for part 2, then. I happen to play Irish music, so it would be particularly interesting to judge how ‘good’ the model is by how it sounds, in addition to the common metrics.
Thanks for sharing the article! Didn’t know much about this field but sounds fascinating for sure.