THE RAiMONES

making punk rock intelligent, artificially intelligent.

Punk Rock Now!

Project Description

I think we can all agree that more great musicians are dead than alive. Modern technology cannot resurrect Ian Curtis, Kurt Cobain and Lemmy, but we believe that through AI more songs inspired by these great (but dead!) musicians can be written.


Hey Ho, Let's Go!


In this project we used 130 songs in MIDI format (60 unique songs and 70 variations thereof) by the most amazing punk rock band ever, The Ramones, as well as the lyrics of all 178 of their songs. A deep learning neural network was then trained on all that data; not big data, but still. After about 45 minutes of training on a GPU instance on AWS, the network had learned to write guitar and bass lines for Ramones-inspired music.

Regular deep learning neural networks are great for static data, for example classifying photos (cats vs. dogs). Music and text have a temporal component, so we need stateful networks, i.e. deep learning networks with memory. For this project (lyrics & music generation) we chose Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs). More information on these networks can be found on Andrej Karpathy's blog.

Please also check out the slides of the final presentation of this project on Slideshare or YouTube, and the code on github.

Music from MIDI files

First we downloaded all the MIDI-formatted songs we could find on the internet and transposed all of them to C major. The recommended software for dealing with MIDI files is MuseScore, but Apple's GarageBand can open the files easily as well. Be aware that MIDI is an old and ugly format; dealing with it is a bit of a nightmare.

For the whole project we used Python versions 2.7 and 3.6. Our library of choice for manipulating the MIDI input data was music21. The library is powerful but also buggy, and we needed several workarounds to extract all the tracks we wanted.

As not enough "voice tracks" could be found, we decided to focus solely on the guitar and bass tracks, which we extracted from all the transposed songs and concatenated serially. After inspecting the music, the songs were quantized in sixteenth (1/16) notes, i.e. the base note became a 1/16th note. In addition to the 6 parallel tracks of the guitar (6 strings) and the one track of the bass, we added 2 bits for "mute" and 2 bits for "hold". With the "hold" bit, notes longer than 1/16th can be encoded; this is done by repeating the notes and setting the hold bit to "1".
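One possible encoding of a single 1/16th-note time step, purely as a hypothetical sketch (the exact layout and the names below are our assumptions, not the project's actual code):

```python
# Hypothetical sketch of the per-timestep encoding described above:
# 6 guitar-string pitches + 1 bass pitch + 2 "mute" bits + 2 "hold" bits.
# Pitches are raw MIDI note numbers (0..127).

def encode_step(guitar_pitches, bass_pitch, mute=(0, 0), hold=(0, 0)):
    """Build one 1/16th-note row of the training data."""
    assert len(guitar_pitches) == 6
    return list(guitar_pitches) + [bass_pitch] + list(mute) + list(hold)

# A chord held for a quarter note (= four 1/16th steps) is encoded by
# repeating the row and setting the hold bits on the repeats:
e_chord = encode_step([40, 47, 52, 55, 59, 64], 28)
held = [e_chord] + 3 * [encode_step([40, 47, 52, 55, 59, 64], 28, hold=(1, 1))]
```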




The numbers encoding the pitch (0..127) of each note were taken directly from the MIDI files.

Figure: MIDI notes encoding

By chaining all 130 songs we ended up with about 150'000 lines of 1/16th notes, of which about 2'200 were unique combinations of guitar, bass and mute/hold bits. These ~2'200 unique combinations were one-hot encoded ("bag of words") and used to train an LSTM neural network. The chosen input length was either 32 or 64, i.e. 2 or 4 bars were used as input.

The training data (X) was generated by splitting the notes into blocks of either 32 or 64 one-hot-encoded, 1/16th-quantized chords (guitar & bass); the expected output (y) corresponding to each input block (X) was the following chord. The output vector (y) was one-hot encoded as well.
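As an illustrative sketch of this windowing (names like `indexed_chords` are our own, and the toy sequence stands in for the real ~150'000 chord lines):

```python
import numpy as np

# Illustrative sketch of the sliding-window training-data generation.
# `vocab` stands in for the ~2'200 unique chord combinations.
maxlen, step = 32, 3
vocab = 5
indexed_chords = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4] * 20  # toy chord sequence

windows, targets = [], []
for i in range(0, len(indexed_chords) - maxlen, step):
    windows.append(indexed_chords[i:i + maxlen])   # input block (X)
    targets.append(indexed_chords[i + maxlen])     # the following chord (y)

# one-hot encode inputs and outputs
X = np.zeros((len(windows), maxlen, vocab), dtype=bool)
y = np.zeros((len(windows), vocab), dtype=bool)
for n, window in enumerate(windows):
    for t, idx in enumerate(window):
        X[n, t, idx] = True
    y[n, targets[n]] = True
```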

The networks used were LSTM-RNNs programmed using Keras on TensorFlow; sizes varied, but the following is one configuration that provided good results:

The code, written in Python with Keras:

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense, Activation

maxlen = 32  # max. sequence length: 32 or 64 (2 or 4 bars)
model = Sequential()
# two stacked LSTM layers of 128 units each, with dropout in between
model.add(LSTM(128, return_sequences=True, input_shape=(maxlen, len(words))))
model.add(Dropout(0.2))
model.add(LSTM(128, return_sequences=False))
model.add(Dropout(0.2))
# softmax over all unique chord combinations (len(words) classes)
model.add(Dense(len(words)))
model.add(Activation('softmax'))


Figure: The 10 most frequently played Ramones chords (transposed to C-Major)

Implementations and Losses

The following LSTM-RNNs were implemented and trained, resulting in the losses described below:
Implementation 1:

maxlen = 64
step = 3
model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape=(maxlen, len(words))))
model.add(Dropout(0.2))
model.add(LSTM(128, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(len(words)))
model.add(Activation('softmax'))

min. loss: 0.8142
Total params: 1'700'136

Implementation 2:

maxlen = 32
step = 3
model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape=(maxlen, len(words))))
model.add(Dropout(0.2))
model.add(LSTM(128, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(len(words)))
model.add(Activation('softmax'))

min. loss: 0.9395
Total params: 1'700'136

Implementation 3:

maxlen = 32
step = 1
model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape=(maxlen, len(words))))
model.add(Dropout(0.2))
model.add(LSTM(128, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(len(words)))
model.add(Activation('softmax'))

min. loss: 0.9009
Total params: 1'700'136


Implementation 4:

maxlen = 64
step = 1
model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape=(maxlen, len(words))))
model.add(Dropout(0.2))
model.add(LSTM(128, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(len(words)))
model.add(Activation('softmax'))

min. loss: 0.9119
Total params: 1'700'136

Implementation 5:

maxlen = 32
step = 1
model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(maxlen, len(words))))
model.add(Dropout(0.2))
model.add(LSTM(64, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(len(words)))
model.add(Activation('softmax'))

min. loss: 1.1968
Total params: 802'088

Implementation 6:

maxlen = 32
step = 1
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(words))))
model.add(Dropout(0.2))
model.add(Dense(len(words)))
model.add(Activation('softmax'))

min. loss: 0.8712
Total params: 1'568'552

For all implementations the batch size was kept constant at 128. No clear relation between the minimum loss and the structure of the LSTM-RNN or the training data was apparent. It can, however, be seen that the loss is larger when there are fewer parameters to train (cf. Implementation 5). All RNNs were trained until the loss seemed to have reached (or was stuck in) a (local) minimum. The training data of about 150'000 chord sequences is probably too small for reliable results and more meaningful interpretations.

Training Data

After about 50 to 120 epochs of training (depending on the network), the error rate stopped decreasing; this took around 20-50 minutes on an NVIDIA Tesla K80 GPU. The amount of training data could be reduced by increasing the step size; this had a big (linear) impact on the training times, but did not affect the final error much.

No validation data was set aside for verification/testing.

Not only the weights of the fully trained networks were stored, but also intermediate results. In some cases, these produced better results; this effect was, however, not investigated further.

Lyrics Generation

We scraped websites to download the lyrics of all 178 Ramones songs; this was a rather straightforward task using the Python library BeautifulSoup. First statistics were then computed from the lyrics:

Statistics from the lyrics of the 178 songs:

6'361 lines, 29'976 words, 154'207 characters

Most frequent word: "I" - 1'351 occurrences
Most frequent noun: "baby" - 272 occurrences
Most frequent verb: "go" - 235 occurrences
Other notable words: "yeah" (196), "love" (168), "punk" (98)
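Such statistics can be reproduced with a few lines of standard Python; the snippet below is a toy sketch in which a stand-in corpus replaces the real scraped lyrics of all 178 songs:

```python
import re
from collections import Counter

# Toy sketch of computing lyric statistics; the stand-in corpus below
# replaces the real scraped lyrics.
lyrics = "Hey ho let's go\nHey ho let's go\nI wanna be sedated"

lines = lyrics.splitlines()
words = re.findall(r"[a-z']+", lyrics.lower())
counts = Counter(words)

print(len(lines), len(words), len(lyrics))  # lines, words, characters
print(counts.most_common(3))                # most frequent words
```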

The text was split into its characters, and various LSTM-RNNs were then trained and tested: 1 or 2 layers, of various sizes. The parameter with the biggest impact on the output quality was not the size or topology of the LSTM-RNN but the length of the input character string. After experimenting with input lengths of 10, 40 and 100 characters, an input length of 40 characters clearly produced the best results. This is most probably an effect of the limited training data set.
It is impressive that an LSTM-RNN can "learn to write" within 45 minutes of training on a text of about 30'000 words (written by The Ramones)! The first output the LSTM-RNN generated was as follows; further examples can be found here.

all of a sudden i feel son gone a gas i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, i'm alive, yeah, oh yeah


(I immediately switched off the computer and erased the program.)

LSTM-RNNs vs. Markov Chains

LSTM-RNNs produce impressive results, but they also require vast amounts of data and computational power. We should not forget that Markov Chains, too, can "learn" from a training text and then write "original" texts. An interesting blog entry on Markov Chains for text generation can be found here: The Unreasonable Effectiveness of Character-level Language Models
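The core of such a character-level Markov model fits in a few lines. The following is a generic sketch in the spirit of that blog post (toy corpus, our own function names), not the code behind the examples that follow:

```python
import random
from collections import defaultdict

# Minimal character-level n-gram Markov model: a generic sketch,
# not the project's exact code.
def train(text, n):
    """Map every n-character context to the list of observed next chars."""
    model = defaultdict(list)
    for i in range(len(text) - n):
        model[text[i:i + n]].append(text[i + n])
    return model

def generate(model, n, seed, length):
    """Extend the seed one character at a time from the learned contexts."""
    out = seed
    for _ in range(length):
        choices = model.get(out[-n:])
        if not choices:
            break
        out += random.choice(choices)
    return out

corpus = "sheena is a punk rocker / sheena is a punk rocker now / "
model = train(corpus * 4, 6)   # a 6-gram model on a toy corpus
print(generate(model, 6, "sheena", 60))
```

Training is a single pass over the text, which is why a regular PC handles it in a fraction of a second.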

Examples of Markov Chain model outputs (4-gram, 6-gram and 10-gram, concatenated in that order):

m child all / he's prime / baby, but there my family cretin family / everything's rock, rocker / i'll never that you can / stay after go / but your high / and i love here i turn town on now / it's geek / do your temployment / i fell / police day. that i wash you're to walk to don't care / i said ever go to you can't want to the club / you're goes it just can humanking my poor body put my tears i'll all good tired potato turnsey / someo / leave me somehow i gets so be / because i'm nothing in my before complete / well my eyes / make the othere's that can't he law and place / in the to there / we drag / fast ay-o / misfits, twilight zone / i really killing lies hypnotized / twenty-twenty-twenty-twenty-twenty-twenty four hours to go / oohh! / go lil' camaro go / no it's like charles manson / go to be lonely / just pass it by. / when i knew it away from the concert when the time, sharona? / oo younger kind / maybe i wanna be that crime / i want the bbi / i let me walking down the airwaves / we want you up / 'cause she's the end, the corner / sheena is a punk a punk rocker / sheena is a punk rocker / sheena is a punk punk, a punk rocker now / your eye she's that kinda girl / oh baby, do you wanna dance / yeah, yeah, yeah, yeah, there's no success for me / involved in a robberry / there's no reason to live. / i'm the man who make / the street his home. / and my lean, mean heart / i just wanna have to shout it out / i don't have brain damage / i'm not your enemy / girl, i am your friend / come with us and find / the pleasures of a journey to the center of the mirror see your stupid face / what a disgrace man and you know your biggest problem / is the water as high as it'll get / super wet, mom told me, / every chance i have seen / trying to forget / and your friend / come with us and find / the pleasure and pain / i used to be on an endless run. /

Observations

Even a 10-gram Markov Chain produces example output containing only a single typo, and it does so with a computational complexity a regular PC can handle in a tenth of a second (vs. an LSTM-RNN that takes tens of minutes on a dedicated GPU to train).

On the other hand, comparing the generated output to the original (training) text (the lyrics of the 178 songs), it becomes obvious that many lines of the above texts are simply "copy-pasted" and rearranged.


Generated Output

Do I believe that we are ready to ditch our MP3 music collections and start listening to computer-generated music?

Probably not!

Do I think that computer-generated "AI" music can be used as an input for creative musicians, as a basis to create good music?

Definitely! And that's what we did...


Hey Ho, Let's Go!


The various LSTM-RNN implementations generated lyrics as well as music in the MIDI format. These were then used by a real musician as raw material to create proper punk rock songs.

Generated MIDI Output

The various configurations produced several Ramones-inspired songs in the MIDI format, i.e. guitar and bass tracks. The quality of the generated music varied greatly, and by adjusting the "diversity"/"temperature" of the output sampling, the "variability" of the songs could be tuned. Three examples that were then used as inspiration to write a punk rock song are shown below:

Three midi songs generated by three LSTM-RNN implementations:

Implementation 1: single-layer LSTM-RNN, 128 units; weights leading to lowest loss; diversity: 0.9
Implementation 2: single-layer LSTM-RNN, 64 units; weights leading to lowest loss; diversity: 1.1
Implementation 3: double-layer LSTM-RNN, 128 units; weights leading to medium loss; diversity: 1.1

For each implementation, the score sheet (PDF) and the MIDI song are available for download.





The initial seed for all three songs was the same 2 bars (32 1/16th notes) of the Ramones song "Beat on the Brat". Depending on the implementation as well as the "diversity", the songs turned out very differently; the higher the "diversity", the more "chaotic" or "creative" the output becomes. Using weights that did not lead to the lowest loss may produce more "creative" results as well. Many parameters can be adjusted, leading to a wealth of results.

Generated Lyrics Output

The previously described LSTM-RNNs as well as 10-gram Markov Chains were used to generate lyrics inspired by the Ramones. Initial seeds were taken randomly from Ramones lyrics. My favorite results (taken from very limited trials) were as follows:

LSTM Neural Network generated:

fight for money, fight for fun
i want you and i want you and you wanna see a hold
i don't wanna be all my baby i hell never was gotta want to be
don't want to be alright
i wanted it a good from me
i don't wanna be all i want

Markov Chain generated:

i'm the man who make the street his home.
and my lean, mean heart
i just wanna have to shout it out
i don't have brain damage
i'm not your enemy girl, i am your friend
come with us and find the pleasures
of a journey to the center


The sampling's "diversity" variable has a great impact on the quality of the output words in LSTM-RNNs. The smaller the number, the fewer the typos, but the higher the probability of getting "stuck" in a loop. The following examples illustrate this:
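The usual char-RNN way to implement such a diversity parameter is to re-weight the predicted probabilities before drawing the next index; the helper below is a generic sketch (our own implementation, not necessarily the project's code):

```python
import numpy as np

# Generic "temperature"/diversity sampling helper: re-weight the
# network's predicted probabilities, then draw one index.
def sample(preds, temperature=1.0):
    preds = np.asarray(preds, dtype=np.float64)
    logits = np.log(preds + 1e-12) / temperature  # sharpen or flatten
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # draw one index from the re-weighted distribution
    idx = int(np.searchsorted(np.cumsum(probs), np.random.rand()))
    return min(idx, len(probs) - 1)
```

At low temperature the most likely next step wins almost every time (hence the repetitive loops at low diversity); values above 1.0 flatten the distribution and make the output more chaotic.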

Diversity: 0.1

we got to stop this crazy care
i don't wanna be a real me the street the street the time the thing the tale
i don't wanna be a girl in the street the danger the time
i don't wanna be a place the street the tood the street the

Diversity: 0.2

we got to stop this crazy care
i don't wanna be a get the street the threed the street the danger now
i don't wanna be a pink and the tame
and i don't wanna be a good the girl
i am a tio the thing the street the time
i want

Diversity: 0.5

we got to stop this crazy care
it's a lot to be on the place
i can't get a lot to me
to die, i was baby, baby
i'm a the friends to poy
i can't let the street to do now
i wanna be a place


The above lyrics and MIDI songs were taken as inspiration for the first THE RAiMONES song, "I'm Alive".

Audio Output - Punk Rock Songs!

The first song by THE RAiMONES is presented to you here:
More songs can be found on the official THE RAiMONES' Bandcamp site.

(Side note: the video was a failed attempt at outsourcing through freelancer.com...)



“I’m Alive” is the first Ramones-inspired AI-assisted song to see the light of day. This original punk rock anthem was composed and performed by Mr. Ratboy, whose past “glories” not only include great bands like Sour Jazz, Motorcycle Boy & Pillbox (who toured with The Ramones), but also a stint with Marky Ramone & the Intruders in 1996. The raw material was extracted from the following two parts of the AI-generated songs.

Pre-chorus & middle part: Verse & chorus:



The lyrics of the song are a mash-up of the AI-generated lyrics presented above:

I’m alive
No reason to live
Mommy told me
No success for me
Blow every chance I see
But my lean, mean heart
Just wanna shout it out

i'm alive, i'm alive, i'm alive, I'm alive, yeah, yeah, yeah!

I fight for money
I'm not your enemy
I fight for fun
I am your friendy
Your biggest problem is
I don't want to be alright

i'm alive, i'm alive, i'm alive, I'm alive, yeah, yeah, yeah!

Do you wanna dance?
Come on take a chance
Come see your stupid face
In the center of the mirror

i'm alive, i'm alive, i'm alive, I'm alive, yeah, yeah, yeah!
I'm alive!


Also check out THE RAiMONES' official Bandcamp page with additional songs and material.


THE RAiMONES on Twitter

Follow us on Twitter & receive lyrics from THE RAiMONES every 6 hours!
(powered by AWS Lambda; consuming 3'708 ms of computation per tweet)

Twitter Now!