Want To Generate Your Own Music Using Deep Learning? Here's A Guide To Do Just That!
Introduction
“If I were not a physicist, I would probably be a musician. I often think in music. I live my daydreams in music. I see my life in terms of
music.” – Albert Einstein
I might not be a physicist like Mr. Einstein, but I wholeheartedly agree with his thoughts on music! I can't remember a single day when I didn't open up my music player. My commute to and from the office is accompanied by music and, honestly, it helps me focus on my work.
I’ve always dreamed of composing music but didn’t quite get the hang of instruments. That was until I
came across deep learning. Using certain techniques and frameworks, I was able to compose my own
original music score without really knowing any music theory!
This was one of my favorite professional projects. I combined my two passions – music and deep learning
– to create an automatic music generation model. It’s a dream come true!
I am thrilled to share my approach with you, including the entire code to enable you to generate your own
music! We’ll first quickly understand the concept of automatic music generation before diving into the
different approaches we can use to perform this. Finally, we will fire up Python and design our own
automatic music generation model.
What is Automatic Music Generation?
I define music as a collection of tones of different frequencies. So, Automatic Music Generation is the process of composing a short piece of music with minimal human intervention.
What could be the simplest form of generating music?
It all started by randomly selecting sounds and combining them to form a piece of music. In 1787, Mozart
proposed a Dice Game for these random sound selections. He composed nearly 272 tones manually! Then,
he selected a tone based on the sum of 2 dice.
Another interesting idea was to make use of musical grammar to generate music.
Musical Grammar comprehends the knowledge necessary to the just arrangement and combination of musical sounds and to the proper
performance of musical compositions
– Foundations of Musical Grammar
In the early 1950s, Iannis Xenakis used the concepts of Statistics and Probability to compose music –
popularly known as Stochastic Music. He defined music as a sequence of elements (or sounds) that
occurs by chance. Hence, he formulated it using stochastic theory. His random selection of elements was
strictly dependent on mathematical concepts.
Recently, Deep Learning architectures have become the state of the art for Automatic Music Generation. In
this article, I will discuss two different approaches for Automatic Music Composition using WaveNet and
LSTM (Long Short Term Memory) architectures.
Note: This article requires a basic understanding of a few deep learning concepts, such as neural networks, convolutional networks, and LSTMs.
Music is essentially composed of Notes and Chords. Let me explain these terms from the perspective of the piano:
Note: The sound produced by a single key is called a note
Chords: The sound produced by 2 or more keys simultaneously is called a chord. Generally, most
chords contain at least 3 key sounds
Octave: A repeated pattern is called an octave. Each octave contains 7 white and 5 black keys
I will discuss two Deep Learning-based architectures in detail for automatically generating music –
WaveNet and LSTM. But, why only Deep Learning architectures?
Deep Learning is a field of Machine Learning inspired by the neural structure of the brain. These networks extract features automatically from the dataset and are capable of learning any non-linear function. That's why Neural Networks are called Universal Function Approximators.
Hence, Deep Learning models are the state of the art in various fields like Natural Language Processing
(NLP), Computer Vision, Speech Synthesis and so on. Let’s see how we can build these models for music
composition.
WaveNet is a Deep Learning-based generative model for raw audio developed by Google DeepMind.
The main objective of WaveNet is to generate new samples from the original distribution of the data.
Hence, it is known as a Generative Model.
In a language model, given a sequence of words, the model tries to predict the next word. WaveNet works in a similar way: given a sequence of samples, it tries to predict the next sample.
Long Short Term Memory Model, popularly known as LSTM, is a variant of Recurrent Neural Networks
(RNNs) that is capable of capturing the long term dependencies in the input sequence. LSTM has a wide
range of applications in Sequence-to-Sequence modeling tasks like Speech Recognition, Text
Summarization, Video Classification, and so on.
Let’s discuss in detail how we can train our model using these two approaches.
This is a Many-to-One problem where the input is a sequence of amplitude values and the output is the subsequent value.
WaveNet takes a chunk of a raw audio wave as input. A raw audio wave refers to the representation of a wave in the time-series domain.
In the time-series domain, an audio wave is represented in the form of amplitude values which are recorded
at different intervals of time:
Given the sequence of the amplitude values, WaveNet tries to predict the successive amplitude value.
Let’s understand this with the help of an example. Consider an audio wave of 5 seconds with a sampling
rate of 16,000 (that is 16,000 samples per second). Now, we have 80,000 samples recorded at different
intervals for 5 seconds. Let’s break the audio into chunks of equal size, say 1024 (which is a
hyperparameter).
The below diagram illustrates the input and output sequences for the model:
Input and Output of first 3 chunks
We can infer from the above that the output of every chunk depends only on the past information (i.e. the previous timesteps) and not on the future timesteps. Hence, this is known as an Autoregressive task and the model is known as an Autoregressive model.
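To make this concrete, here is a minimal sketch of how such (input chunk, next sample) pairs could be prepared from a raw wave; wave and chunk_size are illustrative names rather than code from this article:

#a minimal sketch of preparing (input chunk, next sample) pairs from a raw wave
import numpy as np

sampling_rate = 16000
wave = np.random.uniform(-1, 1, 5 * sampling_rate)   #stand-in for 5 seconds of audio
chunk_size = 1024                                     #the hyperparameter mentioned above

inputs, targets = [], []
for i in range(len(wave) - chunk_size):
    inputs.append(wave[i:i + chunk_size])   #a chunk of 1024 amplitude values
    targets.append(wave[i + chunk_size])    #the very next amplitude value to predict

inputs, targets = np.array(inputs), np.array(targets)
print(inputs.shape, targets.shape)   #(78976, 1024) (78976,)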
Inference phase
In the inference phase, we will try to generate new samples. Here is how it works: take a random array of sample values as the starting input, let the model output a probability distribution over the next sample, pick the value with the highest probability and append it to the input array, then drop the first element and repeat the process with the updated array.
The building blocks of WaveNet are Causal Dilated 1D Convolution layers. Let us first understand the
importance of the related concepts.
One of the main reasons for using convolution is to extract the features from an input.
For example, in the case of image processing, convolving the image with a filter gives us a feature map.
Convolution is a mathematical operation that combines 2 functions. In the case of image processing,
convolution is a linear combination of certain parts of an image with the kernel.
What is 1D Convolution?
1D convolution serves a similar purpose to the Long Short Term Memory model and is used to solve similar sequence tasks. In 1D convolution, a kernel or a filter moves along only one direction:
The output of convolution depends upon the size of the kernel, input shape, type of padding, and stride.
Now, I will walk you through different types of padding for understanding the importance of using Dilated
Causal 1D Convolution layers.
When we set the padding to valid, the input and output sequences vary in length. The length of the output is less than that of the input:
When we set the padding to same, zeroes are padded on either side of the input sequence to make the
length of input and output equal:
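A quick way to see the difference is to pass a short dummy sequence through a Conv1D layer with each padding type. This is just an illustrative check, assuming TensorFlow/Keras:

import numpy as np
import tensorflow as tf

#a dummy batch: 1 sequence of 10 timesteps with 1 channel
x = np.random.rand(1, 10, 1).astype("float32")

valid = tf.keras.layers.Conv1D(1, kernel_size=3, padding="valid")(x)
same = tf.keras.layers.Conv1D(1, kernel_size=3, padding="same")(x)

print(valid.shape)   #(1, 8, 1)  -> the output is shorter than the input
print(same.shape)    #(1, 10, 1) -> zero padding keeps the lengths equal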
Pros of 1D Convolution:
It captures the sequential information present in the input and trains much faster than a GRU or LSTM, since the convolutions across timesteps can be computed in parallel
Cons of 1D Convolution:
When padding is set to same, the output at timestep t is convolved with the previous timestep t-1 and the future timestep t+1 too. Hence, it violates the Autoregressive principle
When padding is set to valid, the input and output sequences vary in length, but equal lengths are required for computing residual connections (which will be covered later)
Note: The pros and cons I mentioned here are specific to this problem.
This is defined as convolutions where output at time t is convolved only with elements from time t and earlier in the previous layer.
In simpler terms, normal and causal convolutions differ only in padding. In causal convolution, zeroes are added to the left of the input sequence to preserve the autoregressive principle:
Pros of Causal 1D convolution:
Causal convolution does not take the future timesteps into account, which is a criterion for building a Generative model
Cons of Causal 1D convolution:
Causal convolution can only look back a limited number of timesteps into the past. Hence, causal convolution has a very low receptive field. The receptive field of a network refers to the number of inputs influencing an output:
As you can see here, the output is influenced by only 5 inputs. Hence, the receptive field of the network is 5, which is very low. The receptive field can also be increased by using larger kernels, but keep in mind that the computational complexity increases as well.
A Causal 1D convolution layer with holes or spaces in between the values of a kernel is known as a Dilated 1D convolution.
The number of spaces to be added is given by the dilation rate, which determines the receptive field of the network. A kernel of size k with dilation rate d has d-1 holes between every pair of adjacent values in the kernel.
As you can see here, convolving a 3 * 3 kernel over a 7 * 7 input with dilation rate 2 gives a receptive field of 5 * 5.
As you can see here, the output is influenced by all the inputs. Hence, the receptive field of the network is
16.
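Here is a small sketch of how a stack of dilated causal convolutions could be written in Keras; the filter counts and dilation rates are illustrative choices, picked so that the receptive field works out to 16 timesteps as in the example above:

import tensorflow as tf

inputs = tf.keras.Input(shape=(1024, 1))   #a chunk of 1024 amplitude values
x = inputs
for rate in [1, 2, 4, 8]:                  #doubling the dilation rate at every layer
    x = tf.keras.layers.Conv1D(16, kernel_size=2, padding="causal",
                               dilation_rate=rate, activation="relu")(x)
outputs = tf.keras.layers.Conv1D(1, kernel_size=1)(x)

model = tf.keras.Model(inputs, outputs)
#with kernel size 2 and dilation rates 1, 2, 4 and 8, the receptive field is 1 + (1+2+4+8) = 16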
A building block contains Residual and Skip connections which are just added to speed up the
convergence of the model:
Another approach for automatic music generation is based on the Long Short Term Memory (LSTM) model.
The preparation of input and output sequences is similar to WaveNet. At each timestep, an amplitude value
is fed into the Long Short Term Memory cell – it then computes the hidden vector and passes it on to the
next timesteps.
The hidden vector at the current timestep, h_t, is computed based on the current input a_t and the previous hidden vector h_(t-1). This is how the sequential information is captured in any Recurrent Neural Network:
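Written out, this recurrent update takes the following form (this is the plain RNN version; an LSTM cell adds input, forget and output gates on top of it, and W_h, W_a and b denote learned weights and a bias):

h_t = tanh(W_h * h_(t-1) + W_a * a_t + b)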
Pros of LSTM:
It captures the long-term dependencies present in the input sequence
Cons of LSTM:
It consumes a lot of time for training since it processes the inputs sequentially
The wait is over! Let’s develop an end-to-end model for the automatic generation of music. Fire up your
Jupyter notebooks or Colab (or whichever IDE you prefer).
I downloaded and combined multiple classical music files of a digital piano from numerous resources. You
can download the final dataset from here.
Import libraries:
Music21 is a Python library developed by MIT for understanding music data. MIDI is a standard format for
storing music files. MIDI stands for Musical Instrument Digital Interface. MIDI files contain the
instructions rather than the actual audio. Hence, it occupies very little memory. That’s why it is usually
preferred while transferring files.
#library for understanding music
from music21 import *
Let’s define a function straight away for reading the MIDI files. It returns the array of notes and chords
present in the musical file.
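A minimal sketch of what such a function could look like, using music21's converter, instrument, note and chord modules; the names read_midi, path and notes_array are assumptions chosen to line up with the code later in the article:

import os
import numpy as np

#a sketch: parse a MIDI file and collect the piano notes and chords it contains
def read_midi(file):
    notes = []
    #parsing a MIDI file
    midi = converter.parse(file)
    #grouping the elements based on instrument
    parts = instrument.partitionByInstrument(midi)
    for part in parts.parts:
        #keep only the piano parts
        if 'Piano' in str(part):
            for element in part.recurse():
                #a single note: store its pitch
                if isinstance(element, note.Note):
                    notes.append(str(element.pitch))
                #a chord: store its pitches joined by '.'
                elif isinstance(element, chord.Chord):
                    notes.append('.'.join(str(n) for n in element.normalOrder))
    return np.array(notes)

#reading every MIDI file in the dataset folder (path is an assumed location)
path = 'classical_piano/'
files = [i for i in os.listdir(path) if i.endswith(".mid")]
notes_array = np.array([read_midi(path + i) for i in files], dtype=object)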
In this section, we will explore the dataset and understand it in detail.
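A minimal sketch of this step, assuming the notes_array produced above; the flattened list notes_ is the variable used by the frequency-count code below:

#flattening the 2D array of pieces into a 1D list of notes
notes_ = [element for piece in notes_array for element in piece]

#number of unique notes in the corpus
unique_notes = list(set(notes_))
print(len(unique_notes))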
Output: 304
As you can see, the number of unique notes is 304. Now, let us look at the distribution of these notes.
#importing library
from collections import Counter

#computing frequency of each note
freq = dict(Counter(notes_))

#library for visualization
import matplotlib.pyplot as plt

#consider only the frequencies
no=[count for _,count in freq.items()]

#set the figure size
plt.figure(figsize=(5,5))

#plot
plt.hist(no)
Output: a histogram of the note frequencies
From the above plot, we can infer that most of the notes have a very low frequency. So, let us keep the top
frequent notes and ignore the low-frequency ones. Here, I am defining the threshold as 50. Nevertheless,
the parameter can be changed.
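A minimal sketch of this filtering step, assuming the freq dictionary computed above; frequent_notes is the name used by the next snippet:

#retain only the notes that occur at least 50 times (the threshold is tunable)
frequent_notes = [note_ for note_, count in freq.items() if count >= 50]
print(len(frequent_notes))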
Output: 167
As you can see here, the number of frequently occurring notes is 167. Now, let us prepare new musical files which contain only these frequent notes.
new_music=[]

for notes in notes_array:
    temp=[]
    for note_ in notes:
        if note_ in frequent_notes:
            temp.append(note_)
    new_music.append(temp)

#dtype=object since the pieces have different lengths
new_music = np.array(new_music, dtype=object)
Preparing Data:
no_of_timesteps = 32
x = []
y = []

for note_ in new_music:
    for i in range(0, len(note_) - no_of_timesteps, 1):

        #preparing input and output sequences
        input_ = note_[i:i + no_of_timesteps]
        output = note_[i + no_of_timesteps]

        x.append(input_)
        y.append(output)

x=np.array(x)
y=np.array(y)
The model cannot work with note names directly, so let us assign a unique integer to every note and convert the input and output sequences into sequences of integers:
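Here is a minimal sketch of that encoding step, assuming dictionaries built with enumerate over the unique notes; the names x_note_to_int, x_seq, y_seq and n_vocab are assumptions chosen to match the code that follows:

#assigning a unique integer to every input note
unique_x = list(set(x.ravel()))
x_note_to_int = dict((note_, number) for number, note_ in enumerate(unique_x))

#converting the input sequences of notes into sequences of integers
x_seq = np.array([[x_note_to_int[j] for j in i] for i in x])

#doing the same for the output notes
unique_y = list(set(y))
y_note_to_int = dict((note_, number) for number, note_ in enumerate(unique_y))
y_seq = np.array([y_note_to_int[i] for i in y])

#size of the output vocabulary, used by the models below
n_vocab = len(unique_y)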
Let us preserve 80% of the data for training and the rest 20% for the evaluation:
from sklearn.model_selection import train_test_split
x_tr, x_val, y_tr, y_val = train_test_split(x_seq,y_seq,test_size=0.2,random_state=0)
Model Building
I have defined 2 architectures here – WaveNet and LSTM. Please experiment with both the architectures to
understand the importance of WaveNet architecture.
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

#note: the LSTM layers expect 3D input, e.g. x reshaped to (samples, no_of_timesteps, 1)
def lstm():
    model = Sequential()
    model.add(LSTM(128,return_sequences=True))
    model.add(LSTM(128))
    model.add(Dense(256))
    model.add(Activation('relu'))
    model.add(Dense(n_vocab))
    model.add(Activation('softmax'))
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
    return model
I have simplified the WaveNet architecture by leaving out the residual and skip connections, since the role of those layers is to speed up convergence (and the original WaveNet takes a raw audio wave as input). In our case, the input is a set of notes and chords since we are generating music.
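A sketch of what such a simplified, WaveNet-inspired model could look like in Keras is shown below: an Embedding layer over the integer-encoded notes followed by stacked dilated causal Conv1D layers. The exact layer sizes and dilation rates here are illustrative choices, and simplified_wavenet is just an assumed name:

from keras.models import Sequential
from keras.layers import Embedding, Conv1D, Dropout, MaxPool1D, GlobalMaxPool1D, Dense

def simplified_wavenet():
    model = Sequential()
    #embedding layer turns each integer-encoded note into a 100-dimensional vector
    model.add(Embedding(len(unique_x), 100))
    #stack of dilated causal convolutions with increasing dilation rate
    model.add(Conv1D(64, 3, padding='causal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(MaxPool1D(2))
    model.add(Conv1D(128, 3, padding='causal', dilation_rate=2, activation='relu'))
    model.add(Dropout(0.2))
    model.add(MaxPool1D(2))
    model.add(Conv1D(256, 3, padding='causal', dilation_rate=4, activation='relu'))
    model.add(Dropout(0.2))
    model.add(MaxPool1D(2))
    #collapse the time dimension and predict a probability over the note vocabulary
    model.add(GlobalMaxPool1D())
    model.add(Dense(256, activation='relu'))
    model.add(Dense(n_vocab, activation='softmax'))
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
    return model

model = simplified_wavenet()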
Let’s train the model with a batch size of 128 for 50 epochs:
It's time to compose our own music now. We will follow the steps mentioned under the inference phase for the predictions.
ind = np.random.randint(0,len(x_val)-1)

random_music = x_val[ind]

predictions=[]
for i in range(10):

    random_music = random_music.reshape(1,no_of_timesteps)

    prob = model.predict(random_music)[0]
    y_pred = np.argmax(prob,axis=0)
    predictions.append(y_pred)

    random_music = np.insert(random_music[0],len(random_music[0]),y_pred)
    random_music = random_music[1:]

print(predictions)
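The model outputs integer indices, so they first have to be mapped back to note names before writing a MIDI file. A minimal sketch, assuming y_int_to_note as the inverse of the y_note_to_int mapping built earlier, with predicted_notes being the list passed to convert_to_midi below:

#mapping the predicted integers back to their note names
y_int_to_note = dict((number, note_) for number, note_ in enumerate(unique_y))
predicted_notes = [y_int_to_note[i] for i in predictions]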
The final step is to convert back the predictions into a MIDI file. Let’s define the function to accomplish the
task.
def convert_to_midi(prediction_output):

    offset = 0
    output_notes = []

    # create note and chord objects based on the values generated by the model
    for pattern in prediction_output:

        # pattern is a chord
        if ('.' in pattern) or pattern.isdigit():
            notes_in_chord = pattern.split('.')
            notes = []
            for current_note in notes_in_chord:

                cn=int(current_note)
                new_note = note.Note(cn)
                new_note.storedInstrument = instrument.Piano()
                notes.append(new_note)

            new_chord = chord.Chord(notes)
            new_chord.offset = offset
            output_notes.append(new_chord)

        # pattern is a note
        else:

            new_note = note.Note(pattern)
            new_note.offset = offset
            new_note.storedInstrument = instrument.Piano()
            output_notes.append(new_note)

        # increase offset each iteration so that notes do not stack
        offset += 1

    midi_stream = stream.Stream(output_notes)
    midi_stream.write('midi', fp='music.mid')
convert_to_midi(predicted_notes)
[Audio: a few sample tunes composed by the model]
Awesome, right? But your learning doesn’t stop here. Just remember that we have built a baseline model.
There are plenty of ways to improve the performance of the model even further:
As the size of the training dataset is small, we can fine-tune a pre-trained model to build a robust
system
Collect as much training data as you can, since deep learning models generalize better on larger datasets
End Notes
Deep Learning has a wide range of applications in our daily life. The key steps in solving any problem are
understanding the problem statement, formulating it and defining the architecture to solve the problem.
I had a lot of fun (and learning) while working on this project. Music is a passion of mine, and it was quite intriguing to combine it with deep learning.
I am looking forward to hearing about your approach to this problem. If you have any feedback on this article or any doubts/queries, kindly share them in the comments section below and I will get back to you.
Aravind Pai
Aravind is a sports fanatic. His passion lies in developing data-driven products for the sports domain.
He strongly believes that analytics in sports can be a game-changer.