This repository contains the code for my project for the Big Data Computing course at Sapienza University of Rome. The course required implementing a Big Data application with the Apache Spark framework in a Google Colab notebook.
For this project I implemented a deep learning application that synthesizes the audio of an instrument from a sequence of annotated notes (audio + MIDI information). The application is based on the Pix2Pix architecture, a conditional GAN that learns a mapping from input images to output images. Here the input is the spectrogram of sawtooth/sine-wave audio synthesized from the MIDI information, and the output is the spectrogram of the real audio of the instrument.
The dataset used for this project is the NSynth Dataset, which contains 305,979 musical notes from 1,006 instruments, each annotated with MIDI information such as pitch and velocity.
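
To make the input side of the mapping concrete, here is a minimal sketch of how a sawtooth spectrogram could be built from a note's MIDI pitch and velocity. It is not the code used in this repository: it relies on plain NumPy rather than Librosa so that it runs on its own, and the function names, the 16 kHz sample rate, the FFT size, the hop length, and the linear velocity scaling are all assumptions made for illustration.

```python
import numpy as np


def midi_to_hz(pitch: int) -> float:
    """Convert a MIDI pitch number to a frequency in Hz (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((pitch - 69) / 12.0)


def sawtooth(pitch: int, velocity: int = 127, duration: float = 4.0,
             sr: int = 16000) -> np.ndarray:
    """Synthesize a naive (non band-limited) sawtooth wave for one note.

    The amplitude is scaled linearly by velocity / 127, which is an
    illustrative assumption, not something taken from the project.
    """
    t = np.arange(int(duration * sr)) / sr
    phase = (t * midi_to_hz(pitch)) % 1.0      # ramps from 0 to 1 every period
    return (velocity / 127.0) * (2.0 * phase - 1.0)


def magnitude_spectrogram(signal: np.ndarray, n_fft: int = 1024,
                          hop: int = 256) -> np.ndarray:
    """Magnitude STFT with a Hann window, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([
        signal[i * hop: i * hop + n_fft] * window for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1)).T


# One second of A4 (MIDI pitch 69) as a conditioning "image" for the generator.
wave = sawtooth(pitch=69, velocity=100, duration=1.0)
spec = magnitude_spectrogram(wave)
```

A spectrogram of the real instrument recording, computed with the same settings, would then serve as the target image, so that each training pair has matching dimensions.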
The application is implemented using the following libraries:
- Apache Spark: for the distributed computation of the training and testing of the model
- Petastorm: for the distributed data loading
- PyTorch Lightning: for the implementation of the training and testing loops
- PyTorch Lightning Bolts: for the implementation of the Pix2Pix model
- Librosa: for the audio processing
- Matplotlib: for the plotting of the results
- Frechet Audio Distance: for the computation of the Frechet Audio Distance (FAD) between the real and the generated audio
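
For reference, the FAD mentioned in the last item is the Fréchet distance between two multivariate Gaussians fitted to embeddings of the real and the generated audio. The formula below is the standard definition; which embedding model this project uses is not stated here, so the symbols are generic: $\mu_r, \Sigma_r$ are the mean and covariance of the real-audio embeddings and $\mu_g, \Sigma_g$ those of the generated-audio embeddings.

$$
\mathrm{FAD} = \lVert \mu_r - \mu_g \rVert^2 + \operatorname{Tr}\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
$$

A lower value means the two embedding distributions are closer, and the distance is zero when both means and covariances coincide.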