This webpage is the digital appendix to the written master's thesis. It hosts the audio files that accompany the results. The titles in the digital appendix correspond to the titles in the thesis. The colors of the waveforms match the colors of their associated models in the figures in the thesis.
Listen to the samples with good headphones or loudspeakers, because evaluating the results depends largely on nuances in frequency content.
Some of the audio files are used in multiple results because the relevant hyperparameters overlap.
Due to the varying behavior of the neural network across some of these test cases, some of the samples are quite distorted. Watch for the warning sign (⚠️) and lower your system volume before playing these samples.
The following samples were used as source and target.
0.1 - Source: noise.wav
0.2 - Target: amen_drum_break.wav
1.1 - Proximal Policy Optimization
1.2 - Soft Actor-Critic
2.1 - Model A
Feature extractors: [RMS]
2.2 - Model B
Feature extractors: [RMS, pitch, spectral centroid, spectral spread, spectral flatness, spectral flux]
3.1 - Model C: target entropy = -3
3.2 - Model D: target entropy = -6 ⚠️
3.3 - Model E: target entropy = -12 ⚠️
3.4 - Model F: target entropy = -24
3.5 - Model G: target entropy = -48 ⚠️
4.1 - Inverse scale
4.2 - Relative gain
4.3 - Mixed ⚠️
5.1 - Non-real-time inference
5.2 - Real-time inference ⚠️⚠️⚠️
NB! This sample is very distorted.
The two new sounds:
6.1 - drum_beat_80s.wav
6.2 - arp_sequence.wav
6.3 - Experiment 1: changing the target
noise.wav
drum_beat_80s.wav
6.4 - Experiment 2: changing the source
arp_sequence.wav
amen_drum_break.wav
6.5 - Experiment 3: changing both the source and the target
arp_sequence.wav
drum_beat_80s.wav