Digital appendix

This webpage is a digital appendix to the written master's thesis. It hosts the audio files that accompany the results. The titles in the digital appendix correspond to the titles in the thesis, and the color of each waveform matches the color of its associated model in the thesis figures.

Listen to the samples on decent headphones or loudspeakers, as the evaluation of the results depends largely on nuances in frequency content.

Some of the audio files are used in multiple results because the relevant hyperparameters overlap.

Due to the varying behavior of the neural network across these test cases, some of the samples are quite distorted. Watch out for the warning sign (⚠️) and lower your system volume before playing these samples.

0 - Input audio

The following samples were used as source and target.

0.1 - Source: noise.wav

0.2 - Target: amen_drum_break.wav

1 - Comparing Soft Actor-Critic and Proximal Policy Optimization

1.1 - Proximal Policy Optimization

1.2 - Soft Actor-Critic

2 - Feature selection

2.1 - Model A

Feature extractors: [RMS]

2.2 - Model B

Feature extractors: [RMS, pitch, spectral centroid, spectral spread, spectral flatness, spectral flux]

3 - Balancing the exploration temperature

3.1 - Model C: target entropy = -3

3.2 - Model D: target entropy = -6 ⚠️

3.3 - Model E: target entropy = -12 ⚠️

3.4 - Model F: target entropy = -24

3.5 - Model G: target entropy = -48 ⚠️

4 - Reward function design

4.1 - Inverse scale

4.2 - Relative gain

4.3 - Mixed ⚠️

5 - Real-time performance

5.1 - Non-real-time inference

5.2 - Real-time inference ⚠️⚠️⚠️

NB! This sample is very distorted.

6 - Generalizability to new sounds

The two new sounds:

6.1 - drum_beat_80s.wav

6.2 - arp_sequence.wav

6.3 - Experiment 1: changing the target

6.4 - Experiment 2: changing the source

6.5 - Experiment 3: changing both the source and the target