Christopher Mitcheltree¹, Hao Hao Tan², and Joshua D. Reiss¹
¹ Centre for Digital Music, Queen Mary University of London, UK
² Independent Researcher, Singapore
Modulations are a critical part of sound design and music production, enabling the creation of complex and evolving audio. Modern synthesizers provide envelopes, low-frequency oscillators (LFOs), and other parameter automation tools that allow users to modulate the output with ease. However, determining the modulation signals used to create a sound is difficult, and existing sound-matching and parameter estimation systems are often uninterpretable black boxes or predict high-dimensional framewise parameter values without considering the shape, structure, and routing of the underlying modulation curves. We propose a neural sound-matching approach that leverages modulation extraction, constrained control signal parameterizations, and differentiable digital signal processing (DDSP) to discover the modulations present in a sound. We demonstrate the effectiveness of our approach on highly modulated synthetic and real audio samples and its applicability to different DDSP synth architectures, and we investigate the trade-off it incurs between interpretability and sound-matching accuracy. We make our code and audio samples available and provide the trained DDSP synths in a VST plugin.
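To make the idea of a constrained control signal parameterization concrete, here is a minimal NumPy sketch contrasting an unconstrained framewise curve with a low-pass-filtered one, loosely in the spirit of the Frame and LPF methods compared in the tables below. It is an illustration under our own assumptions, not the paper's implementation: the helper names `frame_mod` and `lpf_mod`, the sigmoid squashing, and the one-pole smoothing filter with coefficient `alpha` are all hypothetical choices.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def frame_mod(logits: np.ndarray) -> np.ndarray:
    """Unconstrained framewise modulation: one free value per frame, squashed to [0, 1]."""
    return sigmoid(logits)


def lpf_mod(logits: np.ndarray, alpha: float = 0.9) -> np.ndarray:
    """Hypothetical LPF parameterization: smooth framewise values with a one-pole low-pass filter.

    y[n] = alpha * y[n - 1] + (1 - alpha) * x[n], starting from y[0] = x[0].
    Each output is a convex combination of values in [0, 1], so it stays in [0, 1].
    """
    x = sigmoid(logits)
    y = np.empty_like(x)
    y[0] = x[0]
    for n in range(1, len(x)):
        y[n] = alpha * y[n - 1] + (1.0 - alpha) * x[n]
    return y


rng = np.random.default_rng(0)
logits = rng.normal(size=200)  # stand-in for network outputs, one value per frame
raw = frame_mod(logits)
smooth = lpf_mod(logits, alpha=0.9)
# Average frame-to-frame change: the filtered curve moves far less than the raw one.
print(np.abs(np.diff(raw)).mean(), np.abs(np.diff(smooth)).mean())
```

The filtered curve can only move a fraction `1 - alpha` of the way toward each new frame value, which caps how quickly it can change; this is one simple way to rule out jittery framewise predictions that are hard to interpret as a modulation shape.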
Figure 1: Overview of the modulation discovery process through modulation extraction, parameterization, and routing using a DDSP synth. Orange blocks are neural networks, dashed blocks are optional, and blue blocks are differentiable and may contain learnable weights for sound matching.
Figure 2: Synthetic visualization of modulation signals being discovered during training of LFO-net and Mod. Synth using the piecewise 2D Bézier curve (spline) parameterization. Note that due to the learnable synth modules and stochastic optimization process, discovered modulations (orange) can be different from the corresponding ground truth modulations (black), but still produce perceptually similar audio.
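Figure 2 refers to a piecewise 2D Bézier curve (spline) parameterization. The sketch below shows one way such a curve could be turned into a framewise modulation signal; it is our own minimal NumPy illustration, not the paper's implementation. It assumes quadratic segments, control points given as (time, value) pairs whose time coordinates increase from 0 to 1, and resampling onto uniform frames with `np.interp`; the function names `bezier` and `spline_mod` are hypothetical.

```python
import numpy as np
from math import comb


def bezier(ctrl: np.ndarray, u: np.ndarray) -> np.ndarray:
    """Evaluate one Bezier segment with control points ctrl of shape [K, 2] at parameters u."""
    deg = len(ctrl) - 1
    # Bernstein basis polynomials, shape [len(u), K].
    basis = np.stack(
        [comb(deg, k) * u**k * (1.0 - u) ** (deg - k) for k in range(deg + 1)], axis=1
    )
    return basis @ ctrl  # [len(u), 2] array of (time, value) points


def spline_mod(segments: list, n_frames: int, n_eval: int = 256) -> np.ndarray:
    """Resample a piecewise 2D Bezier curve onto n_frames uniformly spaced time steps.

    Assumes time coordinates increase within and across segments, consecutive
    segments share an endpoint, and the curve spans times 0 to 1.
    """
    u = np.linspace(0.0, 1.0, n_eval)
    # Drop the first point of later segments so shared endpoints are not duplicated.
    pts = np.concatenate(
        [bezier(seg, u) if i == 0 else bezier(seg, u)[1:] for i, seg in enumerate(segments)]
    )
    t_uniform = np.linspace(0.0, 1.0, n_frames)
    # Interpolate the value coordinate onto the uniform time grid.
    return np.interp(t_uniform, pts[:, 0], pts[:, 1])


# Two quadratic segments: rise, dip, then rise again.
segments = [
    np.array([[0.0, 0.0], [0.25, 1.0], [0.5, 0.5]]),
    np.array([[0.5, 0.5], [0.75, 0.0], [1.0, 1.0]]),
]
mod = spline_mod(segments, n_frames=100)
```

Because each segment is a convex combination of its control points, keeping control values in [0, 1] keeps the whole curve in [0, 1], and six control points describe the entire shape instead of one free value per frame.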
Accepted to the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Lake Tahoe, CA, USA, 12–15 October 2025.
@inproceedings{mitcheltree2025modulation,
title={Modulation Discovery with Differentiable Digital Signal Processing},
author={Christopher Mitcheltree and Hao Hao Tan and Joshua D. Reiss},
booktitle={IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
year={2025}
}
Below, we provide listening examples and visualizations for the three experiments in the paper. Each table shows a randomly selected example from the test results for the given experiment.
The first experiment evaluates how well LFO-net and the three parameterization methods can extract modulation signals when a white-box synth architecture is used.
Audio is generated and reconstructed with a frozen Mod. Synth (see Section 2.1 in the paper for more information), which means extracted modulation signals can be compared directly to their ground truth counterparts.
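Since the synth is frozen, an extracted curve can be scored directly against its ground truth. This page does not state which distance the paper reports, so the following is only a hedged NumPy sketch that assumes two common choices, mean absolute error and Pearson correlation, after linearly resampling the extracted curve to the ground truth's frame count; `compare_mods` is a hypothetical helper.

```python
import numpy as np


def compare_mods(extracted: np.ndarray, target: np.ndarray) -> dict:
    """Hypothetical comparison of an extracted modulation curve with its ground truth.

    Both curves are assumed to lie in [0, 1]; the extracted curve is linearly
    resampled to the target's number of frames before comparison.
    """
    t_src = np.linspace(0.0, 1.0, len(extracted))
    t_dst = np.linspace(0.0, 1.0, len(target))
    resampled = np.interp(t_dst, t_src, extracted)
    mae = float(np.mean(np.abs(resampled - target)))
    # Pearson correlation ignores offset and scale, so it measures shape agreement only.
    pcc = float(np.corrcoef(resampled, target)[0, 1])
    return {"mae": mae, "pcc": pcc}


t = np.linspace(0.0, 1.0, 500)
target = 0.5 + 0.5 * np.sin(2 * np.pi * 3 * t)  # 3-cycle, LFO-like ground truth
t_ext = np.linspace(0.0, 1.0, 100)
extracted = 0.5 + 0.25 * np.sin(2 * np.pi * 3 * t_ext)  # same shape, half the depth, fewer frames
scores = compare_mods(extracted, target)
```

Here the half-depth curve scores close to 1 on `pcc` while its `mae` stays clearly above zero, so reporting both separates agreement in shape from agreement in depth and offset.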
Ableton wavetable presets used:
Method | Spectrogram | Extracted Additive Modulation Signal | Extracted Subtractive Modulation Signal | Extracted Envelope Modulation Signal | Audio |
---|---|---|---|---|---|
Target | ![]() | - | - | - | |
Frame | ![]() | | | | |
LPF | ![]() | | | | |
Spline | ![]() | | | | |
Random Spline | ![]() | | | | |
This second table shows the results for the unseen, real-world test dataset made from the modulation curves in Vital's default preset library. The real-world ground-truth modulation signals used to make the target audio are dashed and black; extracted modulation signals are solid and colored red for additive, blue for subtractive, and orange for envelope synth modules.
Method | Spectrogram | Extracted Additive Modulation Signal | Extracted Subtractive Modulation Signal | Extracted Envelope Modulation Signal | Audio |
---|---|---|---|---|---|
Target | ![]() | - | - | - | |
Frame | ![]() | | | | |
LPF | ![]() | | | | |
Spline | ![]() | | | | |
Random Spline | ![]() | | | | |
The second experiment evaluates how well our modulation routing and DDSP sound matching approach can discover modulations for a gray-box synth.
The same Ableton wavetables as listed above in Experiment 1 are used.
Synthetic ground-truth modulation signals used to make the target audio are dashed and black; discovered modulation signals are solid when using the LLS 3 post-processing step and dotted when using LLS 1 (see Section 3.2 in the paper for more information).
Discovered modulation signals are red for additive, blue for subtractive, and orange for envelope synth modules.
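Each discovered signal drives a parameter of one synth module. As a generic illustration only (not the paper's routing or filter design), the sketch below upsamples a framewise [0, 1] modulation curve to audio rate, maps it to a low-pass cutoff on a logarithmic scale, and applies a time-varying one-pole filter to a naive sawtooth, roughly how a subtractive module might consume such a curve. The names `upsample`, `route_to_cutoff`, and `one_pole_lpf`, and the range `f_min`/`f_max`, are hypothetical.

```python
import numpy as np


def upsample(mod: np.ndarray, n_samples: int) -> np.ndarray:
    """Linearly interpolate a framewise control signal to audio rate."""
    return np.interp(np.linspace(0.0, 1.0, n_samples), np.linspace(0.0, 1.0, len(mod)), mod)


def route_to_cutoff(mod: np.ndarray, n_samples: int, f_min: float = 50.0, f_max: float = 8000.0) -> np.ndarray:
    """Map a [0, 1] modulation curve to a cutoff frequency in Hz on a logarithmic scale."""
    m = np.clip(upsample(mod, n_samples), 0.0, 1.0)
    return f_min * (f_max / f_min) ** m


def one_pole_lpf(x: np.ndarray, cutoff_hz: np.ndarray, sr: int) -> np.ndarray:
    """Time-varying one-pole low-pass filter: y[n] = y[n-1] + a[n] * (x[n] - y[n-1])."""
    a = 1.0 - np.exp(-2.0 * np.pi * cutoff_hz / sr)
    y = np.zeros_like(x)
    prev = 0.0
    for n in range(len(x)):
        prev = prev + a[n] * (x[n] - prev)
        y[n] = prev
    return y


sr = 16000
n = sr  # one second of audio
saw = 2.0 * ((110.0 * np.arange(n) / sr) % 1.0) - 1.0  # naive 110 Hz sawtooth
mod = np.linspace(0.0, 1.0, 100)  # framewise curve: the filter opens over one second
cutoff = route_to_cutoff(mod, n)
out = one_pole_lpf(saw, cutoff, sr)
```

Mapping the curve on a logarithmic frequency scale is a common synth convention because equal steps in the modulation then sound like roughly equal changes in brightness.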
Method | Spectrogram | Discovered Additive Modulation Signal | Discovered Subtractive Modulation Signal | Discovered Envelope Modulation Signal | Audio |
---|---|---|---|---|---|
Target | ![]() | - | - | - | |
Frame | ![]() | | | | |
LPF | ![]() | | | | |
Spline | ![]() | | | | |
Oracle | ![]() | | | | |
Random Spline | ![]() | | | | |
The last experiment evaluates how well our modulation discovery approach generalizes to real-world audio, black-box synths, and different DDSP synth architectures and their modulation routing.
Serum presets (from the "Bass (Hard)" category) used:
Synth & Method | Spectrogram | Discovered Modulation Signals | RMS Loudness | Spectral Flatness | Audio |
---|---|---|---|---|---|
Target | ![]() | - | - | - | |
Mod. Synth Granular | ![]() | - | - | - | |
Mod. Synth Frame | ![]() | | | | |
Mod. Synth LPF | ![]() | | | | |
Mod. Synth Spline | ![]() | | | | |
Mod. Synth Random Spline | ![]() | | | | |
Target | ![]() | - | - | - | |
Shan et al. Granular | ![]() | - | - | - | |
Shan et al. Frame | ![]() | | | | |
Shan et al. LPF | ![]() | | | | |
Shan et al. Spline | ![]() | | | | |
Shan et al. Random Spline | ![]() | | | | |
Target | ![]() | - | - | - | |
Engel et al. Granular | ![]() | - | - | - | |
Engel et al. Frame | ![]() | | | | |
Engel et al. LPF | ![]() | | | | |
Engel et al. Spline | ![]() | | | | |
Engel et al. Random Spline | ![]() | | | | |
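The RMS Loudness and Spectral Flatness columns plot framewise audio features for each output. The analysis settings are not given on this page, so the following is only a minimal NumPy sketch assuming a Hann window, 2048-sample frames, a 512-sample hop, and loudness expressed in dB; `frame_features` is a hypothetical helper. Spectral flatness is computed in its standard form, the geometric mean of the power spectrum divided by its arithmetic mean.

```python
import numpy as np


def frame_features(x: np.ndarray, n_fft: int = 2048, hop: int = 512, eps: float = 1e-10):
    """Framewise RMS loudness (dB) and spectral flatness of a mono signal.

    Flatness is near 1 for noise-like frames and near 0 for tonal frames.
    Assumes len(x) >= n_fft.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] for i in range(n_frames)])
    rms = np.sqrt(np.mean(frames**2, axis=1))
    rms_db = 20.0 * np.log10(rms + eps)
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2 + eps
    flatness = np.exp(np.mean(np.log(power), axis=1)) / np.mean(power, axis=1)
    return rms_db, flatness


sr = 44100
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)  # tonal: flatness near 0
noise = np.random.default_rng(0).uniform(-1.0, 1.0, sr)  # noise-like: much higher flatness
rms_db, flat_tone = frame_features(tone)
_, flat_noise = frame_features(noise)
```

A full-scale sine sits at about -3 dB RMS in every frame, which makes it a convenient sanity check for the loudness curve.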
Coming soon.