Modulation Discovery with Differentiable Digital Signal Processing

Christopher Mitcheltree 1, Hao Hao Tan 2, and Joshua D. Reiss 1

1 Centre for Digital Music, Queen Mary University of London, UK
2 Independent Researcher, Singapore


Abstract


Modulations are a critical part of sound design and music production, enabling the creation of complex and evolving audio. Modern synthesizers provide envelopes, low frequency oscillators, and other parameter automation tools that allow users to modulate the output with ease. However, determining the modulation signals used to create a sound is difficult, and existing sound-matching / parameter estimation systems are often uninterpretable black boxes or predict high-dimensional framewise parameter values without considering the shape, structure, and routing of the underlying modulation curves. We propose a neural sound-matching approach that leverages modulation extraction, constrained control signal parameterizations, and differentiable digital signal processing (DDSP) to discover the modulations present in a sound. We demonstrate the effectiveness of our approach on highly modulated synthetic and real audio samples, show its applicability to different DDSP synth architectures, and investigate the trade-off it incurs between interpretability and sound-matching accuracy. We make our code and audio samples available and provide the trained DDSP synths in a VST plugin.




Figure 1: Overview of the modulation discovery process through modulation extraction, parameterization, and routing using a DDSP synth. Orange blocks are neural networks, dashed blocks are optional, and blue blocks are differentiable and may contain learnable weights for sound matching.

Figure 2: Synthetic visualization of modulation signals being discovered during training of LFO-net and Mod. Synth using the piecewise 2D Bézier curve (spline) parameterization. Note that due to the learnable synth modules and stochastic optimization process, discovered modulations (orange) can be different from the corresponding ground truth modulations (black), but still produce perceptually similar audio.
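
As a rough illustration of what a piecewise 2D Bézier curve parameterization can look like, the following NumPy sketch chains cubic Bézier segments whose control points are (time, value) pairs and resamples the result onto a uniform frame grid. It is not the implementation used in the paper: the cubic degree, the function names `cubic_bezier` and `piecewise_bezier_signal`, and the example control points are all assumptions made for illustration, and the code is not differentiable as written.

```python
import numpy as np

def cubic_bezier(ctrl, n_points=64):
    """Evaluate one cubic Bezier segment.

    ctrl: array of shape (4, 2) holding (time, value) control points.
    Returns an array of shape (n_points, 2) of points on the curve.
    """
    t = np.linspace(0.0, 1.0, n_points)[:, None]
    p0, p1, p2, p3 = ctrl
    return ((1 - t) ** 3 * p0
            + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2
            + t ** 3 * p3)

def piecewise_bezier_signal(segments, n_frames):
    """Chain segments and resample onto a uniform frame grid.

    segments: array of shape (n_segments, 4, 2). Assumption: the time
    coordinates of the control points are strictly increasing, so the
    curve is a function of time and can be resampled with np.interp.
    """
    pieces = [cubic_bezier(segments[0])]
    # Skip the first point of later segments; it duplicates the previous end point.
    pieces += [cubic_bezier(s)[1:] for s in segments[1:]]
    pts = np.concatenate(pieces, axis=0)
    frame_times = np.linspace(pts[0, 0], pts[-1, 0], n_frames)
    return np.interp(frame_times, pts[:, 0], pts[:, 1])

# Hypothetical two-segment curve on the time interval [0, 1].
segments = np.array([
    [[0.0, 0.0], [0.1, 1.0], [0.3, 1.0], [0.5, 0.5]],
    [[0.5, 0.5], [0.7, 0.0], [0.9, 0.0], [1.0, 0.2]],
])
signal = piecewise_bezier_signal(segments, n_frames=100)
```

Because every point on a Bézier segment lies in the convex hull of its control points, keeping the control values in [0, 1] keeps the resulting modulation signal in [0, 1] as well, which is one reason such a parameterization is convenient for bounded control signals.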

Citation


Accepted to the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Lake Tahoe, CA, USA, 12 - 15 October 2025.


  @inproceedings{mitcheltree2025modulation,
      title={Modulation Discovery with Differentiable Digital Signal Processing},
      author={Christopher Mitcheltree and Hao Hao Tan and Joshua D. Reiss},
      booktitle={IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
      year={2025}
  }
  

Listening Samples and Modulation Signal Visualizations


Below, we provide listening examples and visualizations for the three experiments in the paper. Each table can be randomized by clicking the button below it, which displays a random example from the test results for the given experiment.

Experiment 1: Modulation Extraction (Synthetic and Real-world Data, White-box Synth)


The first experiment evaluates how well LFO-net and the three parameterization methods can extract modulation signals when a white-box synth architecture is used. Audio is generated and reconstructed with a frozen Mod. Synth (see Section 2.1 in the paper for more information), which means extracted modulation signals can be compared directly to their ground truth counterparts.

Ableton wavetable presets used:

  1. Basics: FM Fold (78 positions, 1024 samples)
  2. Basics: Galactica (4 positions, 1024 samples)
  3. Basics: Harmonic Series (7 positions, 1024 samples)
  4. Basics: Sub 3 (122 positions, 1024 samples)
  5. Collection: Aureolin (256 positions, 1024 samples)
  6. Collection: Squash (32 positions, 1024 samples)
  7. Complex: Bit Ring (256 positions, 1024 samples)
  8. Complex: Kicked (4 positions, 1024 samples)
  9. Distortion: DP Fold (230 positions, 1024 samples)
  10. Distortion: Phased (178 positions, 1024 samples)

The first table shows the results for the test split of the synthetic dataset that the models and synths are trained on. Synthetic, ground-truth modulation signals used to make the target audio are dashed and black; extracted modulation signals are solid and red for additive, blue for subtractive, and orange for envelope synth modules.

Table columns: Method | Spectrogram | Extracted Additive Modulation Signal | Extracted Subtractive Modulation Signal | Extracted Envelope Modulation Signal | Audio
Table rows: Target (- - -), Frame, LPF, Spline, Random Spline
A dash marks a modulation signal cell with no plot.


The second table shows the results for the unseen, real-world test dataset made from the modulation curves in Vital's default preset library. Unseen, real-world, ground-truth modulation signals used to make the target audio are dashed and black; extracted modulation signals are solid and red for additive, blue for subtractive, and orange for envelope synth modules.

Table columns: Method | Spectrogram | Extracted Additive Modulation Signal | Extracted Subtractive Modulation Signal | Extracted Envelope Modulation Signal | Audio
Table rows: Target (- - -), Frame, LPF, Spline, Random Spline
A dash marks a modulation signal cell with no plot.


Experiment 2: Modulation Discovery (Synthetic Data, Gray-box Synth)


The second experiment evaluates how well our modulation routing and DDSP sound matching approach can discover modulations for a gray-box synth. The same Ableton wavetables as listed above in Experiment 1 are used.

Synthetic, ground-truth modulation signals used to make the target audio are dashed and black. Discovered modulation signals are solid when post-processed with LLS 3 and dotted when post-processed with LLS 1 (see Section 3.2 in the paper for more information). Discovered modulation signals are red for additive, blue for subtractive, and orange for envelope synth modules.

Table columns: Method | Spectrogram | Discovered Additive Modulation Signal | Discovered Subtractive Modulation Signal | Discovered Envelope Modulation Signal | Audio
Table rows: Target (- - -), Frame, LPF, Spline, Oracle, Random Spline
A dash marks a modulation signal cell with no plot.


Experiment 3: Modulation Discovery (Real-world Data, Black-box Synth)


The last experiment evaluates how well our modulation discovery approach generalizes to real-world audio, black-box synths, and different DDSP synth architectures and their modulation routing.

Serum presets (from the "Bass (Hard)" category) used:

  1. BA Access 2 Mthrshp Denied
  2. BA BitterBot
  3. BA Deth reece
  4. BA Gritter
  5. BA Hoo
  6. BA Le Gigante
  7. BA Modulated Chomper
  8. BA SCREAM Wobble 01
  9. BA Sludgecrank
  10. BA Wide Eyed Reese

Discovered modulation signals are red for additive, blue for subtractive, and orange for envelope synth modules. RMS loudness and spectral flatness proxy modulation signals of the target audio are dashed and black, and the corresponding LLS 3 post-processed discovered modulation signals are solid and magenta. To improve the visualizations and remove windowing artifacts, the proxy modulation signals are low-pass filtered and trimmed by 32 frames before plotting.

Table columns: Synth & Method | Spectrogram | Discovered Modulation Signals | RMS Loudness | Spectral Flatness | Audio
Table rows, repeated for each of the three synths (Mod. Synth, Shan et al., Engel et al.): Target (- - -), Granular (- - -), Frame, LPF, Spline, Random Spline
A dash marks a cell with no plot.
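
The RMS loudness and spectral flatness proxy signals described above can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the code used for the paper: the frame length, hop size, Hann window, moving-average low-pass filter, and the choice to trim 32 frames from each end are all assumptions, as are the function names.

```python
import numpy as np

def frame_audio(x, frame_len=1024, hop=256):
    """Slice a mono signal into overlapping frames (sizes are assumptions)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def rms_loudness(frames):
    """Root-mean-square amplitude of each frame."""
    return np.sqrt(np.mean(frames ** 2, axis=1))

def spectral_flatness(frames, eps=1e-10):
    """Geometric mean over arithmetic mean of each frame's power spectrum."""
    windowed = frames * np.hanning(frames.shape[1])
    power = np.abs(np.fft.rfft(windowed, axis=1)) ** 2 + eps
    return np.exp(np.mean(np.log(power), axis=1)) / np.mean(power, axis=1)

def smooth_and_trim(sig, kernel=9, trim=32):
    """Moving-average low-pass filter, then drop `trim` frames at each end.

    Assumption: trimming is applied to both ends; the page only states
    that the signals are trimmed by 32 frames.
    """
    k = np.ones(kernel) / kernel
    smoothed = np.convolve(sig, k, mode="same")
    return smoothed[trim:-trim]
```

For example, `smooth_and_trim(rms_loudness(frame_audio(x)))` gives a smoothed loudness curve for a mono signal `x`. Spectral flatness is close to 0 for tonal frames and much higher for noise-like frames, which is what makes it a plausible proxy for timbral modulation.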


DDSP Synth VST Plugins


Coming soon.