Modulation Discovery with Differentiable Digital Signal Processing

Abstract

Modulations are a critical part of sound design and music production, enabling the creation of complex and evolving audio. Modern synthesizers provide envelopes, low frequency oscillators, and other parameter automation tools that allow users to modulate the output with ease. However, determining the modulation signals used to create a sound is difficult, and existing sound-matching and parameter estimation systems are often uninterpretable black boxes or predict high-dimensional framewise parameter values without considering the shape, structure, and routing of the underlying modulation curves. We propose a neural sound-matching approach that leverages modulation extraction, constrained control signal parameterizations, and differentiable digital signal processing (DDSP) to discover the modulations present in a sound. We demonstrate the effectiveness of our approach on highly modulated synthetic and real audio samples and its applicability to different DDSP synth architectures, and we investigate the trade-off it incurs between interpretability and sound-matching accuracy. We make our code and audio samples available and provide the trained DDSP synths in a VST plugin.

Figure 1: Overview of the modulation discovery process through modulation extraction, parameterization, and routing using a DDSP synth. Orange blocks are neural networks, dashed blocks are optional, and blue blocks are differentiable and may contain learnable weights for sound matching.

Figure 2: Synthetic visualization of modulation signals being discovered during training of LFO-net and Mod. Synth using the piecewise 2D Bézier curve (spline) parameterization. Note that due to the learnable synth modules and stochastic optimization process, discovered modulations (orange) can be different from the corresponding ground truth modulations (black), but still produce perceptually similar audio.

Citation

Accepted to the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Lake Tahoe, CA, USA, 12 - 15 October 2025 (best paper candidate).


  @inproceedings{mitcheltree2025modulation,
      title={Modulation Discovery with Differentiable Digital Signal Processing},
      author={Christopher Mitcheltree and Hao Hao Tan and Joshua D. Reiss},
      booktitle={IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
      year={2025}
  }

Listening Samples and Modulation Signal Visualizations

Below, we provide listening examples and visualizations for the three experiments in the paper. Each table can be randomized by clicking the button below it, which displays a random example from the test results for the given experiment.

Experiment 1: Modulation Extraction (Synthetic and Real-world Data, White-box Synth)

The first experiment evaluates how well LFO-net and the three parameterization methods can extract modulation signals when a white-box synth architecture is used. Audio is generated and reconstructed with a frozen Mod. Synth (see Section 2.1 in the paper for more information), which means extracted modulation signals can be compared directly to their ground truth counterparts.
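The Figure 2 caption describes the spline parameterization as a piecewise 2D Bézier curve. As a rough illustration of that idea only, the sketch below chains Bézier segments whose control points are (time, value) pairs and resamples the result at uniformly spaced frames. The function names, the cubic degree, the dense-sampling resampler, and the example control points are all assumptions made for illustration; they are not taken from the paper's code.

```python
from math import comb


def bezier_point(ctrl, t):
    """Evaluate one 2D Bezier segment at t in [0, 1] using the Bernstein basis."""
    n = len(ctrl) - 1
    x = y = 0.0
    for i, (cx, cy) in enumerate(ctrl):
        b = comb(n, i) * (t ** i) * ((1.0 - t) ** (n - i))
        x += b * cx
        y += b * cy
    return x, y


def piecewise_bezier(segments, n_frames, n_dense=64):
    """Sample chained 2D Bezier segments at n_frames uniform times in [0, 1].

    Assumes the time coordinates of the control points are non-decreasing
    and that the segments together span the time range [0, 1].
    """
    # Densely sample every segment to get (time, value) pairs.
    pts = []
    for ctrl in segments:
        for k in range(n_dense + 1):
            pts.append(bezier_point(ctrl, k / n_dense))
    # Linearly interpolate the dense samples at each uniform frame time.
    out = []
    j = 0
    for f in range(n_frames):
        x = f / (n_frames - 1)
        while j < len(pts) - 2 and pts[j + 1][0] < x:
            j += 1
        (x0, y0), (x1, y1) = pts[j], pts[j + 1]
        w = 0.0 if x1 == x0 else (x - x0) / (x1 - x0)
        w = min(max(w, 0.0), 1.0)
        out.append(y0 + w * (y1 - y0))
    return out


# Hypothetical example: two cubic segments joined at time 0.5.
example_segments = [
    [(0.0, 0.0), (0.1, 1.0), (0.4, 1.0), (0.5, 0.5)],
    [(0.5, 0.5), (0.6, 0.0), (0.9, 0.0), (1.0, 1.0)],
]
example_curve = piecewise_bezier(example_segments, n_frames=101)
```

A few control points describe the whole curve, which is what makes this kind of parameterization easier to inspect than one free value per frame.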

Ableton wavetable presets used:

Basics: FM Fold (78 positions, 1024 samples)
Basics: Galactica (4 positions, 1024 samples)
Basics: Harmonic Series (7 positions, 1024 samples)
Basics: Sub 3 (122 positions, 1024 samples)
Collection: Aureolin (256 positions, 1024 samples)
Collection: Squash (32 positions, 1024 samples)
Complex: Bit Ring (256 positions, 1024 samples)
Complex: Kicked (4 positions, 1024 samples)
Distortion: DP Fold (230 positions, 1024 samples)
Distortion: Phased (178 positions, 1024 samples)
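
Each preset above is a table of several positions, each holding a 1024-sample waveform. As a minimal sketch of how a modulation signal could sweep the wavetable position, the code below reads a table with linear interpolation across both position and phase. The interpolation scheme, the function names, and the harmonic stand-in table are assumptions for illustration; the Mod. Synth in the paper may work differently.

```python
import math


def make_wavetable(n_pos, n_samples=1024):
    """Hypothetical stand-in table: position p holds a waveform with p + 1 harmonics."""
    table = []
    for p in range(n_pos):
        frame = [
            sum(
                math.sin(2 * math.pi * (h + 1) * n / n_samples) / (h + 1)
                for h in range(p + 1)
            )
            for n in range(n_samples)
        ]
        table.append(frame)
    return table


def render(table, f0_hz, positions, sr=48000):
    """Render one output sample per entry of `positions` (each in [0, 1])."""
    n_pos, n_samples = len(table), len(table[0])
    phase = 0.0
    out = []
    for pos in positions:
        # Map the normalized position to a fractional table row.
        row = min(max(pos, 0.0), 1.0) * (n_pos - 1)
        r0 = int(row)
        r1 = min(r0 + 1, n_pos - 1)
        wr = row - r0
        # Map the oscillator phase to a fractional sample index.
        idx = phase * n_samples
        i0 = int(idx) % n_samples
        i1 = (i0 + 1) % n_samples
        wi = idx - int(idx)
        # Interpolate along the waveform in both rows, then between rows.
        a = table[r0][i0] + wi * (table[r0][i1] - table[r0][i0])
        b = table[r1][i0] + wi * (table[r1][i1] - table[r1][i0])
        out.append(a + wr * (b - a))
        phase = (phase + f0_hz / sr) % 1.0
    return out


# Hypothetical usage: sweep the position from 0 to 1 over 4800 samples.
table = make_wavetable(n_pos=4)
sweep = [n / 4799 for n in range(4800)]
audio = render(table, f0_hz=110.0, positions=sweep)
```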

The first table shows the results for the test split of the synthetic dataset on which the models and synths are trained. The synthetic ground-truth modulation signals used to make the target audio are dashed and black; the extracted modulation signals are solid and colored red for the additive, blue for the subtractive, and orange for the envelope synth modules.

Method	Extracted Additive Modulation Signal	Extracted Subtractive Modulation Signal	Extracted Envelope Modulation Signal
Target	-	-	-
Frame
LPF
Spline
Random Spline

Current example: wavetable index = 2 / 10, batch index = 17 / 20

Table scrolls horizontally if space is limited.

The second table shows the results for the unseen, real-world test dataset made from the modulation curves in Vital's default preset library. The unseen, real-world ground-truth modulation signals used to make the target audio are dashed and black; the extracted modulation signals are solid and colored red for the additive, blue for the subtractive, and orange for the envelope synth modules.

Method	Extracted Additive Modulation Signal	Extracted Subtractive Modulation Signal	Extracted Envelope Modulation Signal
Target	-	-	-
Frame
LPF
Spline
Random Spline

Current example: wavetable index = 6 / 10, batch index = 1 / 20


Experiment 2: Modulation Discovery (Synthetic Data, Gray-box Synth)

The second experiment evaluates how well our modulation routing and DDSP sound matching approach can discover modulations for a gray-box synth. The same Ableton wavetables as listed above in Experiment 1 are used.

The synthetic ground-truth modulation signals used to make the target audio are dashed and black. Discovered modulation signals are solid when the LLS 3 post-processing step is used and dotted when the LLS 1 post-processing step is used (see Section 3.2 in the paper for more information). They are colored red for the additive, blue for the subtractive, and orange for the envelope synth modules.

Method	Discovered Additive Modulation Signal	Discovered Subtractive Modulation Signal	Discovered Envelope Modulation Signal
Target	-	-	-
Frame
LPF
Spline
Oracle
Random Spline

Current example: wavetable index = 10 / 10, batch index = 6 / 20


Experiment 3: Modulation Discovery (Real-world Data, Black-box Synth)

The last experiment evaluates how well our modulation discovery approach generalizes to real-world audio, black-box synths, and different DDSP synth architectures and their modulation routing.

Serum presets (from the "Bass (Hard)" category) used:

BA Access 2 Mthrshp Denied
BA BitterBot
BA Deth reece
BA Gritter
BA Hoo
BA Le Gigante
BA Modulated Chomper
BA SCREAM Wobble 01
BA Sludgecrank
BA Wide Eyed Reese

Discovered modulation signals are red for additive, blue for subtractive, and orange for envelope synth modules. RMS loudness and spectral flatness proxy modulation signals of the target audio are dashed and black, and the corresponding LLS 3 post-processed discovered modulation signals are solid and magenta. To improve the visualizations and remove windowing artifacts, the proxy modulation signals are low-pass filtered and trimmed by 32 frames before plotting.
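
As a rough sketch of what such proxy signals could look like, the code below computes framewise RMS loudness and spectral flatness (geometric mean divided by arithmetic mean of the power spectrum), smooths both with a moving average, and trims the edges. The frame length, hop size, Hann window, moving-average filter, and the reading of "trimmed by 32 frames" as 32 frames from each end are all assumptions; the paper's exact settings are not given on this page. A naive DFT keeps the sketch dependency-free, so it is only practical for short frames.

```python
import cmath
import math


def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames, dropping any incomplete last frame."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def rms(frame):
    """Root-mean-square level of one frame."""
    return math.sqrt(sum(v * v for v in frame) / len(frame))


def spectral_flatness(frame, eps=1e-12):
    """Geometric mean over arithmetic mean of the Hann-windowed power spectrum."""
    n = len(frame)
    win = [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]
    xw = [v * w for v, w in zip(frame, win)]
    power = []
    for k in range(n // 2 + 1):
        s = sum(xw[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(s) ** 2 + eps)
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))


def moving_average(sig, width):
    """Simple low-pass filter: centered moving average, shortened at the edges."""
    half = width // 2
    out = []
    for i in range(len(sig)):
        lo, hi = max(0, i - half), min(len(sig), i + half + 1)
        out.append(sum(sig[lo:hi]) / (hi - lo))
    return out


def proxy_signals(x, frame_len=64, hop=32, smooth=5, trim=32):
    """Return smoothed, trimmed (loudness, flatness) sequences for signal x."""
    frames = frame_signal(x, frame_len, hop)
    loud = moving_average([rms(f) for f in frames], smooth)
    flat = moving_average([spectral_flatness(f) for f in frames], smooth)
    # Assumption: drop `trim` frames from each end to hide edge artifacts.
    end = len(loud) - trim
    return loud[trim:end], flat[trim:end]
```

Flatness is close to 0 for tonal frames and rises toward 1 for noise-like frames, which is why it can act as a stand-in for a timbre-related modulation when no ground truth is available.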

Synth & Method	Discovered Modulation Signals	RMS Loudness	Spectral Flatness
Target	-	-	-
Mod. Synth Granular	-	-	-
Mod. Synth Frame
Mod. Synth LPF
Mod. Synth Spline
Mod. Synth Random Spline
Target	-	-	-
Shan et al. Granular	-	-	-
Shan et al. Frame
Shan et al. LPF
Shan et al. Spline
Shan et al. Random Spline
Target	-	-	-
Engel et al. Granular	-	-	-
Engel et al. Frame
Engel et al. LPF
Engel et al. Spline
Engel et al. Random Spline

Current example: 71 / 176


DDSP Synth VST Plugins

Figure 4: The free Neutone FX host plugin user interface.

We make the trained DDSP synths accessible using the open source Neutone SDK and free Neutone host plugins. This lets readers experiment with the synths and hear for themselves how they sound, using a real-time VST plugin in their preferred digital audio workstation (DAW) on arbitrary input audio. Older CPUs may struggle to run the synths in real time.

Instructions

Download and install the free Neutone FX plugin.
Download a Neutone SDK wrapped and exported synth file from the tables below.
Open the Neutone FX plugin in your preferred digital audio workstation.
Click on "load your own" at the top of the Neutone FX plugin interface and select one of the synth files you just downloaded.
Use the four custom knobs to control the synth.
For best results, we recommend setting your DAW sampling rate to 48 kHz and buffer size to greater than 256 samples, and using an M1 Pro MacBook or better.
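
The buffer size recommendation can be read as a time budget: in general, a real-time plugin has to finish processing one buffer before the next one arrives. The small sketch below works out that budget; the function name is hypothetical and the calculation is generic audio arithmetic, not a measurement of these synths.

```python
def buffer_duration_ms(buffer_size, sample_rate_hz):
    """Time span of one audio buffer in milliseconds."""
    return 1000.0 * buffer_size / sample_rate_hz


# At 48 kHz, 256 samples last about 5.3 ms and 512 samples about 10.7 ms.
budget_256 = buffer_duration_ms(256, 48000)
budget_512 = buffer_duration_ms(512, 48000)
```

A larger buffer gives the CPU more time per call at the cost of added latency.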

Control Knobs

Experiment 1: Modulation Extraction Synths

Knob A: Oscillator pitch (F#1 to C4)
Knob B: Wavetable position
Knob C: Filter cutoff frequency (100 Hz to 8000 Hz)
Knob D: Filter resonance Q-factor (0.7071 to 4.0)
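
To make the knob ranges above concrete, here is a minimal sketch that maps a normalized knob value in [0, 1] to each listed range. The mapping curves are assumptions: pitch is taken as linear in semitones, cutoff as logarithmic in Hz, and Q as linear, and the MIDI numbering assumes the common convention in which C4 is note 60. The exported synths may use different curves.

```python
F_SHARP_1 = 30  # MIDI note number for F#1, assuming C4 = 60
C_4 = 60        # MIDI note number for C4 under the same convention


def clamp01(v):
    """Clamp a knob value to the range [0, 1]."""
    return min(max(v, 0.0), 1.0)


def lerp(lo, hi, t):
    """Linear interpolation between lo and hi."""
    return lo + t * (hi - lo)


def knob_to_pitch_hz(knob):
    """Knob A: F#1 to C4, assumed linear in semitones (A4 = 440 Hz)."""
    midi = lerp(F_SHARP_1, C_4, clamp01(knob))
    return 440.0 * 2.0 ** ((midi - 69.0) / 12.0)


def knob_to_cutoff_hz(knob):
    """Knob C: 100 Hz to 8000 Hz, assumed logarithmic."""
    return 100.0 * (8000.0 / 100.0) ** clamp01(knob)


def knob_to_q(knob):
    """Knob D: Q-factor 0.7071 to 4.0, assumed linear."""
    return lerp(0.7071, 4.0, clamp01(knob))
```

Under these assumptions, Knob A spans roughly 46 Hz to 262 Hz, and the midpoint of Knob C lands near 894 Hz rather than the arithmetic midpoint of 4050 Hz.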

Experiments 2 and 3: Modulation Discovery Synths

Knob A: Oscillator pitch (F#1 to C4)
Knob B: Additive modulation signal (wavetable position)
Knob C: Subtractive modulation signal (filter coefficients)
Knob D: Envelope modulation signal

Experiment 1: modulation extraction (synthetic and real-world data, white-box synth) model files for Neutone FX.

Wavetable Name	Synth
Basics: FM Fold	download
Basics: Galactica	download
Basics: Harmonic Series	download
Basics: Sub 3	download
Collection: Aureolin	download
Collection: Squash	download
Complex: Bit Ring	download
Complex: Kicked	download
Distortion: DP Fold	download
Distortion: Phased	download

Experiment 2: modulation discovery (synthetic data, gray-box synth) model files for Neutone FX.

Wavetable Name	Frame	LPF	Spline	Oracle
Basics: FM Fold	download	download	download	download
Basics: Galactica	download	download	download	download
Basics: Harmonic Series	download	download	download	download
Basics: Sub 3	download	download	download	download
Collection: Aureolin	download	download	download	download
Collection: Squash	download	download	download	download
Complex: Bit Ring	download	download	download	download
Complex: Kicked	download	download	download	download
Distortion: DP Fold	download	download	download	download
Distortion: Phased	download	download	download	download

Experiment 3: modulation discovery (real-world data, black-box synth) model files for Neutone FX.

Synth Architecture	Frame	LPF	Spline
Mod. Synth	download	download	download