world_synth
- class diffsptk.WorldSynthesis(frame_period: int, sample_rate: int, fft_length: int, *, default_f0: float = 500)
See this page for details. Note that gradients are not propagated through F0.
- Parameters:
- frame_period : int >= 1
The frame period in samples, \(P\).
- sample_rate : int >= 8000
The sample rate in Hz.
- fft_length : int >= 1024
The number of FFT bins, \(L\).
- default_f0 : float > 0
The F0 value used when the input F0 is unvoiced.
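As a rough illustration of what default_f0 is for, the sketch below substitutes a fallback value into unvoiced frames of an F0 contour. It assumes that unvoiced frames are marked with an F0 of 0, which is the usual WORLD convention but is not stated on this page. The helper name fill_unvoiced is hypothetical and is not part of diffsptk; the library may handle this step differently internally.

```python
def fill_unvoiced(f0, default_f0=500.0):
    """Replace unvoiced frames with a fallback F0 value.

    Assumption: an F0 of 0 marks an unvoiced frame.
    This is a hypothetical helper, not a diffsptk function.
    """
    if default_f0 <= 0:
        raise ValueError("default_f0 must be positive")
    return [default_f0 if value == 0 else value for value in f0]


# Two voiced frames surrounded by unvoiced ones.
contour = [0.0, 120.0, 125.0, 0.0]
filled = fill_unvoiced(contour)
```

Voiced frames pass through unchanged; only the frames flagged as unvoiced receive the fallback value.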
- forward(f0: Tensor, ap: Tensor, sp: Tensor) -> Tensor
Synthesize speech using the WORLD vocoder.
- Parameters:
- f0 : Tensor [shape=(B, T/P) or (T/P,)]
The F0 in Hz.
- ap : Tensor [shape=(B, T/P, L/2+1) or (T/P, L/2+1)]
The aperiodicity in [0, 1].
- sp : Tensor [shape=(B, T/P, L/2+1) or (T/P, L/2+1)]
The spectral envelope (power spectrum).
- Returns:
- out : Tensor [shape=(B, T) or (T,)]
The synthesized speech waveform.
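The shape rules above can be checked by hand before calling forward. The following is a minimal sketch in plain Python that derives the expected shapes from the number of frames, the frame period \(P\), and the FFT length \(L\). The function name expected_shapes is hypothetical and is not part of diffsptk; it only restates the documented shapes, assuming the output length equals the number of frames times \(P\).

```python
def expected_shapes(n_frames, frame_period, fft_length, batch_size=None):
    """Return the shapes implied by the documented forward() signature.

    Hypothetical helper for illustration only, not a diffsptk function.
    """
    if n_frames < 1 or frame_period < 1:
        raise ValueError("n_frames and frame_period must be positive")
    n_bins = fft_length // 2 + 1          # L/2+1 frequency bins
    n_samples = n_frames * frame_period   # T, assuming T = (T/P) * P
    lead = () if batch_size is None else (batch_size,)
    return {
        "f0": lead + (n_frames,),
        "ap": lead + (n_frames, n_bins),
        "sp": lead + (n_frames, n_bins),
        "out": lead + (n_samples,),
    }


# 7 frames with P = 160 and L = 1024.
shapes = expected_shapes(7, 160, 1024)
```

With 7 frames, a frame period of 160, and an FFT length of 1024, this gives 513 bins per frame and an output of 1120 samples.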
Examples
>>> import diffsptk
>>> x = diffsptk.sin(1000, 80)
>>> pitch = diffsptk.Pitch(160, 8000, out_format="f0")
>>> f0 = pitch(x)
>>> aperiodicity = diffsptk.Aperiodicity(160, 16000, 1024)
>>> ap = aperiodicity(x, f0)
>>> pitch_spec = diffsptk.PitchAdaptiveSpectralAnalysis(160, 8000, 1024)
>>> sp = pitch_spec(x, f0)
>>> world_synth = diffsptk.WorldSynthesis(160, 8000, 1024)
>>> y = world_synth(f0, ap, sp)
>>> y.shape
torch.Size([1120])
See also