world_synth

class diffsptk.WorldSynthesis(frame_period: int, sample_rate: int, fft_length: int, *, default_f0: float = 500)

See this page for details. Note that gradients are not propagated through F0.

Parameters:
frame_period : int >= 1

The frame period in samples, \(P\).

sample_rate : int >= 8000

The sample rate in Hz.

fft_length : int >= 1024

The number of FFT bins, \(L\).

default_f0 : float > 0

The F0 value in Hz used in place of the input F0 for unvoiced frames.
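The following sketch illustrates two of these parameters. It uses hypothetical helpers that are not part of diffsptk. `frame_period_in_samples` converts a frame period given in milliseconds into the sample count \(P\). `fill_unvoiced` replaces unvoiced F0 values with `default_f0`. The code assumes unvoiced frames are marked by an F0 of zero, which is the convention of WORLD's F0 estimators. How diffsptk detects unvoiced frames internally is not stated on this page, so treat this as a data-level illustration only.

```python
from typing import List


def frame_period_in_samples(frame_period_ms: float, sample_rate: int) -> int:
    """Convert a frame period in milliseconds to a whole number of samples (P)."""
    samples = frame_period_ms * sample_rate / 1000
    if samples < 1 or samples != int(samples):
        raise ValueError("frame period must map to a whole number of samples >= 1")
    return int(samples)


def fill_unvoiced(f0: List[float], default_f0: float = 500.0) -> List[float]:
    """Hypothetical helper: substitute default_f0 wherever F0 is unvoiced.

    Assumes unvoiced frames are encoded as F0 == 0 (WORLD's convention).
    """
    if default_f0 <= 0:
        raise ValueError("default_f0 must be positive")
    return [default_f0 if f <= 0 else f for f in f0]


# 20 ms at 8 kHz gives the P = 160 used in the example below.
print(frame_period_in_samples(20, 8000))  # 160
print(fill_unvoiced([0.0, 120.0, 0.0]))   # [500.0, 120.0, 500.0]
```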

forward(f0: Tensor, ap: Tensor, sp: Tensor) → Tensor

Synthesize speech using the WORLD vocoder.

Parameters:
f0 : Tensor [shape=(B, T/P) or (T/P,)]

The F0 in Hz.

ap : Tensor [shape=(B, T/P, L/2+1) or (T/P, L/2+1)]

The aperiodicity in [0, 1].

sp : Tensor [shape=(B, T/P, L/2+1) or (T/P, L/2+1)]

The spectral envelope (power spectrum).

Returns:
out : Tensor [shape=(B, T) or (T,)]

The synthesized speech waveform.
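The shape contract of `forward` can be summarized with some simple arithmetic. The helper `world_synthesis_shapes` is hypothetical and works only from the documented shapes. With `num_frames` = T/P frames:

- `f0` has `num_frames` entries.
- `ap` and `sp` each have L/2+1 bins per frame.
- The output has `num_frames * P` samples.

The sketch also assumes `fft_length` is even, so that L/2+1 is an integer.

```python
from typing import Dict, Optional, Tuple


def world_synthesis_shapes(
    num_frames: int, frame_period: int, fft_length: int, batch_size: Optional[int] = None
) -> Dict[str, Tuple[int, ...]]:
    """Return the tensor shapes documented for WorldSynthesis.forward.

    num_frames is T/P; pass batch_size to get the batched (B, ...) shapes.
    """
    if fft_length % 2 != 0:
        raise ValueError("fft_length is assumed to be even")
    num_bins = fft_length // 2 + 1           # L/2 + 1 frequency bins
    num_samples = num_frames * frame_period  # T = (T/P) * P
    lead = () if batch_size is None else (batch_size,)
    return {
        "f0": lead + (num_frames,),
        "ap": lead + (num_frames, num_bins),
        "sp": lead + (num_frames, num_bins),
        "out": lead + (num_samples,),
    }


# Seven frames with P = 160 and L = 1024 give the 1120 samples shown in the example.
print(world_synthesis_shapes(7, 160, 1024)["out"])  # (1120,)
```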

Examples

>>> x = diffsptk.sin(1000, 80)
>>> pitch = diffsptk.Pitch(160, 8000, out_format="f0")
>>> f0 = pitch(x)
>>> aperiodicity = diffsptk.Aperiodicity(160, 8000, 1024)
>>> ap = aperiodicity(x, f0)
>>> pitch_spec = diffsptk.PitchAdaptiveSpectralAnalysis(160, 8000, 1024)
>>> sp = pitch_spec(x, f0)
>>> world_synth = diffsptk.WorldSynthesis(160, 8000, 1024)
>>> y = world_synth(f0, ap, sp)
>>> y.shape
torch.Size([1120])

See also

ap pitch pitch_spec