world_synth

class diffsptk.WorldSynthesis(frame_period: int, sample_rate: int, fft_length: int, *, default_f0: float = 500, device: device | None = None, dtype: dtype | None = None)

See this page for details. Note that gradients are not propagated through F0.

Parameters:

frame_period (int >= 1): The frame period in samples, \(P\).
sample_rate (int >= 8000): The sample rate in Hz.
fft_length (int >= 1024): The number of FFT bins, \(L\).
default_f0 (float > 0): The F0 value used when the input F0 is unvoiced.
device (torch.device or None): The device of this module.
dtype (torch.dtype or None): The data type of this module.

forward(f0: Tensor, ap: Tensor, sp: Tensor) → Tensor

Synthesize speech using the WORLD vocoder.

Parameters:

f0 (Tensor [shape=(B, T/P) or (T/P,)]): The F0 in Hz.
ap (Tensor [shape=(B, T/P, L/2+1) or (T/P, L/2+1)]): The aperiodicity in [0, 1].
sp (Tensor [shape=(B, T/P, L/2+1) or (T/P, L/2+1)]): The spectral envelope (power spectrum).

Returns:

out (Tensor [shape=(B, T) or (T,)]): The synthesized speech waveform.
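
The shape relations above can be checked with a little arithmetic. The following is a minimal sketch in plain Python, not part of diffsptk: the helper name `world_synth_shapes` and its arguments are hypothetical, and it only restates the documented shapes, assuming the number of frames \(N\) plays the role of \(T/P\) so that the output holds \(N \times P\) samples.

```python
def world_synth_shapes(num_frames, frame_period, fft_length, batch_size=None):
    """Return the expected (f0, ap, sp, out) shapes as tuples.

    Hypothetical helper for illustration only; it mirrors the documented
    shapes and does not call diffsptk.
    """
    num_bins = fft_length // 2 + 1  # L/2+1 bins per frame
    lead = () if batch_size is None else (batch_size,)  # optional batch axis B
    f0 = lead + (num_frames,)
    ap = lead + (num_frames, num_bins)
    sp = lead + (num_frames, num_bins)
    out = lead + (num_frames * frame_period,)  # T = N * P samples
    return f0, ap, sp, out


# Sizes taken from the example on this page: 7 frames, P = 160, L = 1024.
shapes = world_synth_shapes(num_frames=7, frame_period=160, fft_length=1024)
print(shapes)
```

With 7 frames and a frame period of 160, the computed output length is 1120 samples, which agrees with the `torch.Size([1120])` reported in the example on this page.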

Examples

>>> x = diffsptk.sin(1000, 80)
>>> pitch = diffsptk.Pitch(160, 8000, out_format="f0")
>>> f0 = pitch(x)
>>> aperiodicity = diffsptk.Aperiodicity(160, 16000, 1024)
>>> ap = aperiodicity(x, f0)
>>> pitch_spec = diffsptk.PitchAdaptiveSpectralAnalysis(160, 8000, 1024)
>>> sp = pitch_spec(x, f0)
>>> world_synth = diffsptk.WorldSynthesis(160, 8000, 1024)
>>> y = world_synth(f0, ap, sp)
>>> y.shape
torch.Size([1120])