world_synth
- class diffsptk.WorldSynthesis(frame_period: int, sample_rate: int, fft_length: int, *, default_f0: float = 500, device: device | None = None, dtype: dtype | None = None)
- See this page for details. Note that gradients are not propagated through F0.
- Parameters:
- frame_period : int >= 1
- The frame period in samples, \(P\).
- sample_rate : int >= 8000
- The sample rate in Hz.
- fft_length : int >= 1024
- The number of FFT bins, \(L\).
- default_f0 : float > 0
- The F0 value used when the input F0 is unvoiced.
- device : torch.device or None
- The device of this module.
- dtype : torch.dtype or None
- The data type of this module.
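
The bounds listed above can be checked before constructing the module. The following is a minimal sketch in plain Python. The helper name `check_world_synth_args` is hypothetical and not part of diffsptk; it only mirrors the documented bounds and does not reproduce whatever validation the library performs internally. It also converts the frame period from samples to milliseconds as a quick sanity check.

```python
def check_world_synth_args(frame_period, sample_rate, fft_length, default_f0=500.0):
    """Hypothetical helper: mirror the documented parameter bounds.

    Returns the frame period in milliseconds if all bounds hold.
    """
    if frame_period < 1:
        raise ValueError("frame_period must be >= 1 (samples)")
    if sample_rate < 8000:
        raise ValueError("sample_rate must be >= 8000 (Hz)")
    if fft_length < 1024:
        raise ValueError("fft_length must be >= 1024")
    if default_f0 <= 0:
        raise ValueError("default_f0 must be > 0 (Hz)")
    # The frame period P is given in samples; convert it to milliseconds.
    return 1000.0 * frame_period / sample_rate


frame_period_ms = check_world_synth_args(160, 16000, 1024)
print(frame_period_ms)  # 10.0
```

With the values used in the example on this page (160 samples at 16000 Hz), the frame period corresponds to 10 ms.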
 
- forward(f0: Tensor, ap: Tensor, sp: Tensor, out_length: int | None = None) -> Tensor
- Synthesize speech using the WORLD vocoder.
- Parameters:
- f0 : Tensor [shape=(B, T/P) or (T/P,)]
- The F0 in Hz.
- ap : Tensor [shape=(B, T/P, L/2+1) or (T/P, L/2+1)]
- The aperiodicity in [0, 1].
- sp : Tensor [shape=(B, T/P, L/2+1) or (T/P, L/2+1)]
- The spectral envelope (power spectrum).
- out_length : int > 0 or None
- The length of the output waveform.
 
- Returns:
- out : Tensor [shape=(B, T) or (T,)]
- The synthesized speech waveform. 
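
The shape annotations above can be turned into a small bookkeeping sketch. The helper `world_synth_shapes` below is hypothetical (it is not a diffsptk function) and assumes the nominal relation T = (number of frames) x P holds exactly. The actual output length when `out_length` is None is not stated here, so treat the "out" entry as an assumption and pass `out_length` explicitly when an exact length matters.

```python
def world_synth_shapes(num_frames, frame_period, fft_length, batch_size=None):
    """Hypothetical helper: expected tensor shapes from the documented layout.

    Assumes the nominal output length T = num_frames * frame_period,
    i.e. that the T/P relation in the shape annotations is exact.
    """
    num_bins = fft_length // 2 + 1  # L/2 + 1 frequency bins
    lead = () if batch_size is None else (batch_size,)
    return {
        "f0": lead + (num_frames,),
        "ap": lead + (num_frames, num_bins),
        "sp": lead + (num_frames, num_bins),
        "out": lead + (num_frames * frame_period,),
    }


shapes = world_synth_shapes(num_frames=100, frame_period=160, fft_length=1024, batch_size=2)
print(shapes["sp"])   # (2, 100, 513)
print(shapes["out"])  # (2, 16000)
```

Checking inputs against these shapes before calling forward can catch a mismatched `fft_length` between the analysis and synthesis modules early.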
 
- Examples
>>> import diffsptk
>>> pitch = diffsptk.Pitch(160, 16000, out_format="f0")
>>> aperiodicity = diffsptk.Aperiodicity(160, 16000, 1024)
>>> spec = diffsptk.PitchAdaptiveSpectralAnalysis(160, 16000, 1024)
>>> world_synth = diffsptk.WorldSynthesis(160, 16000, 1024)
>>> x = diffsptk.sin(2000 - 1, 80)
>>> f0 = pitch(x)
>>> ap = aperiodicity(x, f0)
>>> sp = spec(x, f0)
>>> y = world_synth(f0, ap, sp, out_length=x.size(0))
>>> y.shape
torch.Size([2000])
 
