pitch_spec
- class diffsptk.PitchAdaptiveSpectralAnalysis(frame_period: int, sample_rate: int, fft_length: int, algorithm: str = 'cheap-trick', out_format: str | int = 'power', **kwargs)
See this page for details. Note that gradients are not propagated through F0.
- Parameters:
- frame_period : int >= 1
The frame period in samples, \(P\).
- sample_rate : int >= 8000
The sample rate in Hz.
- fft_length : int >= 1024
The number of FFT bins, \(L\).
- algorithm : [‘cheap-trick’, ‘straight’]
The algorithm used to estimate the spectral envelope. The STRAIGHT algorithm supports only double precision.
- out_format : [‘db’, ‘log-magnitude’, ‘magnitude’, ‘power’]
The output format.
- default_f0 : float > 0
The F0 value used when the input F0 is unvoiced.
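As a rough illustration of how the four out_format names relate to each other, the sketch below converts a single power-spectrum value using the conventional definitions (magnitude as the square root of power, natural-log magnitude, and decibels as 10 log10 of power). The helper `convert_power` is hypothetical and not part of diffsptk; the library's own scaling may differ in detail.

```python
import math


def convert_power(power, out_format="power"):
    """Convert one positive power value to another representation.

    Hypothetical helper using conventional definitions; it is an
    assumption that these match the library's output formats.
    """
    if power <= 0:
        raise ValueError("power must be positive")
    if out_format == "power":
        return power
    if out_format == "magnitude":
        return math.sqrt(power)
    if out_format == "log-magnitude":
        # ln|X| = 0.5 * ln(|X|^2)
        return 0.5 * math.log(power)
    if out_format == "db":
        # 10 * log10(|X|^2) = 20 * log10(|X|)
        return 10.0 * math.log10(power)
    raise ValueError(f"unknown out_format: {out_format}")


for fmt in ("power", "magnitude", "log-magnitude", "db"):
    print(fmt, convert_power(4.0, fmt))
```

Under these definitions all four formats carry the same information, so the choice mainly affects the numeric range seen by downstream processing.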
References
[1]M. Morise, “CheapTrick, a spectral envelope estimator for high-quality speech synthesis”, Speech Communication, vol. 67, pp. 1-7, 2015.
[2]H. Kawahara et al., “Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds”, Speech Communication, vol. 27, no. 3-4, pp. 187-207, 1999.
- forward(x: Tensor, f0: Tensor) -> Tensor
Estimate spectral envelope.
- Parameters:
- x : Tensor [shape=(…, T)]
The input waveform.
- f0 : Tensor [shape=(…, T/P)]
The F0 in Hz.
- Returns:
- out : Tensor [shape=(…, T/P, L/2+1)]
The spectral envelope.
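The shape arithmetic above can be sketched without running the analysis itself. The helper `expected_shape` below is hypothetical, and it assumes that the frame count T/P rounds up; that rounding is inferred from the documented example (7 frames for P = 160) under the further assumption that the example waveform has 1001 samples.

```python
import math


def expected_shape(num_samples, frame_period, fft_length):
    """Return (frames, bins) for a 1-D input (hypothetical helper).

    Assumption: the number of frames is ceil(T / P). The number of
    bins is L / 2 + 1, i.e. the one-sided spectrum of an L-point FFT.
    """
    num_frames = math.ceil(num_samples / frame_period)
    num_bins = fft_length // 2 + 1
    return num_frames, num_bins


# Assumed values: T = 1001 samples, P = 160, L = 1024.
print(expected_shape(1001, 160, 1024))
```

Any leading batch dimensions of x are carried through unchanged, so only the last axis of the waveform is replaced by the (frames, bins) pair.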
Examples
>>> x = diffsptk.sin(1000, 80)
>>> pitch = diffsptk.Pitch(160, 8000, out_format="f0")
>>> f0 = pitch(x)
>>> f0.shape
torch.Size([7])
>>> pitch_spec = diffsptk.PitchAdaptiveSpectralAnalysis(160, 8000, 1024)
>>> sp = pitch_spec(x, f0)
>>> sp.shape
torch.Size([7, 513])
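To make the role of default_f0 concrete, the following minimal sketch shows one way unvoiced frames could be filled in before a pitch-adaptive analysis. It assumes that unvoiced frames are marked with an F0 of 0; the helper `fill_unvoiced` and the value 150.0 are illustrative only and do not reflect the library's internals or its default.

```python
def fill_unvoiced(f0, default_f0):
    """Replace unvoiced frames with a fallback F0 (hypothetical helper).

    Assumption: a frame is unvoiced when its F0 is not positive.
    """
    if default_f0 <= 0:
        raise ValueError("default_f0 must be positive")
    return [value if value > 0 else default_f0 for value in f0]


# Illustrative F0 contour in Hz with two unvoiced frames.
contour = [0.0, 120.0, 125.0, 0.0]
print(fill_unvoiced(contour, 150.0))
```

A positive fallback matters because a pitch-adaptive window is sized from F0, so a zero value would leave the window length undefined for that frame.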