- class diffsptk.PitchAdaptiveSpectralAnalysis(frame_period, sample_rate, fft_length, out_format='power', q1=-0.15, default_f0=500)
Estimate the spectral envelope by pitch-adaptive spectral analysis based on CheapTrick [1]. See this page for details.
- Parameters:
- frame_period : int >= 1
Frame period in samples, \(P\).
- sample_rate : int >= 8000
Sample rate in Hz.
- fft_length : int >= 1024
Number of FFT bins, \(L\).
- out_format : ['db', 'log-magnitude', 'magnitude', 'power']
Output format.
- q1 : float
A parameter used for spectral recovery.
- default_f0 : float > 0
F0 value used when the input F0 is unvoiced.
[1] M. Morise, “CheapTrick, a spectral envelope estimator for high-quality speech synthesis”, Speech Communication, vol. 67, pp. 1-7, 2015.
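The four out_format values name different representations of the same envelope. The following is a minimal sketch of how they usually relate, assuming the conventional definitions (power = |X|^2, magnitude = |X|, log-magnitude = ln|X|, db = 20 log10|X|). This page does not state the exact formulas, and convert_power_spectrum is a hypothetical helper, not part of diffsptk.

```python
import math


def convert_power_spectrum(power, out_format="power"):
    """Convert power-spectrum values to another representation.

    Hypothetical helper for illustration only; the formulas are the
    conventional ones and are an assumption about out_format.
    """
    if out_format == "power":
        return list(power)
    if out_format == "magnitude":
        # |X| = sqrt(|X|^2)
        return [math.sqrt(p) for p in power]
    if out_format == "log-magnitude":
        # ln|X| = 0.5 * ln(|X|^2)
        return [0.5 * math.log(p) for p in power]
    if out_format == "db":
        # 20 log10|X| = 10 log10(|X|^2)
        return [10.0 * math.log10(p) for p in power]
    raise ValueError(f"unsupported out_format: {out_format!r}")


power_values = [4.0, 1.0, 0.25]
print(convert_power_spectrum(power_values, "magnitude"))
print(convert_power_spectrum(power_values, "db"))
```

Under these assumptions, all four formats carry the same information, so the choice mainly depends on what the downstream processing expects.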
- forward(x, f0)
Estimate spectral envelope.
- Parameters:
- x : Tensor [shape=(…, T)]
Waveform.
- f0 : Tensor [shape=(…, T/P)]
F0 in Hz.
- Returns:
- out : Tensor [shape=(…, T/P, L/2+1)]
Spectral envelope.
>>> x = diffsptk.sin(1000, 80)
>>> pitch = diffsptk.Pitch(160, 8000, out_format="f0")
>>> f0 = pitch(x)
>>> f0.shape
torch.Size([7])
>>> pitch_spec = diffsptk.PitchAdaptiveSpectralAnalysis(160, 8000, 1024)
>>> sp = pitch_spec(x, f0)
>>> sp.shape
torch.Size([7, 513])
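The shapes in the example above follow from the T/P and L/2+1 terms in the documented shapes. The following is a minimal sketch of that arithmetic. It assumes that diffsptk.sin(1000, 80) returns 1001 samples and that the frame count is the ceiling of T/P. The second assumption is consistent with the 7 frames shown but is not confirmed by this page. expected_envelope_shape is a hypothetical helper, not part of diffsptk.

```python
import math


def expected_envelope_shape(num_samples, frame_period, fft_length):
    """Return (frames, bins) for a 1-D input (hypothetical helper)."""
    # Assumption: one frame per started frame period, i.e. ceil(T / P).
    num_frames = math.ceil(num_samples / frame_period)
    # A real-valued FFT of length L has L/2 + 1 non-redundant bins.
    num_bins = fft_length // 2 + 1
    return (num_frames, num_bins)


# T = 1001 samples (assumed), P = 160, L = 1024
print(expected_envelope_shape(1001, 160, 1024))  # (7, 513)
```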