pitch_spec

class diffsptk.PitchAdaptiveSpectralAnalysis(frame_period, sample_rate, fft_length, out_format='power', q1=-0.15, default_f0=500)

See this page for details.

Parameters:

frame_period (int >= 1): Frame period in samples, \(P\).
sample_rate (int >= 8000): Sample rate in Hz.
fft_length (int >= 1024): Number of FFT bins, \(L\).
out_format (['db', 'log-magnitude', 'magnitude', 'power']): Output format.
q1 (float): A parameter used for spectral recovery.
default_f0 (float > 0): F0 value used when the input F0 is unvoiced.
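
The four out_format options are different scalings of the same spectral value. The sketch below illustrates the conventional relationships between them for a single positive power value. It is not the library's code: convert_power is a hypothetical helper, and the use of the natural logarithm for 'log-magnitude' and of 20*log10 of the magnitude for 'db' is an assumption based on the usual definitions.

```python
import math


def convert_power(power, out_format="power"):
    """Convert one power-spectrum value to another format (hypothetical helper)."""
    if power <= 0:
        raise ValueError("power must be positive")
    magnitude = math.sqrt(power)  # |X| = sqrt(|X|^2)
    if out_format == "power":
        return power
    if out_format == "magnitude":
        return magnitude
    if out_format == "log-magnitude":
        # Assumption: natural logarithm of the magnitude.
        return math.log(magnitude)
    if out_format == "db":
        # Assumption: 20 * log10(|X|), equivalently 10 * log10(|X|^2).
        return 20.0 * math.log10(magnitude)
    raise ValueError(f"unknown out_format: {out_format}")
```

Under these assumptions, a power value of 100.0 corresponds to a magnitude of 10.0 and to 20 dB.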

References

[1] M. Morise, “CheapTrick, a spectral envelope estimator for high-quality speech synthesis”, Speech Communication, vol. 67, pp. 1-7, 2015.

forward(x, f0)

Estimate spectral envelope.

Parameters:

x (Tensor [shape=(…, T)]): Waveform.
f0 (Tensor [shape=(…, T/P)]): F0 in Hz.

Returns:

out (Tensor [shape=(…, T/P, L/2+1)]): Spectral envelope.
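
The shape notation can be checked with a little integer arithmetic. The sketch below uses a hypothetical helper, expected_shape, that is not part of diffsptk. It assumes the frame count T/P is rounded up, which is an assumption rather than something this page states, although it agrees with the shapes in the example on this page if the input waveform has 1001 samples.

```python
def expected_shape(num_samples, frame_period, fft_length):
    """Return (frames, bins) for an unbatched input (hypothetical helper)."""
    # Assumption: the number of frames is ceil(T / P).
    num_frames = -(-num_samples // frame_period)
    # One-sided spectrum of an L-point FFT has L/2 + 1 bins.
    num_bins = fft_length // 2 + 1
    return (num_frames, num_bins)


# T = 1001, P = 160, L = 1024
shape = expected_shape(1001, 160, 1024)
```

With these values, shape is (7, 513).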

Examples

>>> x = diffsptk.sin(1000, 80)
>>> pitch = diffsptk.Pitch(160, 8000, out_format="f0")
>>> f0 = pitch(x)
>>> f0.shape
torch.Size([7])
>>> pitch_spec = diffsptk.PitchAdaptiveSpectralAnalysis(160, 8000, 1024)
>>> sp = pitch_spec(x, f0)
>>> sp.shape
torch.Size([7, 513])
