pitch_spec

class diffsptk.PitchAdaptiveSpectralAnalysis(frame_period, sample_rate, fft_length, algorithm='cheap-trick', out_format='power', **kwargs)

See this page for details. Note that gradients are not propagated through F0.
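"Pitch-adaptive" means the analysis window is scaled to the fundamental period. The sketch below illustrates the idea, assuming the CheapTrick convention from [1]: a Hann window spanning about three pitch periods (1.5 periods on each side of the frame center). The helper `pitch_adaptive_window` is hypothetical and is not diffsptk's implementation.

```python
import math


def pitch_adaptive_window(f0_hz: float, sample_rate: int) -> list[float]:
    """Hypothetical helper: Hann window spanning about three pitch periods.

    Assumes the CheapTrick-style window of [1]; diffsptk's internals may differ.
    """
    if f0_hz <= 0:
        raise ValueError("f0_hz must be positive (voiced)")
    # Half-length of 1.5 pitch periods, rounded to the nearest integer sample.
    half = round(1.5 * sample_rate / f0_hz)
    window = []
    for n in range(-half, half + 1):
        t = n / sample_rate  # offset from the frame center in seconds
        # Raised cosine: 1 at the center, approximately 0 at +/- 1.5 pitch periods.
        window.append(0.5 * math.cos(math.pi * t * f0_hz / 1.5) + 0.5)
    return window


# A higher F0 gives a shorter window.
print(len(pitch_adaptive_window(100.0, 8000)))  # 241
print(len(pitch_adaptive_window(200.0, 8000)))  # 121
```

Tying the window length to F0 is what lets the estimator smooth out the harmonic structure consistently, whatever the speaker's pitch.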

Parameters:

frame_period (int >= 1): The frame period in samples, \(P\).
sample_rate (int >= 8000): The sample rate in Hz.
fft_length (int >= 1024): The number of FFT bins, \(L\).
algorithm ([‘cheap-trick’, ‘straight’]): The algorithm used to estimate the spectral envelope, CheapTrick [1] or STRAIGHT [2]. STRAIGHT supports only double precision.
out_format ([‘db’, ‘log-magnitude’, ‘magnitude’, ‘power’]): The output format.
default_f0 (float > 0): The F0 value used in frames where the input F0 is unvoiced.
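The four `out_format` options are different views of the same power spectrum. The sketch below assumes the standard relations: `magnitude` is the square root of power, `log-magnitude` is the natural logarithm of the magnitude, and `db` is \(10 \log_{10}\) of power. The helper `convert_power` is hypothetical and is not part of diffsptk.

```python
import math


def convert_power(power: list[float], out_format: str) -> list[float]:
    """Hypothetical helper: convert positive power values to another format.

    Assumes standard definitions; not taken from diffsptk's source.
    """
    if out_format == "power":
        return list(power)
    if out_format == "magnitude":
        return [math.sqrt(p) for p in power]
    if out_format == "log-magnitude":
        # Natural log of the magnitude, i.e. half the natural log of the power.
        return [0.5 * math.log(p) for p in power]
    if out_format == "db":
        return [10.0 * math.log10(p) for p in power]
    raise ValueError(f"unknown out_format: {out_format}")


print(convert_power([1.0, 100.0], "db"))  # [0.0, 20.0]
```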

References

[1] M. Morise, “CheapTrick, a spectral envelope estimator for high-quality speech synthesis”, Speech Communication, vol. 67, pp. 1-7, 2015.

[2] H. Kawahara et al., “Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds”, Speech Communication, vol. 27, no. 3-4, pp. 187-207, 1999.

forward(x, f0)

Estimate the spectral envelope.

Parameters:

x (Tensor [shape=(…, T)]): The input waveform.
f0 (Tensor [shape=(…, T/P)]): The F0 in Hz.
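The constructor's `default_f0` stands in for F0 in unvoiced frames of this input. The sketch below shows that substitution on a plain list, assuming unvoiced frames are marked with F0 <= 0 (0 is the usual SPTK convention). The helper `fill_unvoiced` is hypothetical, and diffsptk's internal handling may differ.

```python
def fill_unvoiced(f0: list[float], default_f0: float) -> list[float]:
    """Hypothetical helper: substitute default_f0 in unvoiced frames.

    Assumes unvoiced frames are marked with F0 <= 0.
    """
    if default_f0 <= 0:
        raise ValueError("default_f0 must be > 0")
    return [value if value > 0 else default_f0 for value in f0]


print(fill_unvoiced([0.0, 120.0, 118.5, 0.0], default_f0=100.0))
# [100.0, 120.0, 118.5, 100.0]
```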

Returns:

out (Tensor [shape=(…, T/P, L/2+1)]): The spectral envelope.
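The last axis of `out` holds the \(L/2+1\) non-negative-frequency bins of an \(L\)-point FFT, so bin \(k\) corresponds to \(k f_s / L\) Hz, where \(f_s\) is `sample_rate`. The sketch below labels that axis, assuming an even `fft_length`. The helper `bin_frequencies` is hypothetical.

```python
def bin_frequencies(fft_length: int, sample_rate: int) -> list[float]:
    """Hypothetical helper: frequency in Hz of each one-sided FFT bin."""
    if fft_length % 2:
        raise ValueError("fft_length is assumed to be even")
    num_bins = fft_length // 2 + 1  # L/2 + 1 bins, from DC up to Nyquist
    return [k * sample_rate / fft_length for k in range(num_bins)]


freqs = bin_frequencies(1024, 8000)
print(len(freqs), freqs[1], freqs[-1])  # 513 7.8125 4000.0
```

With the settings from the example (`fft_length=1024`, `sample_rate=8000`), this gives the 513 bins seen in `torch.Size([7, 513])`, spaced 7.8125 Hz apart up to the 4000 Hz Nyquist frequency.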

Examples

>>> x = diffsptk.sin(1000, 80)
>>> pitch = diffsptk.Pitch(160, 8000, out_format="f0")
>>> f0 = pitch(x)
>>> f0.shape
torch.Size([7])
>>> pitch_spec = diffsptk.PitchAdaptiveSpectralAnalysis(160, 8000, 1024)
>>> sp = pitch_spec(x, f0)
>>> sp.shape
torch.Size([7, 513])
