pitch_spec

class diffsptk.PitchAdaptiveSpectralAnalysis(frame_period, sample_rate, fft_length, algorithm='cheap-trick', out_format='power', **kwargs)

See this page for details. Note that gradients are not propagated through F0.

Parameters:
frame_period : int >= 1

The frame period in samples, \(P\).

sample_rate : int >= 8000

The sample rate in Hz.

fft_length : int >= 1024

The number of FFT bins, \(L\).

algorithm : ['cheap-trick', 'straight']

The algorithm used to estimate the spectral envelope. The STRAIGHT algorithm supports only double precision.

out_format : ['db', 'log-magnitude', 'magnitude', 'power']

The output format.

default_f0 : float > 0

The F0 value used when the input F0 is unvoiced.
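The four out_format options can be read as different views of the same power spectrum. The sketch below assumes the conventional definitions (decibels as 10 log10 of power, log-magnitude as the natural log of the magnitude); the helper format_spectrum is hypothetical and is not part of diffsptk, so the library's exact scaling should be checked against its source.

```python
import math


def format_spectrum(power, out_format="power"):
    """Convert a single power-spectrum value to another representation.

    Hypothetical helper for illustration only; the definitions below are
    the conventional ones and are assumed, not taken from diffsptk.
    """
    if out_format == "db":
        return 10.0 * math.log10(power)  # decibels relative to unit power
    if out_format == "log-magnitude":
        return 0.5 * math.log(power)  # natural log of sqrt(power)
    if out_format == "magnitude":
        return math.sqrt(power)
    if out_format == "power":
        return power
    raise ValueError(f"unknown out_format: {out_format}")


# Show the same power value in every representation.
for name in ("db", "log-magnitude", "magnitude", "power"):
    print(name, format_spectrum(4.0, name))
```

Under these assumed definitions, the formats differ only by a monotonic mapping, so the choice affects scale and numerical range rather than the information carried by the envelope.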

References

[1]

M. Morise, “CheapTrick, a spectral envelope estimator for high-quality speech synthesis”, Speech Communication, vol. 67, pp. 1-7, 2015.

[2]

H. Kawahara et al., “Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds”, Speech Communication, vol. 27, no. 3-4, pp. 187-207, 1999.

forward(x, f0)

Estimate spectral envelope.

Parameters:
x : Tensor [shape=(…, T)]

The input waveform.

f0 : Tensor [shape=(…, T/P)]

The F0 in Hz.

Returns:
out : Tensor [shape=(…, T/P, L/2+1)]

The spectral envelope.
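The last axis of the output has L/2+1 entries. As a rough sketch, assuming these are the usual one-sided FFT bins spaced uniformly from 0 Hz to the Nyquist frequency, the centre frequency of each bin can be computed as follows. The helper bin_frequencies is hypothetical and is not part of diffsptk.

```python
def bin_frequencies(sample_rate, fft_length):
    """Return the centre frequency in Hz of each one-sided FFT bin.

    Hypothetical helper; assumes uniformly spaced bins from 0 Hz up to
    the Nyquist frequency (sample_rate / 2).
    """
    n_bins = fft_length // 2 + 1
    return [k * sample_rate / fft_length for k in range(n_bins)]


# Values matching the constructor arguments used in the example on this page.
freqs = bin_frequencies(8000, 1024)
print(len(freqs), freqs[1], freqs[-1])
```

With a sample rate of 8000 Hz and an FFT length of 1024, this gives 513 bins spaced 7.8125 Hz apart, which agrees with the last dimension of the output shape in the example.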

Examples

>>> import diffsptk
>>> x = diffsptk.sin(1000, 80)
>>> pitch = diffsptk.Pitch(160, 8000, out_format="f0")
>>> f0 = pitch(x)
>>> f0.shape
torch.Size([7])
>>> pitch_spec = diffsptk.PitchAdaptiveSpectralAnalysis(160, 8000, 1024)
>>> sp = pitch_spec(x, f0)
>>> sp.shape
torch.Size([7, 513])

See also

spec, ap, pitch