pitch_spec

class diffsptk.PitchAdaptiveSpectralAnalysis(frame_period: int, sample_rate: int, fft_length: int, algorithm: str = 'cheap-trick', out_format: str | int = 'power', **kwargs)

See this page for details. Note that gradients are not propagated through F0.

Parameters:

frame_period (int >= 1): The frame period in samples, \(P\).
sample_rate (int >= 8000): The sample rate in Hz.
fft_length (int >= 1024): The number of FFT bins, \(L\).
algorithm ([‘cheap-trick’, ‘straight’]): The algorithm used to estimate the spectral envelope. STRAIGHT supports only double precision.
out_format ([‘db’, ‘log-magnitude’, ‘magnitude’, ‘power’]): The output format.
default_f0 (float > 0): The F0 value used when the input F0 is unvoiced.
eps (float >= 0): A small value added to the power spectrum. Increase this value if you encounter numerical instability (valid only if algorithm is ‘cheap-trick’).
relative_floor (float < 0 or None): The relative floor of the power spectrum in dB. Set this value if you encounter numerical instability (valid only if algorithm is ‘cheap-trick’).
device (torch.device or None): The device of this module.
dtype (torch.dtype or None): The data type of this module.
**kwargs (additional keyword arguments): Additional keyword arguments passed to the algorithm-specific extractor.
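
The four out_format choices are different views of the same envelope. The sketch below shows how they would relate for a single power value, assuming the usual conventions (magnitude is the square root of power, log-magnitude is the natural log of the magnitude, and dB is 20 log10 of the magnitude). These conventions and the helper name convert_power are assumptions for illustration, not taken from the library source.

```python
import math


def convert_power(power, out_format="power"):
    """Convert one power-spectrum value to another representation.

    Assumed conventions (hypothetical, not confirmed by the documentation):
      'power'         -> |H|^2
      'magnitude'     -> |H|
      'log-magnitude' -> ln|H|
      'db'            -> 20 * log10|H|
    """
    if power <= 0:
        raise ValueError("power must be positive")
    if out_format == "power":
        return power
    if out_format == "magnitude":
        return math.sqrt(power)
    if out_format == "log-magnitude":
        # ln|H| = 0.5 * ln(|H|^2)
        return 0.5 * math.log(power)
    if out_format == "db":
        # 20 * log10|H| = 10 * log10(|H|^2)
        return 10.0 * math.log10(power)
    raise ValueError(f"unknown out_format: {out_format}")


formats = ["db", "log-magnitude", "magnitude", "power"]
values = {fmt: convert_power(4.0, fmt) for fmt in formats}
```

Under these assumptions, a power of 4.0 corresponds to a magnitude of 2.0 and roughly 6.02 dB.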

References

[1] M. Morise, “CheapTrick, a spectral envelope estimator for high-quality speech synthesis”, Speech Communication, vol. 67, pp. 1-7, 2015.

[2] H. Kawahara et al., “Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds”, Speech Communication, vol. 27, no. 3-4, pp. 187-207, 1999.

forward(x: Tensor, f0: Tensor) → Tensor

Estimate spectral envelope.

Parameters:

x (Tensor [shape=(…, T)]): The input waveform.
f0 (Tensor [shape=(…, T/P)]): The F0 in Hz.

Returns:

out (Tensor [shape=(…, T/P, L/2+1)]): The spectral envelope.
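
As a quick sanity check on the shapes above, the following sketch computes the expected output shape from T, P, and L. It assumes that T/P is rounded up to a whole number of frames, which is consistent with the 7 frames reported in the Examples for an input of roughly 1000 samples and a frame period of 160, but the rounding rule is an assumption rather than something stated on this page. The helper name expected_output_shape is hypothetical.

```python
import math


def expected_output_shape(num_samples, frame_period, fft_length, batch_shape=()):
    """Return the assumed shape (..., T/P, L/2+1) of the spectral envelope."""
    # Assumption: the number of frames is ceil(T / P).
    num_frames = math.ceil(num_samples / frame_period)
    # A one-sided spectrum of an L-point FFT has L/2 + 1 bins.
    num_bins = fft_length // 2 + 1
    return (*batch_shape, num_frames, num_bins)


# Hypothetical input sizes chosen to mirror the Examples section.
single = expected_output_shape(1000, 160, 1024)
batched = expected_output_shape(16000, 80, 2048, batch_shape=(4,))
```

Here single evaluates to (7, 513) and batched to (4, 200, 1025); leading batch dimensions pass through unchanged.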

Examples

>>> import diffsptk
>>> x = diffsptk.sin(1000, 80)
>>> pitch = diffsptk.Pitch(160, 8000, out_format="f0")
>>> f0 = pitch(x)
>>> f0.shape
torch.Size([7])
>>> pitch_spec = diffsptk.PitchAdaptiveSpectralAnalysis(160, 8000, 1024)
>>> sp = pitch_spec(x, f0)
>>> sp.shape
torch.Size([7, 513])