pitch_spec

class diffsptk.PitchAdaptiveSpectralAnalysis(frame_period, sample_rate, fft_length, out_format='power', q1=-0.15, default_f0=500)

See this page for details.

Parameters:
frame_period : int >= 1

Frame period in samples, \(P\).

sample_rate : int >= 8000

Sample rate in Hz.

fft_length : int >= 1024

Number of FFT bins, \(L\).

out_format : ['db', 'log-magnitude', 'magnitude', 'power']

Output format.

q1 : float

A parameter used for spectral recovery.

default_f0 : float > 0

F0 value used when the input F0 is unvoiced.
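
The four out_format values are pointwise rescalings of the same power spectrum. The following is a minimal sketch in plain Python, assuming the usual SPTK-style definitions (db as 10 log10 of power, log-magnitude as the natural log of the magnitude). The helper name convert_power is hypothetical and is not part of diffsptk.

```python
import math


def convert_power(power, out_format="power"):
    """Rescale one power-spectrum value (illustrative helper, not a diffsptk API)."""
    if out_format == "db":
        # Assumed definition: 10 * log10(|X|^2) = 20 * log10(|X|).
        return 10.0 * math.log10(power)
    if out_format == "log-magnitude":
        # Assumed definition: ln|X| = 0.5 * ln(|X|^2).
        return 0.5 * math.log(power)
    if out_format == "magnitude":
        return math.sqrt(power)
    if out_format == "power":
        return power
    raise ValueError(f"unknown out_format: {out_format}")


for fmt in ("db", "log-magnitude", "magnitude", "power"):
    print(fmt, convert_power(100.0, fmt))
```

Under these assumptions, 'power' is the raw output and the other three formats can be derived from it after the fact.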

References

[1] M. Morise, “CheapTrick, a spectral envelope estimator for high-quality speech synthesis”, Speech Communication, vol. 67, pp. 1-7, 2015.
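
For context on q1: in the WORLD reference implementation of CheapTrick, q1 controls a cepstral compensation lifter of the form q0 + 2 q1 cos(2π f0 τ), with q0 = 1 - 2 q1 and τ the quefrency in seconds. The sketch below only evaluates that formula. Whether diffsptk applies it in exactly this form is an assumption, and compensation_lifter is a hypothetical name.

```python
import math


def compensation_lifter(quefrency, f0, q1=-0.15):
    """Evaluate the CheapTrick-style compensation lifter at one quefrency (seconds).

    Illustrative only; follows the formula used in WORLD, not a diffsptk API.
    """
    q0 = 1.0 - 2.0 * q1
    return q0 + 2.0 * q1 * math.cos(2.0 * math.pi * quefrency * f0)


f0 = 200.0  # Hz, arbitrary example value
for tau in (0.0, 0.5 / f0, 1.0 / f0):
    print(tau, compensation_lifter(tau, f0))
```

With the default q1 = -0.15 the lifter equals 1 at quefrency 0 and at the pitch period 1/f0, and peaks at 1 - 4 q1 = 1.6 halfway between them.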

forward(x, f0)

Estimate spectral envelope.

Parameters:
x : Tensor [shape=(..., T)]

Waveform.

f0 : Tensor [shape=(..., T/P)]

F0 in Hz.
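
The constructor's default_f0 applies to this input: frames without a usable F0 are analyzed as if their F0 were default_f0. The following is a rough sketch of that substitution, assuming unvoiced frames are marked with 0 (a common convention that this page does not confirm). fill_unvoiced is a hypothetical helper, not a diffsptk function.

```python
def fill_unvoiced(f0, default_f0=500.0):
    """Replace non-positive (assumed unvoiced) F0 values with default_f0."""
    return [f if f > 0 else default_f0 for f in f0]


f0_track = [0.0, 120.0, 125.0, 0.0]  # Hz; 0 marks unvoiced frames (assumption)
print(fill_unvoiced(f0_track))  # [500.0, 120.0, 125.0, 500.0]
```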

Returns:
out : Tensor [shape=(..., T/P, L/2+1)]

Spectral envelope.
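
As a quick check on the shape notation, the sketch below computes the trailing output dimensions from T, P, and L. It assumes T/P is rounded up to a whole number of frames, which matches the example on this page if its input holds 1001 samples. The rounding rule and the helper name expected_shape are both assumptions.

```python
import math


def expected_shape(num_samples, frame_period, fft_length):
    """Return (frames, bins) for the spectral envelope (illustrative helper)."""
    # Assumption: a partial final frame still counts, so round T/P up.
    num_frames = math.ceil(num_samples / frame_period)
    # A real-input FFT of length L has L/2 + 1 non-redundant bins.
    num_bins = fft_length // 2 + 1
    return (num_frames, num_bins)


print(expected_shape(1001, 160, 1024))  # (7, 513)
```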

Examples

>>> x = diffsptk.sin(1000, 80)
>>> pitch = diffsptk.Pitch(160, 8000, out_format="f0")
>>> f0 = pitch(x)
>>> f0.shape
torch.Size([7])
>>> pitch_spec = diffsptk.PitchAdaptiveSpectralAnalysis(160, 8000, 1024)
>>> sp = pitch_spec(x, f0)
>>> sp.shape
torch.Size([7, 513])

See also

spec, ap, pitch