smcep

class diffsptk.SecondOrderAllPassMelCepstralAnalysis(*, fft_length: int, cep_order: int, alpha: float = 0, theta: float = 0, n_iter: int = 0, accuracy_factor: int = 4, device: device | None = None, dtype: dtype | None = None)

See this page for details. Note that the current implementation does not use the efficient Toeplitz-plus-Hankel system solver.

Parameters:

fft_length (int >= 2M): The number of FFT bins, \(L\).
cep_order (int >= 0): The order of the cepstrum, \(M\).
alpha (float in (-1, 1)): The frequency warping factor, \(\alpha\).
theta (float in [0, 1]): The emphasis frequency, \(\theta\).
n_iter (int >= 0): The number of iterations.
accuracy_factor (int >= 1): The accuracy factor multiplied by the FFT length.
device (torch.device or None): The device of this module.
dtype (torch.dtype or None): The data type of this module.
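
The documented ranges above can be checked before a module is constructed. The following is a minimal sketch in plain Python; `check_smcep_args` is a hypothetical helper, not part of diffsptk, and whether the library itself raises an error for out-of-range values is not stated here.

```python
def check_smcep_args(fft_length, cep_order, alpha=0.0, theta=0.0,
                     n_iter=0, accuracy_factor=4):
    """Hypothetical helper (not part of diffsptk): list violated constraints."""
    problems = []
    if cep_order < 0:
        problems.append("cep_order must be >= 0")
    if fft_length < 2 * cep_order:
        problems.append("fft_length must be >= 2 * cep_order")
    if not -1 < alpha < 1:
        problems.append("alpha must be in (-1, 1)")
    if not 0 <= theta <= 1:
        problems.append("theta must be in [0, 1]")
    if n_iter < 0:
        problems.append("n_iter must be >= 0")
    if accuracy_factor < 1:
        problems.append("accuracy_factor must be >= 1")
    return problems


# Settings taken from the documented example: no constraint is violated.
print(check_smcep_args(fft_length=16, cep_order=3, alpha=0.1, n_iter=1))
# An FFT length that is too short for the requested cepstral order.
print(check_smcep_args(fft_length=4, cep_order=3))
```

Returning a list of messages rather than raising on the first failure makes it possible to report every violated constraint at once.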

References

[1] T. Wakako et al., “Speech spectral estimation based on expansion of log spectrum by arbitrary basis functions,” IEICE Trans., vol. J82-D-II, no. 12, pp. 2203-2211, 1999 (in Japanese).

forward(x: Tensor) → Tensor

Perform mel-cepstral analysis based on the second-order all-pass filter.

Parameters:

x (Tensor [shape=(…, L/2+1)]): The power spectrum.

Returns:

out (Tensor [shape=(…, M+1)]): The mel-cepstrum.

Examples

>>> x = diffsptk.ramp(19)
>>> stft = diffsptk.STFT(frame_length=10, frame_period=10, fft_length=16)
>>> smcep = diffsptk.SecondOrderAllPassMelCepstralAnalysis(
...     fft_length=16, cep_order=3, alpha=0.1, n_iter=1
... )
>>> mc = smcep(stft(x))
>>> mc
tensor([[-0.8851,  0.7917, -0.1737,  0.0175],
        [-0.3523,  4.4223, -1.0883, -0.0510]])
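
The shape of the printed result follows from the documented shapes: an input with \(L/2+1\) bins per frame yields \(M+1\) coefficients per frame. As a rough sketch in plain Python, with `smcep_shapes` being a hypothetical helper rather than a diffsptk function, and the leading dimension of 2 read off the two rows printed above:

```python
def smcep_shapes(fft_length, cep_order, batch_shape=()):
    """Hypothetical helper (not part of diffsptk): documented input/output shapes."""
    # Input: one-sided power spectrum with L/2+1 bins in the last dimension.
    in_shape = tuple(batch_shape) + (fft_length // 2 + 1,)
    # Output: mel-cepstrum with M+1 coefficients in the last dimension.
    out_shape = tuple(batch_shape) + (cep_order + 1,)
    return in_shape, out_shape


in_shape, out_shape = smcep_shapes(fft_length=16, cep_order=3, batch_shape=(2,))
print(in_shape)   # (2, 9)
print(out_shape)  # (2, 4)
```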

diffsptk.functional.smcep(x: Tensor, cep_order: int, alpha: float = 0, theta: float = 0, n_iter: int = 0, accuracy_factor: int = 4) → Tensor

Perform mel-cepstral analysis based on the second-order all-pass filter.

Parameters:

x (Tensor [shape=(…, L/2+1)]): The power spectrum.
cep_order (int >= 0): The order of the cepstrum, \(M\).
alpha (float in (-1, 1)): The frequency warping factor, \(\alpha\).
theta (float in [0, 1]): The emphasis frequency, \(\theta\).
n_iter (int >= 0): The number of iterations.
accuracy_factor (int >= 1): The accuracy factor multiplied by the FFT length.

Returns:

out (Tensor [shape=(…, M+1)]): The mel-cepstrum.