mfcc

diffsptk.MFCC

alias of MelFrequencyCepstralCoefficientsAnalysis

class diffsptk.MelFrequencyCepstralCoefficientsAnalysis(*, fft_length: int, mfcc_order: int, n_channel: int, sample_rate: int, lifter: int = 1, f_min: float = 0, f_max: float | None = None, floor: float = 1e-05, out_format: str | int = 'y')

See this page for details.

Parameters:
fft_length : int >= 2

The number of FFT bins, \(L\).

mfcc_order : int >= 1

The order of the MFCC, \(M\).

n_channel : int >= 1

The number of mel filter banks, \(C\).

sample_rate : int >= 1

The sample rate in Hz.

lifter : int >= 1

The liftering coefficient.

f_min : float >= 0

The minimum frequency in Hz.

f_max : float <= sample_rate // 2

The maximum frequency in Hz.

floor : float > 0

The minimum mel filter bank output in linear scale.

out_format : ['y', 'yE', 'yc', 'ycE']

y is MFCC, c is C0, and E is energy.
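As a rough illustration of how `n_channel`, `sample_rate`, `f_min`, and `f_max` interact, the sketch below places filter centers at equal spacing on an HTK-style mel scale. It is not taken from the diffsptk source: the helper names are hypothetical, the mel formula is the one given in the HTK Book, and treating `f_max=None` as the Nyquist frequency is an assumption.

```python
import math


def hz_to_mel(f_hz):
    """HTK-style mel scale: 1127 * ln(1 + f / 700)."""
    return 1127.0 * math.log1p(f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * math.expm1(mel / 1127.0)


def filter_center_frequencies(n_channel, sample_rate, f_min=0.0, f_max=None):
    """Return n_channel center frequencies (Hz), equally spaced in mel.

    Hypothetical helper for illustration only.
    """
    if f_max is None:
        # Assumption: an unspecified f_max means the Nyquist frequency.
        f_max = sample_rate / 2
    if not 0 <= f_min < f_max <= sample_rate / 2:
        raise ValueError("require 0 <= f_min < f_max <= sample_rate / 2")
    mel_min = hz_to_mel(f_min)
    mel_max = hz_to_mel(f_max)
    # n_channel triangular filters need n_channel + 2 edge points,
    # so the interior points are spaced (mel_max - mel_min) / (n_channel + 1) apart.
    step = (mel_max - mel_min) / (n_channel + 1)
    return [mel_to_hz(mel_min + step * (k + 1)) for k in range(n_channel)]


centers = filter_center_frequencies(n_channel=8, sample_rate=8000)
```

The centers are dense at low frequencies and sparse near `f_max`, which is the intended effect of the mel warping.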

References

[1] Young et al., “The HTK Book,” Cambridge University Press, 2006.
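The roles of `floor`, `mfcc_order`, and `lifter` can be sketched with the formulas from the HTK Book: floor the filter bank outputs, take the log, apply a DCT, then lifter the result. The code below is a minimal pure-Python sketch under that assumption; the function names are hypothetical and the diffsptk implementation may differ in scaling or other details.

```python
import math


def log_floor(fbank, floor=1e-5):
    """Clamp linear filter bank outputs to at least `floor`, then take the log."""
    return [math.log(max(e, floor)) for e in fbank]


def dct_htk(log_fbank, mfcc_order):
    """HTK-style DCT: c_i = sqrt(2/N) * sum_j m_j * cos(pi * i / N * (j - 0.5)).

    Returns coefficients for i = 1..mfcc_order, so C0 is excluded.
    """
    n = len(log_fbank)
    scale = math.sqrt(2.0 / n)
    return [
        scale
        * sum(
            m * math.cos(math.pi * i / n * (j + 0.5))  # j is 0-based here
            for j, m in enumerate(log_fbank)
        )
        for i in range(1, mfcc_order + 1)
    ]


def lifter_htk(cepstrum, lifter):
    """HTK-style liftering: c'_n = (1 + lifter / 2 * sin(pi * n / lifter)) * c_n."""
    return [
        (1.0 + lifter / 2.0 * math.sin(math.pi * n / lifter)) * c
        for n, c in enumerate(cepstrum, start=1)
    ]


# Made-up filter bank outputs for 8 channels (illustrative values only).
fbank = [0.0, 0.5, 1.0, 2.0, 4.0, 2.0, 1.0, 0.5]
mfcc = lifter_htk(dct_htk(log_floor(fbank), mfcc_order=4), lifter=1)
```

Under this formula, `lifter=1` leaves the coefficients unchanged up to floating-point rounding, because `sin(pi * n)` vanishes for every integer `n`.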

forward(x: Tensor) → Tensor

Compute the MFCC from the power spectrum.

Parameters:
x : Tensor [shape=(…, L/2+1)]

The power spectrum.

Returns:
y : Tensor [shape=(…, M)]

The MFCC without C0.

E : Tensor [shape=(…, 1)] (optional)

The energy.

c : Tensor [shape=(…, 1)] (optional)

The C0.

Examples

>>> x = diffsptk.ramp(19)
>>> stft = diffsptk.STFT(frame_length=10, frame_period=10, fft_length=32)
>>> mfcc = diffsptk.MFCC(
...     fft_length=32, mfcc_order=4, n_channel=8, sample_rate=8000
... )
>>> y = mfcc(stft(x))
>>> y
tensor([[-7.7745e-03, -1.4447e-02,  1.6157e-02,  1.1069e-03],
        [ 2.8049e+00, -1.6257e+00, -2.3566e-02,  1.2804e-01]])

diffsptk.functional.mfcc(x: Tensor, mfcc_order: int, n_channel: int, sample_rate: int, lifter: int = 1, f_min: float = 0, f_max: float | None = None, floor: float = 1e-05, out_format: str = 'y') → Tensor

Compute the MFCC from the power spectrum.

Parameters:
x : Tensor [shape=(…, L/2+1)]

The power spectrum.

mfcc_order : int >= 1

The order of the MFCC, \(M\).

n_channel : int >= 1

The number of mel filter banks, \(C\).

sample_rate : int >= 1

The sample rate in Hz.

lifter : int >= 1

The liftering coefficient.

f_min : float >= 0

The minimum frequency in Hz.

f_max : float <= sample_rate // 2

The maximum frequency in Hz.

floor : float > 0

The minimum mel filter bank output in linear scale.

out_format : ['y', 'yE', 'yc', 'ycE']

y is MFCC, c is C0, and E is energy.

Returns:
y : Tensor [shape=(…, M)]

The MFCC without C0.

E : Tensor [shape=(…, 1)] (optional)

The energy.

c : Tensor [shape=(…, 1)] (optional)

The C0.

See also

stft, fbank, plp