mfcc

Functions

int main(int argc, char *argv[])

mfcc [ option ] [ infile ]

  • -n int

    • number of channels \((1 \le C)\)

  • -m int

    • order of coefficients \((1 \le M)\)

  • -l int

    • FFT length \((2 \le N)\)

  • -c int

    • liftering parameter \((1 \le L)\)

  • -s double

    • sampling rate in kHz \((0 < F_s)\)

  • -L double

    • lowest frequency in Hz \((0 \le F_l < F_h)\)

  • -H double

    • highest frequency in Hz \((F_l < F_h \le 500F_s)\)

  • -q int

    • input format

      • 0 amplitude spectrum in dB

      • 1 log amplitude spectrum

      • 2 amplitude spectrum

      • 3 power spectrum

      • 4 windowed waveform

  • -o int

    • output format

      • 0 MFCC

      • 1 MFCC and energy

      • 2 MFCC and C0

      • 3 MFCC, C0, and energy

  • -e double

    • floor value of raw filter-bank output \((0 < \epsilon)\)

  • infile str

    • double-type windowed sequence or spectrum

  • stdout

    • double-type MFCCs
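The option constraints above can be sketched as a small check. This is only an illustration: `CheckMfccOptions` and `OutputLength` are hypothetical helpers, not part of SPTK, and the real command may enforce further conditions beyond the inequalities listed here. The output length assumes one value per coefficient, so `-o 0` yields \(M\) values per frame.

```cpp
#include <string>

// Hypothetical helper (not an SPTK function): mirrors the inequalities listed
// for each option and returns an empty string when all of them hold.
std::string CheckMfccOptions(int num_channel, int num_order, int fft_length,
                             int liftering, double sampling_rate_khz,
                             double lowest_hz, double highest_hz,
                             double floor_value) {
  if (num_channel < 1) return "-n must satisfy 1 <= C";
  if (num_order < 1) return "-m must satisfy 1 <= M";
  if (fft_length < 2) return "-l must satisfy 2 <= N";
  if (liftering < 1) return "-c must satisfy 1 <= L";
  if (sampling_rate_khz <= 0.0) return "-s must satisfy 0 < Fs";
  if (lowest_hz < 0.0 || highest_hz <= lowest_hz) {
    return "-L must satisfy 0 <= Fl < Fh";
  }
  // Fs is given in kHz, so the Nyquist frequency in Hz is 500 * Fs.
  if (500.0 * sampling_rate_khz < highest_hz) {
    return "-H must satisfy Fl < Fh <= 500 Fs";
  }
  if (floor_value <= 0.0) return "-e must satisfy 0 < epsilon";
  return "";
}

// Number of values written per frame for each -o setting (assumed layout).
int OutputLength(int num_order, int output_format) {
  switch (output_format) {
    case 0: return num_order;      // MFCC
    case 1: return num_order + 1;  // MFCC and energy
    case 2: return num_order + 1;  // MFCC and C0
    case 3: return num_order + 2;  // MFCC, C0, and energy
    default: return -1;            // unknown format
  }
}
```

For instance, `-m 12 -o 1` gives 13 values per frame under this assumption.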

The example below extracts 12th-order MFCCs from data.short. The analysis conditions are a frame length of 25 ms, a frame shift of 10 ms, and a sampling rate of 16 kHz. A pre-emphasis filter and a Hamming window are applied to the input signal.

x2x +sd data.short |
  frame -l 400 -p 160 -n 1 |
  dfs -b 1 -0.97 |
  window -l 400 -L 512 -w 1 -n 0 |
  mfcc -l 512 -n 40 -c 22 -m 12 -L 64 -H 4000 -o 1 |
  delta -m 12 -d -0.5 0.0 0.5 -d 0.25 0.0 -0.5 0.0 0.25 > data.mfcc

The corresponding HTK config file is shown below.

SOURCEFORMAT = NOHEAD
SOURCEKIND   = WAVEFORM
SOURCERATE   = 625.0
TARGETKIND   = MFCC_E_D_A
TARGETRATE   = 100000.0
WINDOWSIZE   = 250000.0
USEHAMMING   = T
USEPOWER     = F
RAWENERGY    = F
ENORMALIZE   = F
PREEMCOEF    = 0.97
NUMCHANS     = 40
CEPLIFTER    = 22
NUMCEPS      = 12
LOFREQ       = 64
HIFREQ       = 4000
DELTAWINDOW  = 1
ACCWINDOW    = 1
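HTK expresses SOURCERATE, WINDOWSIZE, and TARGETRATE in units of 100 ns, whereas the SPTK commands take points and kHz. The following is a minimal sketch of that conversion; the function names are hypothetical and introduced only for illustration.

```cpp
#include <cmath>

// One HTK time unit is 100 ns, i.e. 1e-7 s.
constexpr double kHtkUnitInSeconds = 1e-7;

// SOURCERATE is the sample period in 100 ns units.
// Returns the sampling rate in kHz, as expected by `mfcc -s`.
double SourceRateToKhz(double source_rate) {
  const double period_in_seconds = source_rate * kHtkUnitInSeconds;
  return 1.0 / period_in_seconds / 1000.0;
}

// Converts a duration in 100 ns units (WINDOWSIZE or TARGETRATE) to a number
// of points, as expected by `frame -l` and `frame -p`.
int HtkDurationToPoints(double duration, double sampling_rate_khz) {
  const double seconds = duration * kHtkUnitInSeconds;
  return static_cast<int>(std::lround(seconds * sampling_rate_khz * 1000.0));
}
```

With the values in the config above, SOURCERATE = 625.0 corresponds to 16 kHz, WINDOWSIZE = 250000.0 to 400 points, and TARGETRATE = 100000.0 to 160 points.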

The following table shows the correspondence between HTK configuration parameters and SPTK command options for MFCC extraction.

| Parameter                | HTK                                 | SPTK                                 |
|--------------------------|-------------------------------------|--------------------------------------|
| Frame length             | WINDOWSIZE = _ (unit is 100 ns)     | frame -l _ (unit is point)           |
| Frame shift              | TARGETRATE = _ (unit is 100 ns)     | frame -p _ (unit is point)           |
| Sampling rate            | SOURCERATE = _ (unit is 100 ns)     | mfcc -s _ (unit is kHz)              |
| Subtract mean            | ZMEANSOURCE = T                     | frame -z                             |
| Pre-emphasis coefficient | PREEMCOEF = _ (windowed waveform)   | dfs -b 1 -_ (entire waveform)        |
| Window                   | USEHAMMING = T                      | window -w 1 -n 0                     |
| FFT length               | N/A (auto. calculated)              | mfcc -l _ (same as input length)     |
| Number of fbank channels | NUMCHANS = _                        | mfcc -n _                            |
| Lowest frequency         | LOFREQ = _                          | mfcc -L _                            |
| Highest frequency        | HIFREQ = _                          | mfcc -H _                            |
| Floor value of fbank     | N/A (fixed value: 1.0)              | mfcc -e _ (default value: 1.0)       |
| Order of cepstrum        | NUMCEPS = _                         | mfcc -m _                            |
| Liftering coefficient    | CEPLIFTER = _                       | mfcc -c _                            |
| Output energy            | TARGETKIND = MFCC_E                 | mfcc -o 1                            |
| Output 0th coefficient   | TARGETKIND = MFCC_0                 | mfcc -o 2                            |
| Use raw energy           | RAWENERGY = T                       | N/A (do not use raw)                 |
| Normalize log energy     | ENORMALIZE = T                      | N/A (do not normalize)               |
| Delta window size        | DELTAWINDOW = _                     | delta -d _                           |
| Accel window size        | ACCWINDOW = _                       | delta -d * -d _                      |
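The `delta -d` coefficients used in the example can be derived from the HTK window sizes. The sketch below assumes the HTK regression formula \(d_t = \sum_{\theta=1}^{\Theta} \theta (c_{t+\theta} - c_{t-\theta}) / (2 \sum_{\theta=1}^{\Theta} \theta^2)\) and obtains the acceleration coefficients by convolving the delta and acceleration regression windows. `RegressionCoefficients` and `Convolve` are hypothetical helpers, not SPTK functions.

```cpp
#include <cstddef>
#include <vector>

// Regression weights for offsets -window, ..., window (window >= 1 assumed).
std::vector<double> RegressionCoefficients(int window) {
  double denominator = 0.0;
  for (int theta = 1; theta <= window; ++theta) {
    denominator += 2.0 * theta * theta;
  }
  std::vector<double> coefficients;
  for (int theta = -window; theta <= window; ++theta) {
    coefficients.push_back(theta / denominator);
  }
  return coefficients;
}

// Full discrete convolution; applying one regression window after another is
// equivalent to a single window given by their convolution.
std::vector<double> Convolve(const std::vector<double>& a,
                             const std::vector<double>& b) {
  std::vector<double> out(a.size() + b.size() - 1, 0.0);
  for (std::size_t i = 0; i < a.size(); ++i) {
    for (std::size_t j = 0; j < b.size(); ++j) {
      out[i + j] += a[i] * b[j];
    }
  }
  return out;
}
```

With DELTAWINDOW = 1, `RegressionCoefficients(1)` gives -0.5, 0.0, 0.5. With ACCWINDOW = 1 as well, convolving that window with itself gives 0.25, 0.0, -0.5, 0.0, 0.25, matching the two `-d` arguments in the example pipeline.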

Parameters:
  • argc[in] Number of arguments.

  • argv[in] Argument vector.

Returns:

0 on success, 1 on failure.

See also

fbank plp

class MelFrequencyCepstralCoefficientsAnalysis

Perform mel-frequency cepstral coefficients (MFCC) analysis.

The input is the first half of the power spectrum:

\[ \begin{array}{cccc} |X(0)|^2, & |X(1)|^2, & \ldots, & |X(N/2)|^2, \end{array} \]
where \(N\) is the FFT length. The outputs are the \(M\)-th order MFCCs with the zeroth cepstral parameter:
\[ \begin{array}{ccccc} c(0), & \bar{c}(1), & \bar{c}(2), & \ldots, & \bar{c}(M) \end{array} \]
and the log-signal energy \(E\).

The MFCCs are calculated from mel-filter-bank outputs \(\{F(j)\}_{j=1}^C\) using the discrete cosine transform

\[ c(m) = \sqrt{\frac{2}{C}} \sum_{j=1}^C F(j) \cos \left( \frac{\pi m}{C} \left(j - \frac{1}{2}\right) \right), \]
and the liftering
\[ \bar{c}(m) = \left( 1 + \frac{L}{2} \sin \frac{\pi m}{L} \right) c(m), \]
where \(L\) is the liftering parameter.
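The two formulas above can be written out directly. The following is a minimal standalone sketch, not the SPTK implementation: `Dct` and `LifterWeight` are hypothetical names, and the filter-bank outputs \(F(1), \ldots, F(C)\) are assumed to be already computed.

```cpp
#include <cmath>
#include <vector>

// Computes c(0), c(1), ..., c(M) from filter-bank outputs F(1), ..., F(C)
// using the cosine transform given above (no liftering applied).
std::vector<double> Dct(const std::vector<double>& filter_bank, int num_order) {
  const double pi = std::acos(-1.0);
  const int num_channel = static_cast<int>(filter_bank.size());
  const double scale = std::sqrt(2.0 / num_channel);
  std::vector<double> c(num_order + 1, 0.0);
  for (int m = 0; m <= num_order; ++m) {
    double sum = 0.0;
    for (int j = 1; j <= num_channel; ++j) {
      // filter_bank[j - 1] stores F(j).
      sum += filter_bank[j - 1] * std::cos(pi * m / num_channel * (j - 0.5));
    }
    c[m] = scale * sum;
  }
  return c;
}

// Liftering weight 1 + (L / 2) sin(pi m / L) applied to c(m).
double LifterWeight(int m, int liftering) {
  const double pi = std::acos(-1.0);
  return 1.0 + 0.5 * liftering * std::sin(pi * m / liftering);
}
```

The liftered coefficient is then `LifterWeight(m, L) * c[m]`. The weight equals 1 at \(m = 0\), so \(c(0)\) is unaffected, and it peaks at \(1 + L/2\) when \(m = L/2\).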

[1] S. Young et al., “The HTK book,” Cambridge University Engineering Department, 2006.

Public Functions

MelFrequencyCepstralCoefficientsAnalysis(int fft_length, int num_channel, int num_order, int liftering_coefficient, double sampling_rate, double lowest_frequency, double highest_frequency, double floor)
Parameters:
  • fft_length[in] Number of FFT bins, \(N\).

  • num_channel[in] Number of channels, \(C\).

  • num_order[in] Order of cepstral coefficients, \(M\).

  • liftering_coefficient[in] A parameter of liftering, \(L\).

  • sampling_rate[in] Sampling rate in Hz.

  • lowest_frequency[in] Lowest frequency in Hz.

  • highest_frequency[in] Highest frequency in Hz.

  • floor[in] Floor value of raw filter-bank output.

inline int GetFftLength() const
Returns:

FFT size.

inline int GetNumChannel() const
Returns:

Number of channels.

inline int GetNumOrder() const
Returns:

Order of cepstral coefficients.

inline int GetLifteringCoefficient() const
Returns:

Liftering coefficient.

inline bool IsValid() const
Returns:

True if this object is valid.

bool Run(const std::vector<double> &power_spectrum, std::vector<double> *mfcc, double *energy, MelFrequencyCepstralCoefficientsAnalysis::Buffer *buffer) const
Parameters:
  • power_spectrum[in] \((N/2+1)\)-length power spectrum.

  • mfcc[out] \(M\)-th order MFCCs.

  • energy[out] Signal energy \(E\) (optional).

  • buffer[out] Buffer.

Returns:

True on success, false on failure.

class Buffer

Buffer for MelFrequencyCepstralCoefficientsAnalysis class.