mfcc

Functions

int main(int argc, char *argv[])

mfcc [ option ] [ infile ]

  • -n int

    • number of channels \((1 \le C)\)

  • -m int

    • order of coefficients \((1 \le M)\)

  • -l int

    • FFT length \((2 \le N)\)

  • -c int

    • liftering parameter \((1 \le L)\)

  • -s double

    • sampling rate in kHz \((0 < F_s)\)

  • -L double

    • lowest frequency in Hz \((0.0 \le F_l < F_h)\)

  • -H double

    • highest frequency in Hz \((F_l < F_h \le 500F_s)\)

  • -q int

    • input format

      • 0 amplitude spectrum in dB

      • 1 log amplitude spectrum

      • 2 amplitude spectrum

      • 3 power spectrum

      • 4 windowed waveform

  • -o int

    • output format

      • 0 MFCC

      • 1 MFCC and energy

      • 2 MFCC and C0

      • 3 MFCC, C0, and energy

  • -e double

    • floor value of raw filter-bank output \((0 < \epsilon)\)

  • infile str

    • double-type windowed sequence or spectrum

  • stdout

    • double-type MFCCs
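As a rough illustration of the output layout described above, the following Python sketch splits a raw stream of doubles into frames. It assumes the stream holds 8-byte doubles in the machine's native byte order and that each frame holds the M MFCCs followed by the extra terms selected with -o. The helper names frame_dimension and split_frames are hypothetical and are not part of SPTK.

```python
import struct


def frame_dimension(num_order, output_format):
    """Number of doubles per frame for a given -m and -o setting."""
    # -o 0: MFCC; 1: MFCC + energy; 2: MFCC + C0; 3: MFCC + C0 + energy
    extra = {0: 0, 1: 1, 2: 1, 3: 2}[output_format]
    return num_order + extra


def split_frames(raw, dim):
    """Split raw bytes (assumed native-endian 8-byte doubles) into frames."""
    if len(raw) % (8 * dim) != 0:
        raise ValueError("stream length is not a multiple of the frame size")
    count = len(raw) // 8
    values = struct.unpack(f"={count}d", raw)
    return [list(values[i:i + dim]) for i in range(0, count, dim)]


# Synthetic stand-in for the stdout of `mfcc -m 12 -o 1` (two frames).
raw = struct.pack("=26d", *map(float, range(26)))
frames = split_frames(raw, frame_dimension(12, 1))
```

With -m 12 and -o 1, each frame holds 13 values, so the 26 synthetic doubles above form two frames.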

The example below extracts 12th-order MFCCs from data.short. The analysis conditions are a frame length of 25 ms, a frame shift of 10 ms, and a sampling rate of 16 kHz. A pre-emphasis filter and a Hamming window are applied to the input signal.

x2x +sd data.short |
  frame -l 400 -p 160 -n |
  dfs -b 1 -0.97 |
  window -l 400 -L 512 -w 1 -n 0 |
  mfcc -l 512 -n 40 -c 22 -m 12 -L 64 -H 4000 -o 1 |
  delta -m 12 -d -0.5 0.0 0.5 -d 0.25 0.0 -0.5 0.0 0.25 > data.mfcc

The corresponding HTK config file is shown below.

SOURCEFORMAT = NOHEAD
SOURCEKIND   = WAVEFORM
SOURCERATE   = 625.0
TARGETKIND   = MFCC_E_D_A
TARGETRATE   = 100000.0
WINDOWSIZE   = 250000.0
USEHAMMING   = T
RAWENERGY    = F
ENORMALIZE   = F
PREEMCOEF    = 0.97
NUMCHANS     = 40
CEPLIFTER    = 22
NUMCEPS      = 12
LOFREQ       = 64
HIFREQ       = 4000
DELTAWINDOW  = 1
ACCWINDOW    = 1
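The DELTAWINDOW = 1 and ACCWINDOW = 1 settings correspond to the coefficients passed to delta -d in the SPTK pipeline. The Python sketch below shows one way to derive them, assuming the HTK regression formula \(d_t = \sum_{\theta=1}^{\Theta} \theta (c_{t+\theta} - c_{t-\theta}) / (2 \sum_{\theta=1}^{\Theta} \theta^2)\) and assuming that acceleration is obtained by applying a regression window to the delta stream. The function names are hypothetical.

```python
def htk_delta_window(theta_max):
    """Regression coefficients w[k], k = -theta_max..theta_max."""
    denom = 2.0 * sum(t * t for t in range(1, theta_max + 1))
    return [t / denom for t in range(-theta_max, theta_max + 1)]


def convolve(a, b):
    """Full discrete convolution of two coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


# DELTAWINDOW = 1 gives the first -d argument list.
delta = htk_delta_window(1)
# ACCWINDOW = 1 applied on top of the delta window gives the second one.
accel = convolve(htk_delta_window(1), delta)
```

Under these assumptions, delta evaluates to -0.5 0.0 0.5 and accel to 0.25 0.0 -0.5 0.0 0.25, which match the two -d argument lists in the pipeline.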

The following table shows the correspondence between HTK configuration parameters and SPTK command options for extracting MFCCs.

Parameter                 HTK                                  SPTK
------------------------  -----------------------------------  -------------------------------------
Frame length              WINDOWSIZE = _ (unit is 100 ns)      frame -l _ (unit is point)
Frame shift               TARGETRATE = _ (unit is 100 ns)      frame -p _ (unit is point)
Sampling rate             SOURCERATE = _ (unit is 100 ns)      mfcc -s _ (unit is kHz)
Subtract mean             ZMEANSOURCE = T                      frame -z
Pre-emphasis coefficient  PREEMCOEF = _ (windowed waveform)    dfs -b 1 -_ (entire waveform)
Window                    USEHAMMING = T                       window -w 1 -n 0
FFT length                N/A (auto. calculated)               mfcc -l _ (same as input length)
Number of fbank channels  NUMCHANS = _                         mfcc -n _
Lowest frequency          LOFREQ = _                           mfcc -L _
Highest frequency         HIFREQ = _                           mfcc -H _
Floor value of fbank      N/A (fixed value: 1.0)               mfcc -e _ (default value: 1.0)
Order of cepstrum         NUMCEPS = _                          mfcc -m _
Liftering coefficient     CEPLIFTER = _                        mfcc -c _
Output energy             TARGETKIND = MFCC_E                  mfcc -o 1
Output 0th coefficient    TARGETKIND = MFCC_0                  mfcc -o 2
Use raw energy            RAWENERGY = T                        N/A (do not use raw)
Normalize log energy      ENORMALIZE = T                       N/A (do not normalize)
Delta window size         DELTAWINDOW = _                      delta -d _
Accel window size         ACCWINDOW = _                        delta -d * -d _
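The unit conversions in the table can be checked with a few lines of arithmetic. The Python sketch below converts the HTK values, given in units of 100 ns, into the sampling rate in kHz for mfcc -s and the frame length and shift in points for frame -l and frame -p. The function name htk_to_sptk is hypothetical.

```python
def htk_to_sptk(source_rate, window_size, target_rate):
    """Convert HTK time settings (units of 100 ns) to SPTK option values.

    Returns (sampling rate in kHz, frame length in points,
    frame shift in points).
    """
    # One sample lasts source_rate * 100 ns, so the rate in kHz
    # is 1e4 / source_rate.
    sampling_rate_khz = 1e4 / source_rate
    frame_length = round(window_size / source_rate)
    frame_shift = round(target_rate / source_rate)
    return sampling_rate_khz, frame_length, frame_shift


# Values from the HTK config file shown earlier.
fs_khz, length, shift = htk_to_sptk(625.0, 250000.0, 100000.0)
# Upper bound on -H in Hz, following the constraint F_h <= 500 F_s.
max_highest_frequency = 500.0 * fs_khz
```

For the config above, this gives 16 kHz, 400 points, and 160 points, matching frame -l 400 -p 160, with an upper limit of 8000 Hz for -H.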

Parameters
  • argc[in] Number of arguments.

  • argv[in] Argument vector.

Returns

0 on success, 1 on failure.

See also

fbank

class sptk::MelFrequencyCepstralCoefficientsAnalysis

Perform mel-frequency cepstral coefficients (MFCC) analysis.

The input is the first half of the power spectrum:

\[ \begin{array}{cccc} |X(0)|^2, & |X(1)|^2, & \ldots, & |X(N/2)|^2, \end{array} \]
where \(N\) is the FFT length. The outputs are the \(M\)-th order MFCCs with the zeroth cepstral parameter:
\[ \begin{array}{ccccc} c(0), & \bar{c}(1), & \bar{c}(2), & \ldots, & \bar{c}(M) \end{array} \]
and the log-signal energy \(E\).
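To make the expected input concrete, the Python sketch below computes the \(N/2+1\) power-spectrum values from a windowed frame with a naive DFT. It assumes the frame has already been zero-padded to an even length \(N\). The function name half_power_spectrum is hypothetical, and a real pipeline would use an FFT instead.

```python
import cmath


def half_power_spectrum(frame):
    """Return |X(0)|^2, ..., |X(N/2)|^2 for a frame of even length N."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        # Naive DFT bin: X(k) = sum_t x(t) exp(-j 2 pi k t / N)
        x_k = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(x_k) ** 2)
    return spectrum


# A unit impulse has a flat spectrum, so every bin has power 1.
impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
power = half_power_spectrum(impulse)
```

For an 8-point frame, the result has five values, which is the length expected for the power_spectrum argument when the FFT length is 8.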

The MFCCs are calculated from mel-filter-bank outputs \(\{F(j)\}_{j=1}^C\) using the discrete cosine transform

\[ c(m) = \sqrt{\frac{2}{C}} \sum_{j=1}^C F(j) \cos \left( \frac{\pi m}{C} \left(j - \frac{1}{2}\right) \right), \]
and the liftering
\[ \bar{c}(m) = \left( 1 + \frac{L}{2} \sin \frac{\pi m}{L} \right) c(m), \]
where \(L\) is the liftering parameter.
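The two formulas above can be written out directly. The Python sketch below evaluates the discrete cosine transform and the liftering for a list of filter-bank outputs, returning \(c(0)\) and the liftered coefficients \(\bar{c}(1), \ldots, \bar{c}(M)\). The function names are hypothetical, and this is a literal transcription of the equations rather than the SPTK implementation.

```python
import math


def lifter_weight(m, lifter):
    """Liftering weight 1 + (L / 2) sin(pi m / L)."""
    return 1.0 + lifter / 2.0 * math.sin(math.pi * m / lifter)


def mfcc_from_fbank(fbank, num_order, lifter):
    """Return (c(0), [c_bar(1), ..., c_bar(M)]) from F(1), ..., F(C)."""
    num_channel = len(fbank)
    scale = math.sqrt(2.0 / num_channel)
    c = []
    for m in range(num_order + 1):
        # DCT: sum over channels j = 1..C of F(j) cos(pi m / C (j - 1/2))
        acc = sum(f * math.cos(math.pi * m / num_channel * (j - 0.5))
                  for j, f in enumerate(fbank, start=1))
        c.append(scale * acc)
    liftered = [lifter_weight(m, lifter) * c[m]
                for m in range(1, num_order + 1)]
    return c[0], liftered


# Flat filter-bank outputs (illustrative values only): all of the
# energy ends up in c(0), and the higher coefficients vanish.
c0, c_bar = mfcc_from_fbank([1.0] * 40, 12, 22)
```

With \(L = 22\), the weight peaks at \(m = 11\), where it equals \(1 + L/2 = 12\).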

[1] S. Young et al., “The HTK book,” Cambridge University Engineering Department, 2006.

Public Functions

MelFrequencyCepstralCoefficientsAnalysis(int fft_length, int num_channel, int num_order, int liftering_coefficient, double sampling_rate, double lowest_frequency, double highest_frequency, double floor)
Parameters
  • fft_length[in] Number of FFT bins, \(N\).

  • num_channel[in] Number of channels, \(C\).

  • num_order[in] Order of cepstral coefficients, \(M\).

  • liftering_coefficient[in] A parameter of liftering, \(L\).

  • sampling_rate[in] Sampling rate in Hz.

  • lowest_frequency[in] Lowest frequency in Hz.

  • highest_frequency[in] Highest frequency in Hz.

  • floor[in] Floor value of raw filter-bank output.

inline int GetFftLength() const
Returns

FFT size.

inline int GetNumChannel() const
Returns

Number of channels.

inline int GetNumOrder() const
Returns

Order of cepstral coefficients.

inline int GetLifteringCoefficient() const
Returns

Liftering coefficient.

inline bool IsValid() const
Returns

True if this object is valid.

bool Run(const std::vector<double> &power_spectrum, std::vector<double> *mfcc, double *energy, MelFrequencyCepstralCoefficientsAnalysis::Buffer *buffer) const
Parameters
  • power_spectrum[in] \((N/2+1)\)-length power spectrum.

  • mfcc[out] \(M\)-th order MFCCs.

  • energy[out] Log-signal energy \(E\) (optional).

  • buffer[out] Buffer.

Returns

True on success, false on failure.

class Buffer

Buffer for MelFrequencyCepstralCoefficientsAnalysis class.