mfcc
Functions
-
int main(int argc, char *argv[])
mfcc [ option ] [ infile ]
-n int
number of channels \((1 \le C)\)
-m int
order of coefficients \((1 \le M)\)
-l int
FFT length \((2 \le N)\)
-c int
liftering parameter \((1 \le L)\)
-s double
sampling rate in kHz \((0 < F_s)\)
-L double
lowest frequency in Hz \((0 \le F_l < F_h)\)
-H double
highest frequency in Hz \((F_l < F_h \le 500F_s)\)
-q int
input format
0
amplitude spectrum in dB
1
log amplitude spectrum
2
amplitude spectrum
3
power spectrum
4
windowed waveform
-o int
output format
0
MFCC
1
MFCC and energy
2
MFCC and C0
3
MFCC, C0, and energy
-e double
floor value of raw filter-bank output \((0 < \epsilon)\)
infile str
double-type windowed sequence or spectrum
stdout
double-type MFCCs
The example below extracts 12th-order MFCCs from data.short. The analysis conditions are: frame length 25 ms (400 points), frame shift 10 ms (160 points), and sampling rate 16 kHz. A pre-emphasis filter and the Hamming window are applied to the input signal.

    x2x +sd data.short |
      frame -l 400 -p 160 -n 1 |
      dfs -b 1 -0.97 |
      window -l 400 -L 512 -w 1 -n 0 |
      mfcc -l 512 -n 40 -c 22 -m 12 -L 64 -H 4000 -o 1 |
      delta -m 12 -d -0.5 0.0 0.5 -d 0.25 0.0 -0.5 0.0 0.25 > data.mfcc
The corresponding HTK configuration file is shown below.
    SOURCEFORMAT = NOHEAD
    SOURCEKIND   = WAVEFORM
    SOURCERATE   = 625.0
    TARGETKIND   = MFCC_E_D_A
    TARGETRATE   = 100000.0
    WINDOWSIZE   = 250000.0
    USEHAMMING   = T
    USEPOWER     = F
    RAWENERGY    = F
    ENORMALIZE   = F
    PREEMCOEF    = 0.97
    NUMCHANS     = 40
    CEPLIFTER    = 22
    NUMCEPS      = 12
    LOFREQ       = 64
    HIFREQ       = 4000
    DELTAWINDOW  = 1
    ACCWINDOW    = 1
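HTK expresses SOURCERATE, WINDOWSIZE, and TARGETRATE in units of 100 ns, whereas the SPTK commands take a sampling rate in kHz and frame sizes in points. The following is a minimal sketch of that conversion. The helper names are hypothetical and are not part of SPTK or HTK.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: sampling rate in kHz from HTK SOURCERATE,
// which is the sample period in units of 100 ns.
// A 1 kHz signal has a period of 1 ms = 10000 x 100 ns.
double SamplingRateInKHz(double source_rate) {
  assert(0.0 < source_rate);
  return 1e4 / source_rate;
}

// Hypothetical helper: number of points spanned by an HTK duration
// (WINDOWSIZE or TARGETRATE), both given in units of 100 ns.
int DurationInPoints(double htk_duration, double source_rate) {
  assert(0.0 < source_rate);
  return static_cast<int>(std::lround(htk_duration / source_rate));
}
```

With the values in the configuration above, SamplingRateInKHz(625.0) gives 16 kHz, DurationInPoints(250000.0, 625.0) gives the 400 points passed to frame -l, and DurationInPoints(100000.0, 625.0) gives the 160 points passed to frame -p.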
The correspondence between HTK configuration parameters and SPTK command options for extracting MFCCs is shown in the following table.
| Parameter | HTK | SPTK |
|---|---|---|
| Frame length | WINDOWSIZE = _ (unit is 100 ns) | frame -l _ (unit is point) |
| Frame shift | TARGETRATE = _ (unit is 100 ns) | frame -p _ (unit is point) |
| Sampling rate | SOURCERATE = _ (unit is 100 ns) | mfcc -s _ (unit is kHz) |
| Subtract mean | ZMEANSOURCE = T | frame -z |
| Pre-emphasis coefficient | PREEMCOEF = _ (windowed waveform) | dfs -b 1 -_ (entire waveform) |
| Window | USEHAMMING = T | window -w 1 -n 0 |
| FFT length | N/A (auto. calculated) | mfcc -l _ (same as input length) |
| Number of fbank channels | NUMCHANS = _ | mfcc -n _ |
| Lowest frequency | LOFREQ = _ | mfcc -L _ |
| Highest frequency | HIFREQ = _ | mfcc -H _ |
| Floor value of fbank | N/A (fixed value: 1.0) | mfcc -e _ (default value: 1.0) |
| Order of cepstrum | NUMCEPS = _ | mfcc -m _ |
| Liftering coefficient | CEPLIFTER = _ | mfcc -c _ |
| Output energy | TARGETKIND = MFCC_E | mfcc -o 1 |
| Output 0th coefficient | TARGETKIND = MFCC_0 | mfcc -o 2 |
| Use raw energy | RAWENERGY = T | N/A (do not use raw) |
| Normalize log energy | ENORMALIZE = T | N/A (do not normalize) |
| Delta window size | DELTAWINDOW = _ | delta -d _ |
| Accel window size | ACCWINDOW = _ | delta -d * -d _ |
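The last two rows map an HTK window size to a list of coefficients for delta -d. The sketch below shows one way to derive those coefficients. It assumes HTK's regression formula \(d_t = \sum_{\theta=1}^{\Theta} \theta (c_{t+\theta} - c_{t-\theta}) / (2 \sum_{\theta=1}^{\Theta} \theta^2)\), which comes from the HTK book and is not stated on this page. It also assumes that acceleration coefficients are obtained by applying the same regression to the deltas. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Regression coefficients for offsets -window_size, ..., +window_size
// (assumed HTK formula: weight k / (2 * sum of theta^2)).
std::vector<double> RegressionWindow(int window_size) {
  assert(1 <= window_size);
  double norm = 0.0;
  for (int theta = 1; theta <= window_size; ++theta) {
    norm += 2.0 * theta * theta;
  }
  std::vector<double> window(2 * window_size + 1);
  for (int k = -window_size; k <= window_size; ++k) {
    window[k + window_size] = k / norm;
  }
  return window;
}

// Applying one regression window to the output of another is equivalent to
// a single window whose coefficients are the convolution of the two.
std::vector<double> Convolve(const std::vector<double>& a,
                             const std::vector<double>& b) {
  std::vector<double> c(a.size() + b.size() - 1, 0.0);
  for (std::size_t i = 0; i < a.size(); ++i) {
    for (std::size_t j = 0; j < b.size(); ++j) {
      c[i + j] += a[i] * b[j];
    }
  }
  return c;
}
```

Under these assumptions, RegressionWindow(1) yields -0.5 0.0 0.5, and Convolve(RegressionWindow(1), RegressionWindow(1)) yields 0.25 0.0 -0.5 0.0 0.25. These match the two -d arguments in the example command for DELTAWINDOW = 1 and ACCWINDOW = 1.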
- Parameters:
argc – [in] Number of arguments.
argv – [in] Argument vector.
- Returns:
0 on success, 1 on failure.
-
class MelFrequencyCepstralCoefficientsAnalysis
Perform mel-frequency cepstral coefficients (MFCC) analysis.
The input is the first half of the power spectrum:
\[ \begin{array}{cccc} |X(0)|^2, & |X(1)|^2, & \ldots, & |X(N/2)|^2, \end{array} \]
where \(N\) is the FFT length. The outputs are the \(M\)-th order MFCCs with the zeroth cepstral parameter:
\[ \begin{array}{ccccc} c(0), & \bar{c}(1), & \bar{c}(2), & \ldots, & \bar{c}(M) \end{array} \]
and the log-signal energy \(E\).
The MFCCs are calculated from mel-filter-bank outputs \(\{F(j)\}_{j=1}^C\) using the discrete cosine transform
\[ c(m) = \sqrt{\frac{2}{C}} \sum_{j=1}^C F(j) \cos \left( \frac{\pi m}{C} \left(j - \frac{1}{2}\right) \right), \]
and the liftering
\[ \bar{c}(m) = \left( 1 + \frac{L}{2} \sin \frac{\pi m}{L} \right) c(m), \]
where \(L\) is the liftering parameter.
[1] S. Young et al., “The HTK book,” Cambridge University Engineering Department, 2006.
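The discrete cosine transform and liftering formulas above can be sketched directly in code. This is an illustrative, self-contained version and not the SPTK implementation. The function name is hypothetical, the filter-bank outputs \(F(j)\) are taken as given, and \(c(0)\) and the energy \(E\) are omitted.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: compute liftered coefficients c-bar(1), ..., c-bar(M)
// from C filter-bank outputs F(1), ..., F(C) with liftering parameter L.
std::vector<double> FilterBankToLifteredMfcc(const std::vector<double>& fbank,
                                             int num_order, int lifter) {
  const int num_channel = static_cast<int>(fbank.size());
  assert(1 <= num_channel && 1 <= num_order && 1 <= lifter);
  const double pi = std::acos(-1.0);
  std::vector<double> mfcc(num_order);
  for (int m = 1; m <= num_order; ++m) {
    // c(m) = sqrt(2 / C) * sum_j F(j) * cos(pi * m / C * (j - 1/2))
    double c = 0.0;
    for (int j = 1; j <= num_channel; ++j) {
      c += fbank[j - 1] * std::cos(pi * m / num_channel * (j - 0.5));
    }
    c *= std::sqrt(2.0 / num_channel);
    // c-bar(m) = (1 + L / 2 * sin(pi * m / L)) * c(m)
    mfcc[m - 1] = (1.0 + 0.5 * lifter * std::sin(pi * m / lifter)) * c;
  }
  return mfcc;
}
```

As a quick check, a flat filter-bank output such as F = (1, 1, 1, 1) gives coefficients that are zero up to rounding for m = 1, 2, 3, because the cosine terms cancel. For F = (1, 0) with L = 2, the first coefficient is \(2 \cos(\pi/4) = \sqrt{2}\).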
Public Functions
-
MelFrequencyCepstralCoefficientsAnalysis(int fft_length, int num_channel, int num_order, int liftering_coefficient, double sampling_rate, double lowest_frequency, double highest_frequency, double floor)
- Parameters:
fft_length – [in] Number of FFT bins, \(N\).
num_channel – [in] Number of channels, \(C\).
num_order – [in] Order of cepstral coefficients, \(M\).
liftering_coefficient – [in] A parameter of liftering, \(L\).
sampling_rate – [in] Sampling rate in Hz.
lowest_frequency – [in] Lowest frequency in Hz.
highest_frequency – [in] Highest frequency in Hz.
floor – [in] Floor value of raw filter-bank output.
-
inline int GetFftLength() const
- Returns:
FFT size.
-
inline int GetNumChannel() const
- Returns:
Number of channels.
-
inline int GetNumOrder() const
- Returns:
Order of cepstral coefficients.
-
inline int GetLifteringCoefficient() const
- Returns:
Liftering coefficient.
-
inline bool IsValid() const
- Returns:
True if this object is valid.
-
bool Run(const std::vector<double> &power_spectrum, std::vector<double> *mfcc, double *energy, MelFrequencyCepstralCoefficientsAnalysis::Buffer *buffer) const
- Parameters:
power_spectrum – [in] \((N/2+1)\)-length power spectrum.
mfcc – [out] \(M\)-th order MFCCs.
energy – [out] Signal energy \(E\) (optional).
buffer – [out] Buffer.
- Returns:
True on success, false on failure.
-
class Buffer
Buffer for MelFrequencyCepstralCoefficientsAnalysis class.