fbank

Functions

int main(int argc, char *argv[])

fbank [ option ] [ infile ]

  • -n int

    • number of channels \((1 \le C)\)

  • -l int

    • FFT length \((2 \le N)\)

  • -s double

    • sampling rate in kHz \((0 < F_s)\)

  • -L double

    • lowest frequency in Hz \((0 \le F_l < F_h)\)

  • -H double

    • highest frequency in Hz \((F_l < F_h \le 500F_s)\)

  • -q int

    • input format

      • 0 amplitude spectrum in dB

      • 1 log amplitude spectrum

      • 2 amplitude spectrum

      • 3 power spectrum

      • 4 windowed waveform

  • -o int

    • output format

      • 0 fbank

      • 1 fbank and energy

  • -e double

    • floor of raw filter-bank output \((0 < \epsilon)\)

  • infile str

    • double-type windowed sequence or spectrum

  • stdout

    • double-type mel-filter-bank output
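The option ranges listed above can be checked together, as in the minimal sketch below. The struct `FbankOptions` and the function `CheckFbankOptions` are hypothetical names introduced for illustration and are not part of the tool. The upper bound on `-H` is the Nyquist frequency: since `-s` is given in kHz, half the sampling rate in Hz is \(500F_s\).

```cpp
#include <string>

// Hypothetical container for the fbank options (illustration only).
struct FbankOptions {
  int num_channel;           // -n, number of channels C
  int fft_length;            // -l, FFT length N
  double sampling_rate_khz;  // -s, sampling rate Fs in kHz
  double lowest_frequency;   // -L, lowest frequency Fl in Hz
  double highest_frequency;  // -H, highest frequency Fh in Hz
  double floor;              // -e, floor of raw filter-bank output
};

// Returns an empty string if every documented range holds,
// otherwise a short description of the first violated one.
std::string CheckFbankOptions(const FbankOptions& o) {
  if (o.num_channel < 1) return "-n must satisfy 1 <= C";
  if (o.fft_length < 2) return "-l must satisfy 2 <= N";
  if (o.sampling_rate_khz <= 0.0) return "-s must satisfy 0 < Fs";
  // Nyquist frequency in Hz: Fs [kHz] * 1000 / 2 = 500 * Fs.
  const double nyquist_hz = 500.0 * o.sampling_rate_khz;
  if (o.lowest_frequency < 0.0) return "-L must satisfy 0 <= Fl";
  if (!(o.lowest_frequency < o.highest_frequency)) {
    return "-L and -H must satisfy Fl < Fh";
  }
  if (nyquist_hz < o.highest_frequency) return "-H must satisfy Fh <= 500 Fs";
  if (o.floor <= 0.0) return "-e must satisfy 0 < epsilon";
  return "";
}
```

For instance, with `-s 16` the largest admissible `-H` value is 8000 Hz.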

The following example extracts 20-channel mel-filter-bank outputs from a Hamming-windowed signal.

frame -l 400 -p 160 < data.d | window -l 400 -L 512 -w 1 | \
   fbank -l 512 -n 20 > data.fbank
Parameters:
  • argc[in] Number of arguments.

  • argv[in] Argument vector.

Returns:

0 on success, 1 on failure.

See also

mfcc

class MelFilterBankAnalysis

Perform mel-filter-bank analysis.

The input is the first half of the power spectrum:

\[ \begin{array}{cccc} |X(0)|^2, & |X(1)|^2, & \ldots, & |X(N/2)|^2, \end{array} \]
where \(N\) is the FFT length. The outputs are the \(C\)-channel mel-filter-bank outputs
\[ \begin{array}{cccc} F(1), & F(2), & \ldots, & F(C) \end{array} \]
and the log-signal energy \(E\).

The implementation is based on HTK [1]. The only difference from the HTK implementation is the constant in the mel-scale formula:

\[ m = 1127.01048 \log \left( 1 + \frac{f}{700} \right), \]
where HTK uses \(1127\) instead of \(1127.01048\).
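The mel-scale formula can be sketched in a few lines. The function names below are hypothetical, and \(\log\) is assumed to be the natural logarithm; under that assumption the constant \(1127.01048\) maps 1000 Hz to almost exactly 1000 mel.

```cpp
#include <cmath>

// Mel-scale conversion with the constant from the formula above.
// Assumption: "log" denotes the natural logarithm.
double HzToMel(double hz) {
  return 1127.01048 * std::log(1.0 + hz / 700.0);
}

// The same formula with the constant that HTK uses, for comparison.
double HzToMelHtk(double hz) {
  return 1127.0 * std::log(1.0 + hz / 700.0);
}

// Inverse mapping, obtained by solving the formula for f.
double MelToHz(double mel) {
  return 700.0 * (std::exp(mel / 1127.01048) - 1.0);
}
```

At 1000 Hz the two constants differ by roughly 0.009 mel, so outputs computed with either one should be very close but not bit-identical.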

[1] S. Young et al., “The HTK book,” Cambridge University Engineering Department, 2006.

Public Functions

MelFilterBankAnalysis(int fft_length, int num_channel, double sampling_rate, double lowest_frequency, double highest_frequency, double floor, bool use_power)
Parameters:
  • fft_length[in] Number of FFT bins, \(N\).

  • num_channel[in] Number of channels, \(C\).

  • sampling_rate[in] Sampling rate in Hz.

  • lowest_frequency[in] Lowest frequency in Hz.

  • highest_frequency[in] Highest frequency in Hz.

  • floor[in] Floor value of raw filter-bank output.

  • use_power[in] If true, use the power spectrum instead of the amplitude spectrum.

inline int GetFftLength() const
Returns:

FFT size.

inline int GetNumChannel() const
Returns:

Number of channels.

inline double GetFloor() const
Returns:

Floor value.

inline bool IsPowerUsed() const
Returns:

Whether to use power spectrum.

inline bool IsValid() const
Returns:

True if this object is valid.

bool GetCenterFrequencies(std::vector<double> *center_frequencies) const
Parameters:
  • center_frequencies[out] Center frequencies in Hz.

Returns:

True on success, false on failure.
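As an illustration of what the center frequencies may look like, the sketch below places \(C\) centers at equal intervals on the mel scale between the lowest and highest frequencies, which is the usual HTK-style layout. This is an assumption rather than a statement about the library's code, and `SketchCenterFrequencies` is a hypothetical name.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper (not the library's API): returns C center frequencies
// in Hz, assuming they split [mel(Fl), mel(Fh)] into C + 1 equal intervals.
std::vector<double> SketchCenterFrequencies(int num_channel, double lowest_hz,
                                            double highest_hz) {
  // Mel-scale formula from this page and its inverse (natural log assumed).
  auto hz_to_mel = [](double hz) {
    return 1127.01048 * std::log(1.0 + hz / 700.0);
  };
  auto mel_to_hz = [](double mel) {
    return 700.0 * (std::exp(mel / 1127.01048) - 1.0);
  };

  const double mel_low = hz_to_mel(lowest_hz);
  const double mel_high = hz_to_mel(highest_hz);
  const double step = (mel_high - mel_low) / (num_channel + 1);

  std::vector<double> centers(num_channel);
  for (int c = 0; c < num_channel; ++c) {
    // Channel c + 1 peaks (c + 1) steps above the lowest frequency.
    centers[c] = mel_to_hz(mel_low + (c + 1) * step);
  }
  return centers;
}
```

Under this layout the centers are evenly spaced in mel but increasingly far apart in Hz toward the high end of the band.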

bool Run(const std::vector<double> &power_spectrum, std::vector<double> *filter_bank_output, double *energy) const
Parameters:
  • power_spectrum[in] \((N/2+1)\)-length power spectrum.

  • filter_bank_output[out] \(C\)-channel filter-bank outputs.

  • energy[out] Signal energy \(E\) (optional).

Returns:

True on success, false on failure.
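To make the roles of the parameters concrete, the following is a minimal sketch of a triangular mel filter bank, not the library's implementation. It assumes triangular filters that are equally spaced on the mel scale, weights the power spectrum directly (the `use_power = true` case), applies the floor to the raw sums before taking the natural logarithm, and omits the energy output. `SketchMelFilterBank` and its argument names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical stand-in for the analysis step (assumptions in the lead-in).
// power_spectrum holds |X(0)|^2, ..., |X(N/2)|^2; returns C log outputs.
std::vector<double> SketchMelFilterBank(
    const std::vector<double>& power_spectrum, int num_channel,
    double sampling_rate_hz, double lowest_hz, double highest_hz,
    double floor_value) {
  if (power_spectrum.size() < 2 || num_channel < 1) {
    return std::vector<double>();
  }
  auto hz_to_mel = [](double hz) {
    return 1127.01048 * std::log(1.0 + hz / 700.0);
  };

  const int half = static_cast<int>(power_spectrum.size()) - 1;  // N / 2
  const double mel_low = hz_to_mel(lowest_hz);
  const double step = (hz_to_mel(highest_hz) - mel_low) / (num_channel + 1);

  std::vector<double> raw(num_channel, 0.0);
  for (int k = 0; k <= half; ++k) {
    // Bin k lies at k * Fs / N Hz, where N = 2 * half.
    const double mel = hz_to_mel(k * sampling_rate_hz / (2.0 * half));
    // Position in units of channel spacing; channel index c peaks at c + 1.
    const double pos = (mel - mel_low) / step;
    if (pos <= 0.0 || num_channel + 1 <= pos) continue;  // outside the band
    const int lower = static_cast<int>(std::floor(pos));
    const double frac = pos - lower;
    // Falling slope of the filter that peaks at `lower`.
    if (1 <= lower) raw[lower - 1] += (1.0 - frac) * power_spectrum[k];
    // Rising slope of the filter that peaks at `lower + 1`.
    if (lower < num_channel) raw[lower] += frac * power_spectrum[k];
  }

  std::vector<double> output(num_channel);
  for (int c = 0; c < num_channel; ++c) {
    // The floor keeps the logarithm finite for empty or silent channels.
    output[c] = std::log(std::max(raw[c], floor_value));
  }
  return output;
}
```

With a floor of 1, a silent frame yields zeros in every channel instead of negative infinity, which is the practical purpose of the floor parameter.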