fbank

Functions

int main(int argc, char *argv[])

fbank [ option ] [ infile ]

  • -n int

    • number of channels \((1 \le C)\)

  • -l int

    • FFT length \((2 \le N)\)

  • -s double

    • sampling rate in kHz \((0 < F_s)\)

  • -L double

    • lowest frequency in Hz \((0.0 \le F_l < F_h)\)

  • -H double

    • highest frequency in Hz \((F_l < F_h \le 500F_s)\)

  • -q int

    • input format

      • 0 amplitude spectrum in dB

      • 1 log amplitude spectrum

      • 2 amplitude spectrum

      • 3 power spectrum

      • 4 windowed waveform

  • -o int

    • output format

      • 0 fbank

      • 1 fbank and energy

  • -e double

    • floor of raw filter-bank output \((0 < \epsilon)\)

  • infile str

    • double-type windowed sequence or spectrum

  • stdout

    • double-type mel-filter-bank output

The example below extracts 20-channel mel-filter-bank outputs from a Hamming-windowed signal.

frame -l 400 -p 160 < data.d | window -l 400 -L 512 -w 1 | \
   fbank -l 512 -n 20 > data.fbank
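
The constraints on -s, -L, and -H listed in the options can be sketched as a small check. Note that the command takes the sampling rate in kHz, so \(500F_s\) is the Nyquist frequency expressed in Hz, whereas the class constructor documented further down takes the sampling rate in Hz. The helper names `KhzToHz` and `IsValidFrequencyRange` are hypothetical and are not part of SPTK; this is only an illustration of the stated inequalities, not the command's actual argument handling.

```cpp
// Hypothetical helpers for illustration; not part of the SPTK API.

// Convert a sampling rate given in kHz (as taken by -s) to Hz.
double KhzToHz(double khz) { return 1000.0 * khz; }

// Check 0 < Fs, 0 <= Fl < Fh, and Fh <= 500 * Fs (Fs in kHz, Fl and Fh in Hz).
bool IsValidFrequencyRange(double sampling_rate_khz, double lowest_hz,
                           double highest_hz) {
  if (sampling_rate_khz <= 0.0) return false;
  // Half of the sampling rate, expressed in Hz.
  const double nyquist_hz = 500.0 * sampling_rate_khz;
  return 0.0 <= lowest_hz && lowest_hz < highest_hz &&
         highest_hz <= nyquist_hz;
}
```

For a 16 kHz signal, for instance, the check accepts a band from 0 Hz up to 8000 Hz and rejects any upper edge above 8000 Hz.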

Parameters
  • argc[in] Number of arguments.

  • argv[in] Argument vector.

Returns

0 on success, 1 on failure.

See also

mfcc

class sptk::MelFilterBankAnalysis

Perform mel-filter-bank analysis.

The input is the first half of the power spectrum:

\[ \begin{array}{cccc} |X(0)|^2, & |X(1)|^2, & \ldots, & |X(N/2)|^2, \end{array} \]
where \(N\) is the FFT length. The outputs are the \(C\)-channel mel-filter-bank outputs
\[ \begin{array}{cccc} F(1), & F(2), & \ldots, & F(C) \end{array} \]
and the log-signal energy \(E\).
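
As a minimal sketch of the input layout described above, the following hypothetical helper (not part of SPTK) builds the \((N/2+1)\)-length half power spectrum from an \(N\)-point complex spectrum. It assumes the full spectrum is already available as `std::complex<double>` values; how the spectrum is obtained is outside the scope of the sketch.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration; not part of the SPTK API.
// Returns |X(0)|^2, ..., |X(N/2)|^2 for an N-point complex spectrum.
std::vector<double> HalfPowerSpectrum(
    const std::vector<std::complex<double>>& spectrum) {
  const std::size_t fft_length = spectrum.size();  // N
  if (fft_length == 0) return {};
  std::vector<double> power(fft_length / 2 + 1);
  for (std::size_t k = 0; k < power.size(); ++k) {
    // std::norm returns the squared magnitude of a complex number.
    power[k] = std::norm(spectrum[k]);
  }
  return power;
}
```

For \(N = 512\), the returned vector holds 257 values.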

The implementation is based on HTK [1]. The only difference from the HTK implementation is the constant in the mel-scale formula:

\[ m = 1127.01048 \log \left( 1 + \frac{f}{700} \right), \]
where HTK uses \(1127\) instead of \(1127.01048\).
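
The mel-scale formula can be sketched in a few lines. The function and constant names below are hypothetical and are not taken from the SPTK sources; the sketch assumes that \(\log\) denotes the natural logarithm and adds the inverse mapping for convenience.

```cpp
#include <cmath>

// Hypothetical names for illustration; not part of the SPTK API.
constexpr double kSptkMelConstant = 1127.01048;
constexpr double kHtkMelConstant = 1127.0;

// m = constant * ln(1 + f / 700), with f in Hz.
double HzToMel(double hz, double constant = kSptkMelConstant) {
  return constant * std::log(1.0 + hz / 700.0);
}

// Inverse mapping: f = 700 * (exp(m / constant) - 1).
double MelToHz(double mel, double constant = kSptkMelConstant) {
  return 700.0 * (std::exp(mel / constant) - 1.0);
}
```

With these helpers, `HzToMel(1000.0)` evaluates to roughly 1000 mel, while `HzToMel(1000.0, kHtkMelConstant)` gives roughly 999.99 mel, so the two constants differ by about 0.01 mel at that frequency.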

[1] S. Young et al., “The HTK book,” Cambridge University Engineering Department, 2006.

Public Functions

MelFilterBankAnalysis(int fft_length, int num_channel, double sampling_rate, double lowest_frequency, double highest_frequency, double floor, bool use_power)
Parameters
  • fft_length[in] Number of FFT bins, \(N\).

  • num_channel[in] Number of channels, \(C\).

  • sampling_rate[in] Sampling rate in Hz.

  • lowest_frequency[in] Lowest frequency in Hz.

  • highest_frequency[in] Highest frequency in Hz.

  • floor[in] Floor value of raw filter-bank output.

  • use_power[in] If true, use the power spectrum instead of the amplitude spectrum.

inline int GetFftLength() const
Returns

FFT size.

inline int GetNumChannel() const
Returns

Number of channels.

inline double GetFloor() const
Returns

Floor value.

inline bool IsPowerUsed() const
Returns

Whether to use power spectrum.

inline bool IsValid() const
Returns

True if this object is valid.

bool Run(const std::vector<double> &power_spectrum, std::vector<double> *filter_bank_output, double *energy) const
Parameters
  • power_spectrum[in] \((N/2+1)\)-length power spectrum.

  • filter_bank_output[out] \(C\)-channel filter-bank outputs.

  • energy[out] Signal energy \(E\) (optional).

Returns

True on success, false on failure.