fbank

diffsptk.FBANK

alias of MelFilterBankAnalysis

class diffsptk.MelFilterBankAnalysis(*, fft_length: int, n_channel: int, sample_rate: int, f_min: float = 0, f_max: float | None = None, floor: float = 1e-05, gamma: float = 0, scale: str = 'htk', erb_factor: float | None = None, use_power: bool = False, out_format: str | int = 'y', learnable: bool = False, device: device | None = None, dtype: dtype | None = None)

See this page for details.

Parameters:
fft_length : int >= 2

The number of FFT bins, \(L\).

n_channel : int >= 1

The number of mel filter banks, \(C\).

sample_rate : int >= 1

The sample rate in Hz.

f_min : float >= 0

The minimum frequency in Hz.

f_max : float <= sample_rate // 2

The maximum frequency in Hz.

floor : float > 0

The minimum mel filter bank output in linear scale.

gamma : float in [-1, 1]

The parameter of the generalized logarithmic function.

scale : ['htk', 'mel', 'inverted-mel', 'bark', 'linear']

The type of auditory scale used to construct the filter bank.

erb_factor : float > 0 or None

The scale factor for the ERB scale, referred to as the E-factor. If not None, the filter bandwidths are adjusted according to the scaled ERB scale.

use_power : bool

If True, use the power spectrum instead of the amplitude spectrum.

out_format : ['y', 'yE', 'y,E']

`y` is the mel filter bank output and `E` is the energy. If this is `yE`, the two outputs are concatenated and returned as a single tensor instead of a tuple.

learnable : bool

Whether to make the basis learnable.

device : torch.device or None

The device of this module.

dtype : torch.dtype or None

The data type of this module.
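
To make the roles of `scale`, `n_channel`, `f_min`, and `f_max` more concrete, the following is a minimal pure-Python sketch of how band edges can be placed for the 'htk' scale. It assumes the widely used HTK mel mapping, 2595 * log10(1 + f / 700), and assumes that `f_max=None` means half the sample rate. The helper names `hz_to_mel`, `mel_to_hz`, and `band_edges_hz` are hypothetical and are not part of diffsptk, whose internal construction may differ in detail.

```python
import math
from typing import List, Optional


def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel mapping (assumed here): 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def band_edges_hz(
    n_channel: int,
    sample_rate: int,
    f_min: float = 0.0,
    f_max: Optional[float] = None,
) -> List[float]:
    """Return n_channel + 2 edge frequencies equally spaced on the mel axis.

    Channel c (0-based) would span edges[c] .. edges[c + 2] with its peak
    at edges[c + 1] if triangular filters are used.
    """
    if f_max is None:
        # Assumption: a missing f_max defaults to the Nyquist frequency.
        f_max = sample_rate / 2
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_channel + 1)
    return [mel_to_hz(m_lo + i * step) for i in range(n_channel + 2)]


# Same configuration as the class example: 4 channels at 8 kHz.
edges = band_edges_hz(n_channel=4, sample_rate=8000)
```

Because the edges are equally spaced in mel rather than in Hz, the low-frequency channels come out narrower than the high-frequency ones.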

References

[1] S. Young et al., “The HTK Book Version 3.4,” Cambridge University Press, 2006.

[2] T. Ganchev et al., “Comparative evaluation of various MFCC implementations on the speaker verification task,” Proceedings of SPECOM, vol. 1, pp. 191-194, 2005.

[3] M. D. Skowronski et al., “Exploiting independent filter bandwidth of human factor cepstral coefficients in automatic speech recognition,” The Journal of the Acoustical Society of America, vol. 116, no. 3, pp. 1774-1780, 2004.

forward(x: Tensor) -> Tensor | tuple[Tensor, Tensor]

Apply mel filter banks to the STFT.

Parameters:
x : Tensor [shape=(…, L/2+1)]

The power spectrum.

Returns:
y : Tensor [shape=(…, C)]

The mel filter bank output.

E : Tensor [shape=(…, 1)] (optional)

The energy.
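
The three `out_format` choices differ only in how `y` and `E` are packaged. The sketch below mimics that packaging with plain Python lists standing in for the last axis of the tensors. The helper `format_output` is hypothetical, the numeric values are placeholders, and the ordering of `y` before `E` in the concatenated form is an assumption read off the name 'yE'.

```python
def format_output(y, E, out_format="y"):
    """Package filter bank output y (C values) and energy E (1 value)."""
    if out_format == "y":
        return y  # filter bank output only, length C
    if out_format == "yE":
        return y + E  # one concatenated sequence, length C + 1
    if out_format == "y,E":
        return y, E  # tuple of two separate sequences
    raise ValueError(f"unsupported out_format: {out_format!r}")


# Placeholder values for C = 4 channels; not real analysis results.
y = [0.1, 0.2, 0.3, 0.4]
E = [5.0]

only_y = format_output(y, E, "y")
joined = format_output(y, E, "yE")
pair = format_output(y, E, "y,E")
```

With 'yE' the caller gets a single sequence whose last axis has C + 1 entries, whereas 'y,E' keeps the shapes (…, C) and (…, 1) separate.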

Examples

>>> x = diffsptk.ramp(19)
>>> stft = diffsptk.STFT(frame_length=10, frame_period=10, fft_length=32)
>>> fbank = diffsptk.MelFilterBankAnalysis(
...     fft_length=32, n_channel=4, sample_rate=8000
... )
>>> y = fbank(stft(x))
>>> y
tensor([[0.1214, 0.4825, 0.6072, 0.3589],
        [3.3640, 3.4518, 2.7717, 0.5088]])

diffsptk.functional.fbank(x: Tensor, n_channel: int, sample_rate: int, f_min: float = 0, f_max: float | None = None, floor: float = 1e-05, gamma: float = 0, scale: str = 'htk', erb_factor: float | None = None, use_power: bool = False, out_format: str = 'y') -> tuple[Tensor, Tensor] | Tensor

Apply mel filter banks to the STFT.

Parameters:
x : Tensor [shape=(…, L/2+1)]

The power spectrum.

n_channel : int >= 1

The number of mel filter banks, \(C\).

sample_rate : int >= 1

The sample rate in Hz.

f_min : float >= 0

The minimum frequency in Hz.

f_max : float <= sample_rate // 2

The maximum frequency in Hz.

floor : float > 0

The minimum mel filter bank output in linear scale.

gamma : float in [-1, 1]

The parameter of the generalized logarithmic function.

scale : ['htk', 'mel', 'inverted-mel', 'bark', 'linear']

The type of auditory scale used to construct the filter bank.

erb_factor : float > 0 or None

The scale factor for the ERB scale, referred to as the E-factor. If not None, the filter bandwidths are adjusted according to the scaled ERB scale.

use_power : bool

If True, use the power spectrum instead of the amplitude spectrum.

out_format : ['y', 'yE', 'y,E']

`y` is the mel filter bank output and `E` is the energy. If this is `yE`, the two outputs are concatenated and returned as a single tensor instead of a tuple.

Returns:
y : Tensor [shape=(…, C)]

The mel filter bank output.

E : Tensor [shape=(…, 1)] (optional)

The energy.
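
The interplay of `floor` and `gamma` can be illustrated with a small scalar sketch. It assumes the SPTK-style generalized logarithm, (x^gamma - 1) / gamma for nonzero gamma and the natural logarithm for gamma = 0, and assumes that flooring is applied in the linear domain before compression, as the description of `floor` suggests. The helpers `generalized_log` and `floor_and_compress` are hypothetical names, not diffsptk functions.

```python
import math


def generalized_log(x: float, gamma: float = 0.0) -> float:
    """Generalized logarithm (assumed form): log(x) at gamma = 0,
    otherwise (x ** gamma - 1) / gamma."""
    if gamma == 0.0:
        return math.log(x)
    return (x ** gamma - 1.0) / gamma


def floor_and_compress(v: float, floor: float = 1e-5, gamma: float = 0.0) -> float:
    """Clip a linear-scale filter bank value from below, then compress it."""
    return generalized_log(max(v, floor), gamma)


# A zero-valued channel is clipped to the floor instead of producing -inf.
silent = floor_and_compress(0.0)

# gamma = 1 reduces to x - 1, and gamma = -1 to 1 - 1 / x.
linear_like = generalized_log(2.0, gamma=1.0)
inverse_like = generalized_log(2.0, gamma=-1.0)
```

As gamma approaches 0, the generalized form converges to the ordinary logarithm, which is why the default `gamma=0` behaves like a conventional log mel filter bank.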

See also

ifbank, stft, mfcc, plp