fbank

diffsptk.FBANK

alias of MelFilterBankAnalysis

class diffsptk.MelFilterBankAnalysis(*, fft_length: int, n_channel: int, sample_rate: int, f_min: float = 0, f_max: float | None = None, floor: float = 1e-05, use_power: bool = False, out_format: str | int = 'y')

See this page for details.

Parameters:
fft_length : int >= 2

The number of FFT bins, \(L\).

n_channel : int >= 1

The number of mel filter banks, \(C\).

sample_rate : int >= 1

The sample rate in Hz.

f_min : float >= 0

The minimum frequency in Hz.

f_max : float <= sample_rate // 2

The maximum frequency in Hz.

floor : float > 0

The minimum mel filter bank output in linear scale.

use_power : bool

If True, use the power spectrum instead of the amplitude spectrum.

out_format : [‘y’, ‘yE’, ‘y,E’]

y is the mel filter bank output and E is the energy. If this is yE, the two output tensors are concatenated and returned as a single tensor instead of a tuple.
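
The three out_format choices can be illustrated with a small sketch in plain Python. The helper name format_output and the sample values are hypothetical and are not part of diffsptk. The sketch assumes that yE places the energy after the filter bank channels along the last axis, as the name suggests. The integer form of out_format allowed by the class signature is not covered here.

```python
def format_output(y, E, out_format="y"):
    """Hypothetical helper mirroring the documented out_format choices.

    y: list of frames, each a list of C filter bank values.
    E: list of frames, each a one-element list holding the energy.
    """
    if out_format == "y":
        return y
    if out_format == "yE":
        # Concatenate along the last axis: (..., C) + (..., 1) -> (..., C + 1).
        return [row + e for row, e in zip(y, E)]
    if out_format == "y,E":
        # Return both outputs as a tuple.
        return y, E
    raise ValueError(f"unsupported out_format: {out_format!r}")


# Illustrative values only; they are not computed by diffsptk.
y = [[0.1, 0.4, 0.6, 0.3], [3.3, 3.4, 2.7, 0.5]]  # shape (2, 4)
E = [[1.5], [9.0]]                                # shape (2, 1)
print(format_output(y, E, "yE"))
```

With yE each frame grows from C to C + 1 values, while y,E keeps the two outputs separate.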

References

[1] S. Young et al., “The HTK Book,” Cambridge University Press, 2006.

forward(x: Tensor) Tensor | tuple[Tensor, Tensor]

Apply mel filter banks to the STFT.

Parameters:
x : Tensor [shape=(…, L/2+1)]

The power spectrum.

Returns:
y : Tensor [shape=(…, C)]

The mel filter bank output.

E : Tensor [shape=(…, 1)] (optional)

The energy.
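
To make the mapping from L/2+1 spectrum bins to C channels concrete, here is a rough single-frame sketch in plain Python. It is not the diffsptk implementation. It assumes the HTK-style mel scale from [1], triangular filters equally spaced on the mel axis, a default f_max at the Nyquist frequency, and a logarithm taken after flooring (suggested by the phrase "in linear scale"). The function names are hypothetical, and the values it produces need not match the library's output.

```python
import math


def hz_to_mel(f):
    # HTK-style mel scale (assumed here, following the HTK Book reference).
    return 1127.0 * math.log(1.0 + f / 700.0)


def mel_filter_bank(power, n_channel, sample_rate, f_min=0.0, f_max=None,
                    floor=1e-5, use_power=False):
    """Apply triangular mel filters to one frame of a power spectrum."""
    n_bin = len(power)              # L/2 + 1
    fft_length = 2 * (n_bin - 1)    # L
    if f_max is None:
        f_max = sample_rate / 2     # assumed default: Nyquist frequency
    # Use the amplitude spectrum unless use_power is set.
    spec = power if use_power else [math.sqrt(p) for p in power]
    # n_channel + 2 edge points equally spaced on the mel axis.
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_channel + 1)
    out = []
    for c in range(n_channel):
        left, center, right = (m_lo + step * (c + i) for i in range(3))
        acc = 0.0
        for k, s in enumerate(spec):
            m = hz_to_mel(k * sample_rate / fft_length)
            if left < m < right:
                # Triangular weight peaking at the channel center.
                w = (m - left) / step if m <= center else (right - m) / step
                acc += w * s
        # Floor in the linear domain, then take the logarithm (assumed).
        out.append(math.log(max(acc, floor)))
    return out


frame = [1.0] * 17  # flat power spectrum for fft_length=32
print(mel_filter_bank(frame, n_channel=4, sample_rate=8000))
```

An all-zero frame yields log(floor) in every channel, which shows how floor bounds the output from below.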

Examples

>>> import diffsptk
>>> x = diffsptk.ramp(19)
>>> stft = diffsptk.STFT(frame_length=10, frame_period=10, fft_length=32)
>>> fbank = diffsptk.MelFilterBankAnalysis(
...     fft_length=32, n_channel=4, sample_rate=8000
... )
>>> y = fbank(stft(x))
>>> y
tensor([[0.1214, 0.4825, 0.6072, 0.3589],
        [3.3640, 3.4518, 2.7717, 0.5088]])

diffsptk.functional.fbank(x: Tensor, n_channel: int, sample_rate: int, f_min: float = 0, f_max: float | None = None, floor: float = 1e-05, use_power: bool = False, out_format: str = 'y') tuple[Tensor, Tensor] | Tensor

Apply mel filter banks to the STFT.

Parameters:
x : Tensor [shape=(…, L/2+1)]

The power spectrum.

n_channel : int >= 1

The number of mel filter banks, \(C\).

sample_rate : int >= 1

The sample rate in Hz.

f_min : float >= 0

The minimum frequency in Hz.

f_max : float <= sample_rate // 2

The maximum frequency in Hz.

floor : float > 0

The minimum mel filter bank output in linear scale.

use_power : bool

If True, use the power spectrum instead of the amplitude spectrum.

out_format : [‘y’, ‘yE’, ‘y,E’]

y is the mel filter bank output and E is the energy. If this is yE, the two output tensors are concatenated and returned as a single tensor instead of a tuple.

Returns:
y : Tensor [shape=(…, C)]

The mel filter bank output.

E : Tensor [shape=(…, 1)] (optional)

The energy.
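
The documented parameter ranges can be collected into a small check, sketched below in plain Python. The function check_fbank_args is hypothetical and not part of diffsptk. Two points are assumptions rather than documented behavior: that f_max=None selects sample_rate // 2, and that f_min must be strictly below f_max.

```python
def check_fbank_args(n_channel, sample_rate, f_min=0.0, f_max=None,
                     floor=1e-5, out_format="y"):
    """Hypothetical validator for the documented parameter ranges."""
    if n_channel < 1:
        raise ValueError("n_channel must be >= 1")
    if sample_rate < 1:
        raise ValueError("sample_rate must be >= 1")
    nyquist = sample_rate // 2
    if f_max is None:
        f_max = nyquist  # assumption: None selects the upper limit
    # f_min < f_max is an assumption; the other bounds are documented.
    if not 0 <= f_min < f_max <= nyquist:
        raise ValueError("need 0 <= f_min < f_max <= sample_rate // 2")
    if floor <= 0:
        raise ValueError("floor must be > 0")
    if out_format not in ("y", "yE", "y,E"):
        raise ValueError("out_format must be 'y', 'yE', or 'y,E'")
    return f_max


# Returns the effective upper band edge under the assumptions above.
print(check_fbank_args(n_channel=4, sample_rate=8000))
```

A check like this could run before calling the functional form, so that out-of-range band edges are reported with a clear message.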

See also

stft, mfcc, plp