ifbank

diffsptk.IFBANK: alias of InverseMelFilterBankAnalysis

class diffsptk.InverseMelFilterBankAnalysis(*, n_channel: int, fft_length: int, sample_rate: int, f_min: float = 0, f_max: float | None = None, gamma: float = 0, scale: str = 'htk', erb_factor: float | None = None, use_power: bool = False, learnable: bool = False, device: device | None = None, dtype: dtype | None = None)

This module is the inverse of diffsptk.MelFilterBankAnalysis.

Parameters:

n_channel (int >= 1): The number of mel filter banks, \(C\).
fft_length (int >= 2): The number of FFT bins, \(L\).
sample_rate (int >= 1): The sample rate in Hz.
f_min (float >= 0): The minimum frequency in Hz.
f_max (float <= sample_rate // 2 or None): The maximum frequency in Hz.
gamma (float in [-1, 1]): The parameter of the generalized logarithmic function.
scale (['htk', 'mel', 'inverted-mel', 'bark', 'linear']): The type of auditory scale used to construct the filter bank.
erb_factor (float > 0 or None): The scale factor for the ERB scale, referred to as the E-factor. If not None, the filter bandwidths are adjusted according to the scaled ERB scale.
use_power (bool): Set to True if the mel filter bank output was extracted from the power spectrum instead of the amplitude spectrum.
learnable (bool): Whether to make the basis learnable.
device (torch.device or None): The device of this module.
dtype (torch.dtype or None): The data type of this module.
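
As a rough illustration of how n_channel, sample_rate, f_min, and f_max interact, the sketch below places filter centers at equal intervals on an HTK-style mel axis. It is not taken from the diffsptk source. The helper names (hz_to_mel, mel_to_hz, center_frequencies) are hypothetical, and both the equal-spacing rule and the fallback of f_max to sample_rate / 2 when it is None are assumptions made for the example.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel scale: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def center_frequencies(n_channel, sample_rate, f_min=0.0, f_max=None):
    """Return n_channel center frequencies in Hz (hypothetical helper)."""
    if f_max is None:
        # Assumption: fall back to the Nyquist frequency.
        f_max = sample_rate / 2
    mel_min = hz_to_mel(f_min)
    mel_max = hz_to_mel(f_max)
    # Assumption: centers are equally spaced on the mel axis,
    # strictly between the two band edges.
    step = (mel_max - mel_min) / (n_channel + 1)
    return [mel_to_hz(mel_min + step * (c + 1)) for c in range(n_channel)]


centers = center_frequencies(n_channel=4, sample_rate=8000)
for c, f in enumerate(centers):
    print(f"channel {c}: {f:.1f} Hz")
```

Equal steps on the mel axis become progressively wider steps in Hz, so the higher channels cover broader frequency bands than the lower ones.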

forward(y: Tensor) → Tensor

Reconstruct the power spectrum from the mel filter bank output.

Parameters:

y (Tensor [shape=(…, C)]): The mel filter bank output.

Returns:

out (Tensor [shape=(…, L/2+1)]): The power spectrum.

Examples

>>> x = diffsptk.ramp(19)
>>> stft = diffsptk.STFT(frame_length=10, frame_period=10, fft_length=32)
>>> X = stft(x)
>>> X.shape
torch.Size([2, 17])
>>> fbank = diffsptk.MelFilterBankAnalysis(
...     fft_length=32, n_channel=4, sample_rate=8000
... )
>>> ifbank = diffsptk.InverseMelFilterBankAnalysis(
...     fft_length=32, n_channel=4, sample_rate=8000
... )
>>> X2 = ifbank(fbank(X))
>>> X2.shape
torch.Size([2, 17])
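
The round trip above restores the shape, but with C = 4 channels and L/2+1 = 17 bins the original spectrum generally cannot be recovered exactly. The NumPy sketch below illustrates one common way to invert a filter bank, the Moore-Penrose pseudo-inverse. Whether diffsptk uses this method internally is an assumption. The matrix H is a random stand-in for a real mel filter bank, and the log compression corresponds to gamma = 0.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins, n_channel = 17, 4  # L/2+1 with L = 32, and C = 4

# Hypothetical non-negative filter bank matrix of shape (L/2+1, C).
# A real mel filter bank would hold triangular filters instead.
H = rng.random((n_bins, n_channel))

# Two frames of a strictly positive stand-in spectrum.
x = rng.random((2, n_bins)) + 0.1

# Analysis: apply the filters, then take the log (gamma = 0).
y = np.log(x @ H)

# Approximate inverse: undo the log, then apply the pseudo-inverse of H.
x_hat = np.exp(y) @ np.linalg.pinv(H)

# Re-analyzing x_hat reproduces the filter bank energies ...
print(np.allclose(x_hat @ H, np.exp(y)))
# ... although x_hat itself differs from the original x.
print(np.allclose(x_hat, x))
```

The pseudo-inverse returns the minimum-norm spectrum consistent with the filter bank energies, so it may contain negative values. A practical implementation would likely need to clip or otherwise post-process the result.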

diffsptk.functional.ifbank(y: Tensor, fft_length: int, sample_rate: int, f_min: float = 0, f_max: float | None = None, gamma: float = 0, scale: str = 'htk', erb_factor: float | None = None, use_power: bool = False) → Tensor

Reconstruct the power spectrum from the mel filter bank output.

Parameters:

y (Tensor [shape=(…, C)]): The mel filter bank output.
fft_length (int >= 2): The number of FFT bins, \(L\).
sample_rate (int >= 1): The sample rate in Hz.
f_min (float >= 0): The minimum frequency in Hz.
f_max (float <= sample_rate // 2 or None): The maximum frequency in Hz.
gamma (float in [-1, 1]): The parameter of the generalized logarithmic function.
scale (['htk', 'mel', 'inverted-mel', 'bark', 'linear']): The type of auditory scale used to construct the filter bank.
erb_factor (float > 0 or None): The scale factor for the ERB scale, referred to as the E-factor. If not None, the filter bandwidths are adjusted according to the scaled ERB scale.
use_power (bool): Set to True if the mel filter bank output was extracted from the power spectrum instead of the amplitude spectrum.

Returns:

out (Tensor [shape=(…, L/2+1)]): The reconstructed power spectrum.
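
To show what the gamma parameter controls, the sketch below implements a generalized logarithm and its inverse in plain Python. It assumes the SPTK-style definition, (x^gamma - 1) / gamma for nonzero gamma and the natural log for gamma = 0. The function names are hypothetical, and the exact form used inside diffsptk is not confirmed here.

```python
import math


def generalized_log(x: float, gamma: float) -> float:
    """Generalized logarithm (assumed form): log(x) when gamma == 0."""
    if gamma == 0:
        return math.log(x)
    return (x ** gamma - 1.0) / gamma


def generalized_exp(y: float, gamma: float) -> float:
    """Inverse of generalized_log for the same gamma."""
    if gamma == 0:
        return math.exp(y)
    return (1.0 + gamma * y) ** (1.0 / gamma)


# Round trip over the documented range of gamma, [-1, 1].
roundtrip = []
for gamma in (-1.0, -0.5, 0.0, 0.5, 1.0):
    y = generalized_log(2.5, gamma)
    roundtrip.append((gamma, y, generalized_exp(y, gamma)))

for gamma, y, x_rec in roundtrip:
    print(f"gamma={gamma:+.1f}  compressed={y:.4f}  recovered={x_rec:.4f}")
```

Under this definition, gamma = 1 reduces to a linear shift (x - 1) and gamma = 0 to the ordinary logarithm. Inverting the filter bank output requires the same gamma that was used during analysis.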