ifbank

diffsptk.IFBANK

alias of InverseMelFilterBankAnalysis

class diffsptk.InverseMelFilterBankAnalysis(*, n_channel: int, fft_length: int, sample_rate: int, f_min: float = 0, f_max: float | None = None, gamma: float = 0, scale: str = 'htk', erb_factor: float | None = None, use_power: bool = False, learnable: bool = False, device: device | None = None, dtype: dtype | None = None)

This is the inverse module of diffsptk.MelFilterBankAnalysis.

Parameters:
n_channel : int >= 1

The number of mel filter banks, \(C\).

fft_length : int >= 2

The number of FFT bins, \(L\).

sample_rate : int >= 1

The sample rate in Hz.

f_min : float >= 0

The minimum frequency in Hz.

f_max : float <= sample_rate // 2

The maximum frequency in Hz.

gamma : float in [-1, 1]

The parameter of the generalized logarithmic function.

scale : ['htk', 'mel', 'inverted-mel', 'bark', 'linear']

The type of auditory scale used to construct the filter bank.

erb_factor : float > 0 or None

The scale factor for the ERB scale, referred to as the E-factor. If not None, the filter bandwidths are adjusted according to the scaled ERB scale.

use_power : bool

Set to True if the mel filter bank output is extracted from the power spectrum instead of the amplitude spectrum.

learnable : bool

Whether to make the basis learnable.

device : torch.device or None

The device of this module.

dtype : torch.dtype or None

The data type of this module.
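
The sketch below illustrates how the scale, n_channel, f_min, and f_max parameters can interact when a filter bank is laid out. It assumes that 'htk' selects the widely used HTK mel formula, mel = 2595 log10(1 + f / 700), and that f_max falls back to the Nyquist frequency when it is None. Both points are assumptions, and the helper names (hz_to_mel_htk, mel_to_hz_htk, center_frequencies) are hypothetical and not part of diffsptk.

```python
import math


def hz_to_mel_htk(f_hz: float) -> float:
    """Convert Hz to mel using the HTK-style formula (assumed for scale='htk')."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz_htk(mel: float) -> float:
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def center_frequencies(n_channel, sample_rate, f_min=0.0, f_max=None):
    """Return n_channel center frequencies equally spaced on the mel axis."""
    if f_max is None:
        # Assumption: a missing f_max means the Nyquist frequency.
        f_max = sample_rate / 2
    mel_min = hz_to_mel_htk(f_min)
    mel_max = hz_to_mel_htk(f_max)
    # n_channel interior points between the two band edges.
    step = (mel_max - mel_min) / (n_channel + 1)
    return [mel_to_hz_htk(mel_min + step * (i + 1)) for i in range(n_channel)]


centers = center_frequencies(n_channel=4, sample_rate=8000)
print([round(c, 1) for c in centers])
```

The centers are evenly spaced in mel but increasingly far apart in Hz, which is why low frequencies are resolved more finely than high ones.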

forward(y: Tensor) -> Tensor

Reconstruct the power spectrum from the mel filter bank output.

Parameters:
y : Tensor [shape=(…, C)]

The mel filter bank output.

Returns:
out : Tensor [shape=(…, L/2+1)]

The power spectrum.

Examples

>>> x = diffsptk.ramp(19)
>>> stft = diffsptk.STFT(frame_length=10, frame_period=10, fft_length=32)
>>> X = stft(x)
>>> X.shape
torch.Size([2, 17])
>>> fbank = diffsptk.MelFilterBankAnalysis(
...     fft_length=32, n_channel=4, sample_rate=8000
... )
>>> ifbank = diffsptk.InverseMelFilterBankAnalysis(
...     fft_length=32, n_channel=4, sample_rate=8000
... )
>>> X2 = ifbank(fbank(X))
>>> X2.shape
torch.Size([2, 17])
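
The round trip above restores the shape of X, but C channels carry less information than L/2+1 bins, so the values are in general only approximated. The following pure-Python sketch shows this with a toy 3-bin, 2-channel filter bank and a Moore-Penrose pseudo-inverse. Whether diffsptk inverts the filter bank this way is an assumption; the matrix H and the helpers apply and pinv_two_channels are hypothetical and chosen only for illustration.

```python
# Toy filter bank: 3 frequency bins (rows) mapped to 2 channels (columns).
H = [
    [1.0, 0.0],
    [0.5, 0.5],
    [0.0, 1.0],
]


def apply(vec, mat):
    """Multiply a row vector by a matrix given as a list of rows."""
    return [
        sum(v * row[j] for v, row in zip(vec, mat))
        for j in range(len(mat[0]))
    ]


def pinv_two_channels(mat):
    """Pseudo-inverse (H^T H)^{-1} H^T of a full-rank K x 2 matrix."""
    a = sum(r[0] * r[0] for r in mat)  # Gram matrix entry (0, 0)
    b = sum(r[0] * r[1] for r in mat)  # Gram matrix entries (0, 1) and (1, 0)
    d = sum(r[1] * r[1] for r in mat)  # Gram matrix entry (1, 1)
    det = a * d - b * b
    # Rows of the 2 x K pseudo-inverse, using the closed-form 2 x 2 inverse.
    return [
        [(d * r[0] - b * r[1]) / det for r in mat],
        [(-b * r[0] + a * r[1]) / det for r in mat],
    ]


x = [1.0, 4.0, 3.0]                      # original 3-bin spectrum
y = apply(x, H)                          # analysis: 3 bins -> 2 channels
x_hat = apply(y, pinv_two_channels(H))   # synthesis: 2 channels -> 3 bins
y_again = apply(x_hat, H)                # re-analysis of the reconstruction

print(y, x_hat, y_again)
```

In this toy case the reconstruction reproduces the channel outputs when analyzed again, yet it differs from the original spectrum: the detail lost in the projection onto two channels cannot be recovered.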

diffsptk.functional.ifbank(y: Tensor, fft_length: int, sample_rate: int, f_min: float = 0, f_max: float | None = None, gamma: float = 0, scale: str = 'htk', erb_factor: float | None = None, use_power: bool = False) -> Tensor

Reconstruct the power spectrum from the mel filter bank output.

Parameters:
y : Tensor [shape=(…, C)]

The mel filter bank output.

fft_length : int >= 2

The number of FFT bins, \(L\).

sample_rate : int >= 1

The sample rate in Hz.

f_min : float >= 0

The minimum frequency in Hz.

f_max : float <= sample_rate // 2

The maximum frequency in Hz.

gamma : float in [-1, 1]

The parameter of the generalized logarithmic function.

scale : ['htk', 'mel', 'inverted-mel', 'bark', 'linear']

The type of auditory scale used to construct the filter bank.

erb_factor : float > 0 or None

The scale factor for the ERB scale, referred to as the E-factor. If not None, the filter bandwidths are adjusted according to the scaled ERB scale.

use_power : bool

Set to True if the mel filter bank output is extracted from the power spectrum instead of the amplitude spectrum.

Returns:
out : Tensor [shape=(…, L/2+1)]

The reconstructed power spectrum.
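
The gamma parameter refers to the generalized logarithmic function. A minimal sketch of that function and its inverse is given below, assuming the usual definition (x^gamma - 1) / gamma for nonzero gamma and the natural logarithm for gamma = 0. The names generalized_log and generalized_exp are hypothetical, and how diffsptk applies the function internally is not shown in this page.

```python
import math


def generalized_log(x: float, gamma: float) -> float:
    """Generalized logarithm; reduces to the natural log when gamma is 0."""
    if gamma == 0:
        return math.log(x)
    return (x ** gamma - 1.0) / gamma


def generalized_exp(y: float, gamma: float) -> float:
    """Inverse of generalized_log for the same gamma."""
    if gamma == 0:
        return math.exp(y)
    return (1.0 + gamma * y) ** (1.0 / gamma)


# Undoing the compression requires the same gamma that was used to apply it.
x = 2.5
for gamma in (0.0, 0.5, -0.5, 1.0):
    y = generalized_log(x, gamma)
    assert math.isclose(generalized_exp(y, gamma), x)
```

Under this definition, the inverse module needs the same gamma as the analysis module to undo the compression before mapping the channel values back to frequency bins.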

See also

fbank, griffin