ifbank

diffsptk.IFBANK

alias of InverseMelFilterBankAnalysis

class diffsptk.InverseMelFilterBankAnalysis(*, n_channel: int, fft_length: int, sample_rate: int, f_min: float = 0, f_max: float | None = None, gamma: float = 0, use_power: bool = False, learnable: bool = False)

This is the inverse module of diffsptk.MelFilterBankAnalysis.

Parameters:
n_channel : int >= 1

The number of mel filter banks, \(C\).

fft_length : int >= 2

The number of FFT bins, \(L\).

sample_rate : int >= 1

The sample rate in Hz.

f_min : float >= 0

The minimum frequency in Hz.

f_max : float <= sample_rate // 2

The maximum frequency in Hz.

gamma : float in [-1, 1]

The parameter of the generalized logarithmic function.

use_power : bool

Set to True if the mel filter bank output is extracted from the power spectrum instead of the amplitude spectrum.

learnable : bool

Whether to make the basis learnable.
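The role of gamma can be illustrated with a small NumPy sketch. It assumes the usual definition of the generalized logarithm, \((x^\gamma - 1)/\gamma\) for nonzero \(\gamma\) and \(\log x\) for \(\gamma = 0\); the helper names glog and iglog are hypothetical and are not claimed to match diffsptk's internals.

```python
import numpy as np


def glog(x, gamma):
    """Generalized logarithm (assumed definition); natural log when gamma == 0."""
    if gamma == 0:
        return np.log(x)
    return (np.power(x, gamma) - 1.0) / gamma


def iglog(y, gamma):
    """Inverse of glog (assumed definition); exp when gamma == 0."""
    if gamma == 0:
        return np.exp(y)
    return np.power(1.0 + gamma * y, 1.0 / gamma)


x = np.array([0.5, 1.0, 2.0, 4.0])
for gamma in (0.0, -0.5, 1.0):
    y = glog(x, gamma)
    # The round trip should recover x up to floating-point error.
    print(gamma, np.max(np.abs(iglog(y, gamma) - x)))
```

Under this definition, gamma = 1 gives a simple shift (x - 1), while values near 0 approach the ordinary logarithm. The same gamma must be used for analysis and for inversion, otherwise the round trip no longer recovers the input.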

forward(y: Tensor) → Tensor

Reconstruct the power spectrum from the mel filter bank output.

Parameters:
y : Tensor [shape=(…, C)]

The mel filter bank output.

Returns:
out : Tensor [shape=(…, L/2+1)]

The power spectrum.
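Because \(C\) is normally much smaller than \(L/2+1\), the mapping from spectrum to filter bank output cannot be inverted exactly. One common way to obtain an approximate inverse is to undo the logarithm and multiply by the pseudo-inverse of the filter matrix. The NumPy sketch below illustrates that idea only: the triangular filter construction, the mel formula, and the function names are assumptions for illustration and may differ from what diffsptk actually does.

```python
import numpy as np


def hz_to_mel(f):
    # A widely used mel formula (assumption; the library may use another constant).
    return 1127.0 * np.log1p(np.asarray(f, dtype=float) / 700.0)


def mel_filter_bank(n_channel, fft_length, sample_rate, f_min=0.0, f_max=None):
    """Return illustrative triangular filters of shape (L/2+1, C)."""
    if f_max is None:
        f_max = sample_rate / 2
    n_bin = fft_length // 2 + 1
    # C + 2 edge points equally spaced on the mel scale.
    edges = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_channel + 2)
    # Mel value of every FFT bin center.
    mel = hz_to_mel(np.linspace(0.0, sample_rate / 2, n_bin))
    H = np.zeros((n_bin, n_channel))
    for c in range(n_channel):
        lo, mid, hi = edges[c], edges[c + 1], edges[c + 2]
        rise = (mel - lo) / (mid - lo)
        fall = (hi - mel) / (hi - mid)
        H[:, c] = np.clip(np.minimum(rise, fall), 0.0, None)
    return H


L, C, sr = 32, 4, 8000
H = mel_filter_bank(C, L, sr)

rng = np.random.default_rng(0)
X = rng.uniform(0.1, 1.0, size=(2, L // 2 + 1))  # stand-in spectra

Y = np.log(X @ H)                    # forward analysis with gamma = 0
X2 = np.exp(Y) @ np.linalg.pinv(H)   # approximate inverse via pseudo-inverse
print(X2.shape)  # (2, 17)
```

In this sketch X2 is generally not equal to X, and it can contain small negative values, but re-analyzing it reproduces the same filter bank output (X2 @ H equals exp(Y) up to rounding). That is the sense in which the result is a reconstruction rather than an exact inverse.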

Examples

>>> x = diffsptk.ramp(19)
>>> stft = diffsptk.STFT(frame_length=10, frame_period=10, fft_length=32)
>>> X = stft(x)
>>> X.shape
torch.Size([2, 17])
>>> fbank = diffsptk.MelFilterBankAnalysis(
...     fft_length=32, n_channel=4, sample_rate=8000
... )
>>> ifbank = diffsptk.InverseMelFilterBankAnalysis(
...     fft_length=32, n_channel=4, sample_rate=8000
... )
>>> X2 = ifbank(fbank(X))
>>> X2.shape
torch.Size([2, 17])

diffsptk.functional.ifbank(y: Tensor, fft_length: int, sample_rate: int, f_min: float = 0, f_max: float | None = None, gamma: float = 0, use_power: bool = False) → Tensor

Reconstruct the power spectrum from the mel filter bank output.

Parameters:
y : Tensor [shape=(…, C)]

The mel filter bank output.

fft_length : int >= 2

The number of FFT bins, \(L\).

sample_rate : int >= 1

The sample rate in Hz.

f_min : float >= 0

The minimum frequency in Hz.

f_max : float <= sample_rate // 2

The maximum frequency in Hz.

gamma : float in [-1, 1]

The parameter of the generalized logarithmic function.

use_power : bool

Set to True if the mel filter bank output is extracted from the power spectrum instead of the amplitude spectrum.

Returns:
out : Tensor [shape=(…, L/2+1)]

The reconstructed power spectrum.

See also

fbank, griffin