ifbank
- diffsptk.IFBANK
alias of InverseMelFilterBankAnalysis
- class diffsptk.InverseMelFilterBankAnalysis(*, n_channel: int, fft_length: int, sample_rate: int, f_min: float = 0, f_max: float | None = None, gamma: float = 0, use_power: bool = False, learnable: bool = False)
This is the inverse module of diffsptk.MelFilterBankAnalysis.
- Parameters:
- n_channel : int >= 1
The number of mel filter banks, \(C\).
- fft_length : int >= 2
The number of FFT bins, \(L\).
- sample_rate : int >= 1
The sample rate in Hz.
- f_min : float >= 0
The minimum frequency in Hz.
- f_max : float <= sample_rate // 2
The maximum frequency in Hz.
- gamma : float in [-1, 1]
The parameter of the generalized logarithmic function.
- use_power : bool
Set to True if the mel filter bank output is extracted from the power spectrum instead of the amplitude spectrum.
- learnable : bool
Whether to make the basis learnable.
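The `gamma` parameter is described above only as the parameter of the generalized logarithmic function. As a minimal sketch, assuming the SPTK-style definition (natural log when `gamma` is 0, a power-law warp otherwise), the forward mapping and the inverse that a reconstruction step would need might look like this. The helper names `generalized_log` and `generalized_exp` are hypothetical and not part of diffsptk.

```python
import math


def generalized_log(x: float, gamma: float) -> float:
    """Assumed definition: log(x) if gamma == 0, else (x**gamma - 1) / gamma."""
    if not -1.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [-1, 1]")
    if x <= 0.0:
        raise ValueError("x must be positive")
    if gamma == 0:
        return math.log(x)
    return (x**gamma - 1.0) / gamma


def generalized_exp(y: float, gamma: float) -> float:
    """Inverse of generalized_log under the same assumed definition."""
    if not -1.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [-1, 1]")
    if gamma == 0:
        return math.exp(y)
    return (gamma * y + 1.0) ** (1.0 / gamma)


if __name__ == "__main__":
    # Round trip for a few gamma values; each recovered value should be close to x.
    x = 2.5
    for gamma in (-1.0, -0.5, 0.0, 0.5, 1.0):
        y = generalized_log(x, gamma)
        print(gamma, y, generalized_exp(y, gamma))
```

Under this assumption, `gamma = 1` reduces to a shift by one (`x - 1`), and `gamma = 0` gives the ordinary logarithm, so the same `gamma` should be passed to both the analysis and the inverse module.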
- forward(y: Tensor) -> Tensor
Reconstruct the power spectrum from the mel filter bank output.
- Parameters:
- y : Tensor [shape=(…, C)]
The mel filter bank output.
- Returns:
- out : Tensor [shape=(…, L/2+1)]
The power spectrum.
Examples
>>> import diffsptk
>>> x = diffsptk.ramp(19)
>>> stft = diffsptk.STFT(frame_length=10, frame_period=10, fft_length=32)
>>> X = stft(x)
>>> X.shape
torch.Size([2, 17])
>>> fbank = diffsptk.MelFilterBankAnalysis(
...     fft_length=32, n_channel=4, sample_rate=8000
... )
>>> ifbank = diffsptk.InverseMelFilterBankAnalysis(
...     fft_length=32, n_channel=4, sample_rate=8000
... )
>>> X2 = ifbank(fbank(X))
>>> X2.shape
torch.Size([2, 17])
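The trailing dimension of 17 in the example above follows from the one-sided spectrum length \(L/2+1\) with `fft_length=32`. The following sketch shows that arithmetic in plain Python; the helper names `num_onesided_bins` and `bin_frequencies` are hypothetical and only illustrate the shape relation, not how diffsptk computes it.

```python
def num_onesided_bins(fft_length: int) -> int:
    """Number of bins in a one-sided spectrum, L/2 + 1."""
    if fft_length < 2:
        raise ValueError("fft_length must be >= 2")
    return fft_length // 2 + 1


def bin_frequencies(fft_length: int, sample_rate: int) -> list[float]:
    """Center frequency in Hz of each one-sided bin, from 0 up to sample_rate / 2."""
    n_bins = num_onesided_bins(fft_length)
    return [k * sample_rate / fft_length for k in range(n_bins)]


if __name__ == "__main__":
    # Values taken from the example: fft_length=32, sample_rate=8000.
    print(num_onesided_bins(32))
    freqs = bin_frequencies(32, 8000)
    print(freqs[1], freqs[-1])
```

Because the example compresses 17 bins into `n_channel=4` values before reconstructing, `X2` has the same shape as `X` but should be expected to approximate it rather than match it exactly.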
- diffsptk.functional.ifbank(y: Tensor, fft_length: int, sample_rate: int, f_min: float = 0, f_max: float | None = None, gamma: float = 0, use_power: bool = False) -> Tensor
Reconstruct the power spectrum from the mel filter bank output.
- Parameters:
- y : Tensor [shape=(…, C)]
The mel filter bank output.
- fft_length : int >= 2
The number of FFT bins, \(L\).
- sample_rate : int >= 1
The sample rate in Hz.
- f_min : float >= 0
The minimum frequency in Hz.
- f_max : float <= sample_rate // 2
The maximum frequency in Hz.
- gamma : float in [-1, 1]
The parameter of the generalized logarithmic function.
- use_power : bool
Set to True if the mel filter bank output is extracted from the power spectrum instead of the amplitude spectrum.
- Returns:
- out : Tensor [shape=(…, L/2+1)]
The reconstructed power spectrum.