mglsadf

diffsptk.MLSA: alias of PseudoMGLSADigitalFilter

class diffsptk.PseudoMGLSADigitalFilter(filter_order: tuple[int, int] | int, frame_period: int, *, alpha: float = 0, gamma: float = 0, c: int | None = None, ignore_gain: bool = False, phase: str = 'minimum', mode: str = 'multi-stage', **kwargs)

See this page for details.

Parameters:

filter_order (int >= 0 or tuple[int, int]): The order of the filter coefficients, \(M\) or \((N, M)\). A tuple is allowed only if phase is 'mixed'.
frame_period (int >= 1): The frame period in samples, \(P\).
alpha (float in (-1, 1)): The frequency warping factor, \(\alpha\).
gamma (float in [-1, 1]): The gamma parameter, \(\gamma\).
c (int >= 1 or None): The number of stages.
ignore_gain (bool): If True, filtering is performed without gain.
phase (['minimum', 'maximum', 'zero', 'mixed']): The filter type.
mode (['multi-stage', 'single-stage', 'freq-domain']): 'multi-stage' approximates the MLSA filter by cascading FIR filters based on the Taylor series expansion. 'single-stage' uses a single FIR filter whose coefficients are the impulse response computed from the input mel-cepstral coefficients via FFT. 'freq-domain' performs filtering in the frequency domain rather than the time domain.
n_fft (int >= 1): The number of FFT bins used for conversion. Larger values improve conversion accuracy.
taylor_order (int >= 0): The order of the Taylor series expansion (valid only if mode is 'multi-stage').
cep_order (int >= 0 or tuple[int, int]): The order of the linear cepstrum (valid only if mode is 'multi-stage').
ir_length (int >= 1 or tuple[int, int]): The length of the impulse response (valid only if mode is 'single-stage').
device (torch.device or None): The device of this module.
dtype (torch.dtype or None): The data type of this module.
**kwargs (additional keyword arguments): See ShortTimeFourierTransform() (valid only if mode is 'freq-domain').
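
The 'multi-stage' mode is described above as relying on a Taylor series expansion, with taylor_order controlling where the series is cut off. The following is a minimal scalar sketch of that trade-off only, not diffsptk code: `truncated_exp` is a hypothetical helper, and the value 2.0 is an arbitrary stand-in for the filter's exponent. It shows how the truncation error of the exponential's Taylor series shrinks as the order grows.

```python
import math


def truncated_exp(v, order):
    """Sum the Taylor series of exp(v) up to and including the v**order term.

    Hypothetical helper for illustration; it is not part of diffsptk.
    """
    term = 1.0   # current term v**k / k!, starting at k = 0
    total = 1.0  # running sum of the series
    for k in range(1, order + 1):
        term *= v / k
        total += term
    return total


# Compare a few truncation orders against the exact exponential.
for order in (5, 10, 20, 30):
    err = abs(truncated_exp(2.0, order) - math.exp(2.0))
    print(f"order={order:2d}  abs error={err:.3e}")
```

In this scalar setting, a higher order buys accuracy at the cost of more terms; the analogous cost in a cascaded-filter implementation would be more filtering stages, though the exact cost model in diffsptk is not specified here.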

References

[1] T. Yoshimura et al., “Embedding a differentiable mel-cepstral synthesis filter to a neural speech synthesis system,” Proceedings of ICASSP, 2023.

forward(x: Tensor, mc: Tensor) → Tensor

Apply an MGLSA digital filter.

Parameters:

x (Tensor [shape=(…, T)]): The excitation signal.
mc (Tensor [shape=(…, T/P, M+1)] or [shape=(…, T/P, N+M+1)]): The mel-generalized cepstrum, not MLSA digital filter coefficients. In the mixed-phase case, the coefficients are assumed to be ordered as \(c_{-N}, \ldots, c_{0}, \ldots, c_{M}\), where \(M\) is the order of the minimum-phase part and \(N\) is the order of the maximum-phase part.

Returns:

out (Tensor [shape=(…, T)]): The output signal.
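
The shape conventions above can be checked with a small pure-Python sketch. Both functions below are hypothetical helpers, not part of diffsptk, and the sketch assumes that T is an exact multiple of the frame period P; how the library treats other lengths is not stated here.

```python
def expected_mc_shape(T, P, M, N=None):
    """Return the trailing (frames, coefficients) shape documented for mc.

    Hypothetical helper. Assumes T is a multiple of P.
    """
    if T % P != 0:
        raise ValueError("this sketch assumes T is a multiple of P")
    # M + 1 coefficients normally, N + M + 1 in the mixed-phase case.
    n_coef = M + 1 if N is None else N + M + 1
    return (T // P, n_coef)


def index_mixed_phase(frame, N, M):
    """Label one mixed-phase frame with its cepstral indices -N..M."""
    if len(frame) != N + M + 1:
        raise ValueError("expected N + M + 1 coefficients")
    # Position 0 holds c_{-N}, position N holds c_0, the last holds c_M.
    return {k: frame[k + N] for k in range(-N, M + 1)}


print(expected_mc_shape(T=4, P=2, M=4))                 # (2, 5)
print(index_mixed_phase(["a", "b", "c", "d"], N=1, M=2))  # {-1: 'a', 0: 'b', 1: 'c', 2: 'd'}
```

With T = 4, P = 2, and M = 4, the expected trailing shape is (2, 5).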

Examples

>>> M = 4
>>> x = diffsptk.step(3)
>>> mc = diffsptk.nrand(2, M)
>>> mc
tensor([[-0.9134, -0.5774, -0.4567,  0.7423, -0.5782],
        [ 0.6904,  0.5175,  0.8765,  0.1677,  2.4624]])
>>> mglsadf = diffsptk.MLSA(M, frame_period=2)
>>> y = mglsadf(x.view(1, -1), mc.view(1, 2, M + 1))
>>> y
tensor([[0.4011, 0.8760, 3.5677, 4.8725]])