gmm

diffsptk.GMM: alias of GaussianMixtureModeling

class diffsptk.GaussianMixtureModeling(order: int, n_mixture: int, *, n_iter: int = 100, eps: float = 1e-05, weight_floor: float = 1e-05, var_floor: float = 1e-06, var_type: str = 'diag', block_size: list[int] | tuple[int, ...] | ndarray | None = None, ubm: tuple[Tensor, Tensor, Tensor] | None = None, alpha: float = 0, batch_size: int | None = None, verbose: bool | int = False, device: device | None = None, dtype: dtype | None = None)

See this page for details. Note that the forward method is not differentiable.

Parameters:

order (int >= 0): The order of the vector, \(M\).
n_mixture (int >= 1): The number of mixture components, \(K\).
n_iter (int >= 1): The number of iterations.
eps (float >= 0): The convergence threshold.
weight_floor (float >= 0): The floor value for mixture weights.
var_floor (float >= 0): The floor value for variance.
var_type (['diag', 'full']): The type of covariance matrix.
block_size (list[int]): The block size of the covariance matrix.
ubm (tuple of Tensors [shape=((K,), (K, M+1), (K, M+1, M+1))]): The GMM parameters of a universal background model.
alpha (float in [0, 1]): The smoothing parameter.
batch_size (int >= 1 or None): The batch size.
verbose (bool or int): If 1, shows the likelihood at each iteration; if 2, shows progress bars.
device (torch.device or None): The device of this module.
dtype (torch.dtype or None): The data type of this module.
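
The role of weight_floor and var_floor can be illustrated with a small standalone sketch. It shows one common way to apply such floors (clamp, then renormalize the weights) and is not taken from the diffsptk implementation; the helper names floor_weights and floor_variances are hypothetical.

```python
def floor_weights(weights, weight_floor):
    """Clamp each weight to at least weight_floor, then renormalize to sum to 1.

    Assumption: this clamp-and-renormalize scheme is only one possible
    flooring strategy; the library may use a different one.
    """
    clamped = [max(w, weight_floor) for w in weights]
    total = sum(clamped)
    # Renormalizing can leave a clamped weight marginally below the floor.
    return [w / total for w in clamped]


def floor_variances(variances, var_floor):
    """Clamp each variance to at least var_floor to avoid degenerate Gaussians."""
    return [max(v, var_floor) for v in variances]


# A mixture that collapsed to zero weight and a dimension with zero variance.
weights = floor_weights([0.0, 0.25, 0.75], weight_floor=1e-5)
variances = floor_variances([0.0, 0.5], var_floor=1e-6)
```

Without such floors, a component with zero weight or zero variance would produce undefined logarithms or divisions in later iterations.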

References

[1] J-L. Gauvain et al., “Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains,” IEEE Transactions on Speech and Audio Processing, vol. 2, no. 2, pp. 291-298, 1994.

forward(x: Tensor | DataLoader, return_posterior: bool = False) → tuple[tuple[Tensor, Tensor, Tensor], Tensor] | tuple[tuple[Tensor, Tensor, Tensor], Tensor, Tensor]

Train Gaussian mixture models.

Parameters:

x (Tensor [shape=(T, M+1)] or DataLoader): The input vectors or a DataLoader that yields the input vectors.
return_posterior (bool): If True, return the posterior probabilities.

Returns:

params (tuple of Tensors [shape=((K,), (K, M+1), (K, M+1, M+1))]): The estimated GMM parameters.
posterior (Tensor [shape=(T, K)], optional): The posterior probabilities.
log_likelihood (Tensor [scalar]): The total log-likelihood.
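
To make the shapes of posterior and log_likelihood concrete, the following standalone sketch computes both for a diagonal-covariance mixture with plain Python lists. It is a textbook E-step written for illustration, not the diffsptk implementation; the function names log_gauss_diag and e_step are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at one vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def e_step(data, weights, means, variances):
    """Return (posterior, log_likelihood) for T vectors and K mixtures."""
    posterior = []
    log_likelihood = 0.0
    for x in data:
        # Weighted log density of x under each mixture component.
        joint = [
            math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        # Log-sum-exp for a numerically stable log p(x).
        peak = max(joint)
        log_px = peak + math.log(sum(math.exp(j - peak) for j in joint))
        posterior.append([math.exp(j - log_px) for j in joint])
        log_likelihood += log_px
    return posterior, log_likelihood


# T = 4 vectors of dimension M + 1 = 2, and K = 2 mixtures.
data = [[-2.0, 0.1], [-1.8, -0.2], [2.1, 0.0], [1.9, 0.3]]
weights = [0.5, 0.5]
means = [[-2.0, 0.0], [2.0, 0.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
posterior, log_likelihood = e_step(data, weights, means, variances)
```

Each row of posterior sums to one over the K mixtures, and log_likelihood is a single number accumulated over all T vectors, which matches the (T, K) and scalar shapes listed above.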

Examples

>>> x = diffsptk.nrand(10, 1)
>>> gmm = diffsptk.GMM(1, 2)
>>> params, log_likelihood = gmm(x)
>>> w, mu, sigma = params
>>> w
tensor([0.1917, 0.8083])
>>> mu
tensor([[ 1.2321,  0.2058],
        [-0.1326, -0.7006]])
>>> sigma
tensor([[[3.4010e-01, 0.0000e+00],
         [0.0000e+00, 6.2351e-04]],
        [[3.0944e-01, 0.0000e+00],
         [0.0000e+00, 8.6096e-01]]])
>>> log_likelihood
tensor(-19.5235)

set_params(params: tuple[Tensor | None, Tensor | None, Tensor | None]) → None

Set model parameters.

Parameters:

params (tuple of Tensors [shape=((K,), (K, M+1), (K, M+1, M+1))]): The GMM parameters.

transform(x: Tensor) → tuple[Tensor | None, Tensor, Tensor]

Transform the input vectors based on a single mixture sequence.

Parameters:

x (Tensor [shape=(T, N+1)]): The input vectors.

Returns:

y (Tensor [shape=(T, M-N)]): The output vectors.
indices (Tensor [shape=(T,)]): The selected mixture indices.
log_prob (Tensor [shape=(T,)]): The log probabilities.
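
The input and output shapes suggest a joint model over the first N+1 and the remaining M-N dimensions. The sketch below illustrates the standard conditional-mean mapping for the smallest case (N = 0, M = 1) using a single selected mixture per frame. It is an assumption about the general technique, not a transcription of the diffsptk code; transform_frame is a hypothetical helper, and the score it returns is only a stand-in for log_prob.

```python
import math


def transform_frame(x, weights, means, covs):
    """Map a scalar x to a scalar y using the single best mixture for x.

    means[k] = [mu_x, mu_y]; covs[k] = [[s_xx, s_xy], [s_yx, s_yy]].
    """
    best_k, best_score = 0, -math.inf
    for k, (w, mu, cov) in enumerate(zip(weights, means, covs)):
        var_x = cov[0][0]
        # Weighted log density of x under the marginal of mixture k.
        score = math.log(w) - 0.5 * (
            math.log(2.0 * math.pi * var_x) + (x - mu[0]) ** 2 / var_x
        )
        if score > best_score:
            best_k, best_score = k, score
    mu, cov = means[best_k], covs[best_k]
    # Conditional mean of y given x for the selected mixture.
    y = mu[1] + cov[1][0] / cov[0][0] * (x - mu[0])
    return y, best_k, best_score


weights = [0.5, 0.5]
means = [[-1.0, -2.0], [1.0, 2.0]]
covs = [[[1.0, 0.5], [0.5, 1.0]], [[1.0, 0.5], [0.5, 1.0]]]
y_a, index_a, score_a = transform_frame(1.0, weights, means, covs)
y_b, index_b, score_b = transform_frame(-2.0, weights, means, covs)
```

Here an input of 1.0 selects the second mixture and maps to its target mean, while an input of -2.0 selects the first mixture and is shifted by the cross-covariance term.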

warmup(x: Tensor | DataLoader, **lbg_params) → None

Initialize the model parameters by K-means clustering.

Parameters:

x (Tensor [shape=(T, M+1)] or DataLoader): The training data.
lbg_params (additional keyword arguments): The parameters for the Linde-Buzo-Gray algorithm.
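
For readers unfamiliar with the Linde-Buzo-Gray algorithm, the following one-dimensional sketch shows its general idea: start from the global mean, split every centroid in two, and refine with K-means after each split. It is a simplified illustration with hypothetical helpers (kmeans_1d, lbg_1d) and a hypothetical delta perturbation; how diffsptk derives the initial weights and covariances from the resulting clusters is not described here.

```python
def kmeans_1d(points, centroids, n_iter=20):
    """Refine scalar centroids with a fixed number of K-means iterations."""
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda k: (p - centroids[k]) ** 2
            )
            clusters[nearest].append(p)
        # Keep the old centroid when a cluster receives no points.
        centroids = [
            sum(c) / len(c) if c else centroids[k]
            for k, c in enumerate(clusters)
        ]
    return centroids


def lbg_1d(points, n_centroids, delta=1e-2):
    """Grow a codebook by repeated splitting; n_centroids must be a power of two."""
    centroids = [sum(points) / len(points)]
    while len(centroids) < n_centroids:
        # Split every centroid into a slightly perturbed pair.
        centroids = [c + s * delta for c in centroids for s in (-1.0, 1.0)]
        centroids = kmeans_1d(points, centroids)
    return centroids


points = [-2.1, -1.9, -2.0, 1.9, 2.0, 2.1]
centroids = sorted(lbg_1d(points, 2))
```

The two centroids end up near the centers of the two groups of points, which is the kind of starting point an EM procedure benefits from.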