vc

Functions

int main(int argc, char *argv[])

vc [ option ] gmmfile [ infile ]

-l int
- length of source vector \((1 \le M_1 + 1)\)
-m int
- order of source vector \((0 \le M_1)\)
-L int
- length of target vector \((1 \le M_2 + 1)\)
-M int
- order of target vector \((0 \le M_2)\)
-k int
- number of mixtures \((1 \le K)\)
-f
- use full or block covariance instead of diagonal one
-d double+
- delta coefficients
-D str
- filename of double-type delta coefficients
-r int+
- width of 1st (and 2nd) regression coefficients
-magic double
- magic number
gmmfile str
- double-type GMM parameters
infile str
- double-type source static+dynamic vector sequence
stdout
- double-type target static vector sequence

In the following example, the converted 4th-order vectors corresponding to data.source are obtained using the trained 2-mixture GMM data.gmm.

delta -l 5 -d -0.5 0.0 0.5 data.source | \
  vc -k 2 -l 5 data.gmm > data.target

Parameters:

argc – [in] Number of arguments.
argv – [in] Argument vector.

Returns:

0 on success, 1 on failure.

See also

gmm gmmp

class GaussianMixtureModelBasedConversion

Perform GMM-based voice conversion.

The input is the \((D+1)(M_1+1)\)-length source vectors:

\[ \begin{array}{cccc} \boldsymbol{X}_0, & \boldsymbol{X}_1, & \ldots, & \boldsymbol{X}_{T-1}, \end{array} \]

where

\[ \boldsymbol{X}_t = \left[ \begin{array}{cccc} \boldsymbol{x}_t^{\mathsf{T}} & \Delta^{(1)} \boldsymbol{x}_t^{\mathsf{T}} & \cdots & \Delta^{(D)} \boldsymbol{x}_t^{\mathsf{T}} \end{array} \right]^{\mathsf{T}}. \]
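As a rough illustration of how \(\boldsymbol{X}_t\) is assembled from the static vectors, the sketch below appends the outputs of each delta window to every frame. The function name `AppendDynamicFeatures` is hypothetical and not part of SPTK; the sketch assumes odd-length windows centered on the current frame and replaces out-of-range frames with the nearest edge frame, which may differ from how the `delta` command treats boundaries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an SPTK function): builds
// X_t = [x_t, Delta^(1) x_t, ..., Delta^(D) x_t] for every frame t.
// Assumptions of this sketch: every window has odd length and is centered
// on frame t; frames outside the sequence are clamped to the nearest edge.
std::vector<std::vector<double>> AppendDynamicFeatures(
    const std::vector<std::vector<double>>& statics,
    const std::vector<std::vector<double>>& windows) {
  const int num_frames = static_cast<int>(statics.size());
  std::vector<std::vector<double>> joint(num_frames);
  for (int t = 0; t < num_frames; ++t) {
    const std::vector<double>& x = statics[t];
    joint[t] = x;  // The static part comes first.
    for (const std::vector<double>& window : windows) {
      assert(window.size() % 2 == 1);
      const int half = static_cast<int>(window.size()) / 2;
      for (std::size_t i = 0; i < x.size(); ++i) {
        double delta = 0.0;
        for (int j = -half; j <= half; ++j) {
          int tau = t + j;
          if (tau < 0) tau = 0;
          if (num_frames <= tau) tau = num_frames - 1;
          delta += window[j + half] * statics[tau][i];
        }
        joint[t].push_back(delta);
      }
    }
  }
  return joint;
}
```

With one window such as {-0.5, 0.0, 0.5}, each \((M_1+1)\)-length static frame becomes a \(2(M_1+1)\)-length vector, which matches the \((D+1)(M_1+1)\) length stated above for \(D = 1\).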

The output is the \((M_2+1)\)-length target vectors:

\[ \begin{array}{cccc} \boldsymbol{y}_0, & \boldsymbol{y}_1, & \ldots, & \boldsymbol{y}_{T-1}. \end{array} \]

The optimal target vectors can be derived in a maximum likelihood sense:

\[\begin{split}\begin{eqnarray} \hat{\boldsymbol{y}} &=& \mathop{\mathrm{argmax}}_{\boldsymbol{y}} \, p(\boldsymbol{Y} \,|\, \boldsymbol{X}, \boldsymbol{\lambda}) \\ &=& \mathop{\mathrm{argmax}}_{\boldsymbol{y}} \sum_{\boldsymbol{m}} p(\boldsymbol{m} \,|\, \boldsymbol{X}, \boldsymbol{\lambda}) \, p(\boldsymbol{Y} \,|\, \boldsymbol{X}, \boldsymbol{m}, \boldsymbol{\lambda}), \end{eqnarray}\end{split}\]

where \(\boldsymbol{\lambda}\) is the GMM that models the joint vectors. The mean vector and the covariance matrix of the \(m\)-th mixture component are written as

\[\begin{split} \boldsymbol{\mu}_m = \left[ \begin{array}{c} \boldsymbol{\mu}_m^{(X)} \\ \boldsymbol{\mu}_m^{(Y)} \end{array} \right] \end{split}\]

and

\[\begin{split} \boldsymbol{\varSigma}_m = \left[ \begin{array}{cc} \boldsymbol{\varSigma}_m^{(XX)} & \boldsymbol{\varSigma}_m^{(XY)} \\ \boldsymbol{\varSigma}_m^{(YX)} & \boldsymbol{\varSigma}_m^{(YY)} \end{array} \right]. \end{split}\]

To easily compute the ML estimate, the maximum a posteriori approximation is applied:

\[\begin{split}\begin{eqnarray} \hat{\boldsymbol{y}} &=& \mathop{\mathrm{argmax}}_{\boldsymbol{y}} \, p(\hat{\boldsymbol{m}} \,|\, \boldsymbol{X}, \boldsymbol{\lambda}) \, p(\boldsymbol{Y} \,|\, \boldsymbol{X}, \hat{\boldsymbol{m}}, \boldsymbol{\lambda}) \\ &=& \mathop{\mathrm{argmax}}_{\boldsymbol{y}} \prod_{t=0}^{T-1} p(\hat{m}_t \,|\, \boldsymbol{X}_t, \boldsymbol{\lambda}) \, p(\boldsymbol{Y}_t \,|\, \boldsymbol{X}_t, \hat{m}_t, \boldsymbol{\lambda}), \end{eqnarray}\end{split}\]

where

\[ p(\hat{m}_t \,|\, \boldsymbol{X}_t, \boldsymbol{\lambda}) = \max_m \, \frac{w_m \mathcal{N} \left( \boldsymbol{X}_t \,\big|\, \boldsymbol{\mu}_m^{(X)}, \boldsymbol{\varSigma}_m^{(XX)} \right)} {\sum_{n=1}^K w_n \mathcal{N} \left( \boldsymbol{X}_t \,\big|\, \boldsymbol{\mu}_n^{(X)}, \boldsymbol{\varSigma}_n^{(XX)} \right)}, \]

and

\[\begin{split}\begin{eqnarray} p(\boldsymbol{Y}_t \,|\, \boldsymbol{X}_t, m, \boldsymbol{\lambda}) &=& \mathcal{N} \left( \boldsymbol{Y}_t \,\big|\, \boldsymbol{E}_{m,t}^{(Y)}, \boldsymbol{D}_{m,t}^{(Y)} \right), \\ \boldsymbol{E}_{m,t}^{(Y)} &=& \boldsymbol{\mu}_m^{(Y)} + \boldsymbol{\varSigma}_m^{(YX)} \boldsymbol{\varSigma}_m^{(XX)^{-1}} \left( \boldsymbol{X}_t - \boldsymbol{\mu}_m^{(X)} \right), \\ \boldsymbol{D}_{m,t}^{(Y)} &=& \boldsymbol{\varSigma}_m^{(YY)} - \boldsymbol{\varSigma}_m^{(YX)} \boldsymbol{\varSigma}_m^{(XX)^{-1}} \boldsymbol{\varSigma}_m^{(XY)}. \end{eqnarray}\end{split}\]
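The two quantities above can be sketched for the simplest case, a joint GMM over a scalar source \(x\) and a scalar target \(y\), where the matrix inverse reduces to a division. All names below (`JointMixture`, `SelectMixture`, `Condition`) are hypothetical and are not SPTK APIs; the posterior is normalized in the log domain purely for numerical safety.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for one mixture of a joint GMM over scalar (x, y).
struct JointMixture {
  double weight;
  double mean_x, mean_y;
  double var_xx, cov_yx, var_yy;
};

struct ConditionalGaussian {
  double mean;      // Corresponds to E_{m,t}^{(Y)}.
  double variance;  // Corresponds to D_{m,t}^{(Y)}.
};

// Returns log( w_m N(x | mean_x, var_xx) ).
double LogWeightedLikelihood(const JointMixture& mixture, double x) {
  const double two_pi = 2.0 * std::acos(-1.0);
  const double diff = x - mixture.mean_x;
  return std::log(mixture.weight) -
         0.5 * (std::log(two_pi * mixture.var_xx) +
                diff * diff / mixture.var_xx);
}

// Returns the index of the mixture with the largest posterior p(m | x)
// and stores that posterior in *posterior.
std::size_t SelectMixture(const std::vector<JointMixture>& gmm, double x,
                          double* posterior) {
  assert(!gmm.empty());
  std::vector<double> log_probs(gmm.size());
  std::size_t best = 0;
  for (std::size_t m = 0; m < gmm.size(); ++m) {
    log_probs[m] = LogWeightedLikelihood(gmm[m], x);
    if (log_probs[best] < log_probs[m]) best = m;
  }
  // Normalize relative to the best mixture to avoid underflow.
  double denominator = 0.0;
  for (std::size_t m = 0; m < gmm.size(); ++m) {
    denominator += std::exp(log_probs[m] - log_probs[best]);
  }
  *posterior = 1.0 / denominator;
  return best;
}

// Conditional distribution of y given x under one mixture (scalar case).
ConditionalGaussian Condition(const JointMixture& mixture, double x) {
  assert(0.0 < mixture.var_xx);
  const double gain = mixture.cov_yx / mixture.var_xx;
  return {mixture.mean_y + gain * (x - mixture.mean_x),
          mixture.var_yy - gain * mixture.cov_yx};
}
```

For vector-valued sources with full or block covariance, the division by `var_xx` would be replaced by multiplication with \(\boldsymbol{\varSigma}_m^{(XX)^{-1}}\), as in the formulas above.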

The converted static vector sequence \(\hat{\boldsymbol{y}}\) under the constraint between static and dynamic components is obtained by the maximum likelihood parameter generation algorithm.
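To make the parameter generation step more concrete, here is a minimal sketch for a one-dimensional feature with diagonal conditional covariances. It solves the normal equations \(\boldsymbol{W}^{\mathsf{T}} \boldsymbol{P} \boldsymbol{W} \boldsymbol{y} = \boldsymbol{W}^{\mathsf{T}} \boldsymbol{P} \boldsymbol{E}\), where \(\boldsymbol{W}\) stacks the window coefficients, \(\boldsymbol{E}\) stacks the conditional means, and \(\boldsymbol{P}\) holds the inverse variances. These symbols, the function names, the dense solver, and the edge clamping are assumptions of the sketch; practical implementations typically exploit the band structure of the system, and this is not presented as the code used by SPTK.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Solves A y = b by Gaussian elimination with partial pivoting.
std::vector<double> SolveLinearSystem(Matrix a, std::vector<double> b) {
  const std::size_t n = b.size();
  for (std::size_t col = 0; col < n; ++col) {
    std::size_t pivot = col;
    for (std::size_t row = col + 1; row < n; ++row) {
      if (std::fabs(a[pivot][col]) < std::fabs(a[row][col])) pivot = row;
    }
    std::swap(a[col], a[pivot]);
    std::swap(b[col], b[pivot]);
    assert(a[col][col] != 0.0);
    for (std::size_t row = col + 1; row < n; ++row) {
      const double factor = a[row][col] / a[col][col];
      for (std::size_t k = col; k < n; ++k) a[row][k] -= factor * a[col][k];
      b[row] -= factor * b[col];
    }
  }
  std::vector<double> y(n);
  for (std::size_t i = n; i-- > 0;) {
    double sum = b[i];
    for (std::size_t k = i + 1; k < n; ++k) sum -= a[i][k] * y[k];
    y[i] = sum / a[i][i];
  }
  return y;
}

// windows[0] is assumed to be the static window {1.0}.
// means[t][d] and precisions[t][d] are the mean and inverse variance of the
// d-th window output at frame t (hypothetical layout for this sketch).
std::vector<double> GenerateTrajectory(const Matrix& windows,
                                       const Matrix& means,
                                       const Matrix& precisions) {
  const int num_frames = static_cast<int>(means.size());
  Matrix lhs(num_frames, std::vector<double>(num_frames, 0.0));  // W^T P W
  std::vector<double> rhs(num_frames, 0.0);                      // W^T P E
  for (int t = 0; t < num_frames; ++t) {
    assert(means[t].size() == windows.size());
    for (std::size_t d = 0; d < windows.size(); ++d) {
      assert(windows[d].size() % 2 == 1);
      // Build the row of W for frame t and window d (edges are clamped).
      std::vector<double> row(num_frames, 0.0);
      const int half = static_cast<int>(windows[d].size()) / 2;
      for (int j = -half; j <= half; ++j) {
        int tau = t + j;
        if (tau < 0) tau = 0;
        if (num_frames <= tau) tau = num_frames - 1;
        row[tau] += windows[d][j + half];
      }
      // Accumulate this row's contribution to the normal equations.
      for (int i = 0; i < num_frames; ++i) {
        rhs[i] += row[i] * precisions[t][d] * means[t][d];
        for (int k = 0; k < num_frames; ++k) {
          lhs[i][k] += row[i] * precisions[t][d] * row[k];
        }
      }
    }
  }
  return SolveLinearSystem(lhs, rhs);
}
```

When the static and dynamic means are mutually consistent, the solution reproduces the static means; otherwise it trades them off against the dynamic means according to the precisions, which is what yields a smooth trajectory.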

[1] T. Toda, A. W. Black, and K. Tokuda, “Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 8, pp. 2222-2235, 2007.

Public Functions

GaussianMixtureModelBasedConversion(int num_source_order, int num_target_order, const std::vector<std::vector<double>> &window_coefficients, const std::vector<double> &weights, const std::vector<std::vector<double>> &mean_vectors, const std::vector<SymmetricMatrix> &covariance_matrices, bool use_magic_number, double magic_number = 0.0)

Parameters:

num_source_order – [in] Order of source vector, \(M_1\).
num_target_order – [in] Order of target vector, \(M_2\).
window_coefficients – [in] Window coefficients, e.g., { {-0.5, 0.0, 0.5}, {1.0, -2.0, 1.0} }.
weights – [in] \(K\) mixture weights.
mean_vectors – [in] \(K\) mean vectors. The shape is \([K, (D+1)(M_1+M_2+2)]\).
covariance_matrices – [in] \(K\) covariance matrices. The shape is \([K, (D+1)(M_1+M_2+2), (D+1)(M_1+M_2+2)]\).
use_magic_number – [in] Whether to use magic number.
magic_number – [in] A magic number represents a discrete symbol.
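Before constructing the object, it can help to check that the GMM parameters have the sizes listed above. The helper below is a hypothetical sketch, not part of SPTK, and assumes that \(D\) equals the number of entries in window_coefficients (that is, the static window is not included in that list).

```cpp
#include <cassert>

// Hypothetical helper (not an SPTK API) that evaluates the vector lengths
// stated in the parameter descriptions.
struct ExpectedShapes {
  int source_length;  // (D+1)(M_1+1): source static+dynamic vector.
  int target_length;  // M_2+1: target static vector.
  int joint_length;   // (D+1)(M_1+M_2+2): mean vector of each mixture.
};

ExpectedShapes ComputeExpectedShapes(int num_source_order,
                                     int num_target_order, int num_deltas) {
  assert(0 <= num_source_order && 0 <= num_target_order && 0 <= num_deltas);
  const int num_windows = num_deltas + 1;  // Static window plus D deltas.
  ExpectedShapes shapes;
  shapes.source_length = num_windows * (num_source_order + 1);
  shapes.target_length = num_target_order + 1;
  shapes.joint_length =
      num_windows * (num_source_order + num_target_order + 2);
  return shapes;
}
```

Each of the \(K\) mean vectors should then have `joint_length` elements, and each covariance matrix should be `joint_length` by `joint_length`.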

inline int GetNumSourceOrder() const

Returns:

Order of source vector.

inline int GetNumTargetOrder() const

Returns:

Order of target vector.

inline bool IsValid() const

Returns:

True if this object is valid.

bool Run(const std::vector<std::vector<double>> &source_vectors, std::vector<std::vector<double>> *target_vectors) const

Parameters:

source_vectors – [in] \(M_1\)-th order source vectors containing dynamic components. The shape is \([T, (D+1)(M_1+1)]\).
target_vectors – [out] \(M_2\)-th order target vectors. The shape is \([T, (M_2+1)]\).

Returns:

True on success, false on failure.