vc

Functions

int main(int argc, char *argv[])

vc [ option ] gmmfile [ infile ]

  • -l int

    • length of source vector \((1 \le M_1 + 1)\)

  • -m int

    • order of source vector \((0 \le M_1)\)

  • -L int

    • length of target vector \((1 \le M_2 + 1)\)

  • -M int

    • order of target vector \((0 \le M_2)\)

  • -k int

    • number of mixtures \((1 \le K)\)

  • -f bool

    • use a full or block covariance matrix instead of a diagonal one

  • -d double+

    • delta coefficients

  • -D string

    • filename of double-type delta coefficients

  • -r int+

    • width of 1st (and 2nd) regression coefficients

  • -magic double

    • magic number

  • gmmfile str

    • double-type GMM parameters

  • infile str

    • double-type source static+dynamic vector sequence

  • stdout

    • double-type target static vector sequence

In the following example, the converted 4-th order vectors corresponding to data.source are obtained using the trained 2-mixture GMM data.gmm.

delta -l 5 -d -0.5 0.0 0.5 data.source | \
  vc -k 2 -l 5 data.gmm > data.target

Parameters
  • argc[in] Number of arguments.

  • argv[in] Argument vector.

Returns

0 on success, 1 on failure.

See also

gmm gmmp

class sptk::GaussianMixtureModelBasedConversion

Perform GMM-based voice conversion.

The input is the \((D+1)(M_1+1)\)-length source vectors:

\[ \begin{array}{cccc} \boldsymbol{X}_0, & \boldsymbol{X}_1, & \ldots, & \boldsymbol{X}_{T-1}, \end{array} \]
where
\[ \boldsymbol{X}_t = \left[ \begin{array}{cccc} \boldsymbol{x}_t^{\mathsf{T}} & \Delta^{(1)} \boldsymbol{x}_t^{\mathsf{T}} & \cdots & \Delta^{(D)} \boldsymbol{x}_t^{\mathsf{T}} \end{array} \right]^{\mathsf{T}}. \]
The output is the \((M_2+1)\)-length target vectors:
\[ \begin{array}{cccc} \boldsymbol{y}_0, & \boldsymbol{y}_1, & \ldots, & \boldsymbol{y}_{T-1}. \end{array} \]
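For example, with a single delta window \((D = 1)\) whose coefficients are \((-0.5, 0.0, 0.5)\), as in the command-line example above, each source vector takes the form
\[\begin{split} \boldsymbol{X}_t = \left[ \begin{array}{c} \boldsymbol{x}_t \\ 0.5 \left( \boldsymbol{x}_{t+1} - \boldsymbol{x}_{t-1} \right) \end{array} \right], \end{split}\]
whose length is \(2(M_1 + 1)\).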

The optimal target vectors can be derived in a maximum likelihood sense:

\[\begin{split}\begin{eqnarray} \hat{\boldsymbol{y}} &=& \mathop{\mathrm{argmax}}_{\boldsymbol{y}} \, p(\boldsymbol{Y} \,|\, \boldsymbol{X}, \boldsymbol{\lambda}) \\ &=& \mathop{\mathrm{argmax}}_{\boldsymbol{y}} \sum_{\boldsymbol{m}} p(\boldsymbol{m} \,|\, \boldsymbol{X}, \boldsymbol{\lambda}) \, p(\boldsymbol{Y} \,|\, \boldsymbol{X}, \boldsymbol{m}, \boldsymbol{\lambda}), \end{eqnarray}\end{split}\]
where \(\boldsymbol{\lambda}\) is the GMM that models the joint vectors. The mean vector and the covariance matrix of the \(m\)-th mixture component are written as
\[\begin{split} \boldsymbol{\mu}_m = \left[ \begin{array}{c} \boldsymbol{\mu}_m^{(X)} \\ \boldsymbol{\mu}_m^{(Y)} \end{array} \right] \end{split}\]
and
\[\begin{split} \boldsymbol{\varSigma}_m = \left[ \begin{array}{cc} \boldsymbol{\varSigma}_m^{(XX)} & \boldsymbol{\varSigma}_m^{(XY)} \\ \boldsymbol{\varSigma}_m^{(YX)} & \boldsymbol{\varSigma}_m^{(YY)} \end{array} \right]. \end{split}\]
To easily compute the ML estimate, the maximum a posteriori approximation is applied:
\[\begin{split}\begin{eqnarray} \hat{\boldsymbol{y}} &=& \mathop{\mathrm{argmax}}_{\boldsymbol{y}} \, p(\hat{\boldsymbol{m}} \,|\, \boldsymbol{X}, \boldsymbol{\lambda}) \, p(\boldsymbol{Y} \,|\, \boldsymbol{X}, \hat{\boldsymbol{m}}, \boldsymbol{\lambda}) \\ &=& \mathop{\mathrm{argmax}}_{\boldsymbol{y}} \prod_{t=0}^{T-1} p(\hat{m}_t \,|\, \boldsymbol{X}_t, \boldsymbol{\lambda}) \, p(\boldsymbol{Y}_t \,|\, \boldsymbol{X}_t, \hat{m}_t, \boldsymbol{\lambda}), \end{eqnarray}\end{split}\]
where
\[ p(\hat{m}_t \,|\, \boldsymbol{X}_t, \boldsymbol{\lambda}) = \max_m \, \frac{w_m \mathcal{N} \left( \boldsymbol{X}_t \,\big|\, \boldsymbol{\mu}_m^{(X)}, \boldsymbol{\varSigma}_m^{(XX)} \right)} {\sum_{n=1}^K w_n \mathcal{N} \left( \boldsymbol{X}_t \,\big|\, \boldsymbol{\mu}_n^{(X)}, \boldsymbol{\varSigma}_n^{(XX)} \right)}, \]
and
\[\begin{split}\begin{eqnarray} p(\boldsymbol{Y}_t \,|\, \boldsymbol{X}_t, m, \boldsymbol{\lambda}) &=& \mathcal{N} \left( \boldsymbol{Y}_t \,\big|\, \boldsymbol{E}_{m,t}^{(Y)}, \boldsymbol{D}_{m,t}^{(Y)} \right), \\ \boldsymbol{E}_{m,t}^{(Y)} &=& \boldsymbol{\mu}_m^{(Y)} + \boldsymbol{\varSigma}_m^{(YX)} \boldsymbol{\varSigma}_m^{(XX)^{-1}} \left( \boldsymbol{X}_t - \boldsymbol{\mu}_m^{(X)} \right), \\ \boldsymbol{D}_{m,t}^{(Y)} &=& \boldsymbol{\varSigma}_m^{(YY)} - \boldsymbol{\varSigma}_m^{(YX)} \boldsymbol{\varSigma}_m^{(XX)^{-1}} \boldsymbol{\varSigma}_m^{(XY)}. \end{eqnarray}\end{split}\]
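As a toy numerical illustration with hypothetical scalar blocks, suppose \(K = 2\) and the weighted likelihoods \(w_m \mathcal{N}(\boldsymbol{X}_t \,|\, \boldsymbol{\mu}_m^{(X)}, \boldsymbol{\varSigma}_m^{(XX)})\) of the two mixture components are \(0.3\) and \(0.1\); then \(p(\hat{m}_t \,|\, \boldsymbol{X}_t, \boldsymbol{\lambda}) = 0.3 / (0.3 + 0.1) = 0.75\) and \(\hat{m}_t = 1\). If, for that component, \(\boldsymbol{\mu}^{(X)} = 0\), \(\boldsymbol{\mu}^{(Y)} = 1\), \(\boldsymbol{\varSigma}^{(XX)} = 2\), \(\boldsymbol{\varSigma}^{(YX)} = \boldsymbol{\varSigma}^{(XY)} = 1\), \(\boldsymbol{\varSigma}^{(YY)} = 3\), and \(\boldsymbol{X}_t = 2\), the conditional statistics evaluate to
\[ \boldsymbol{E}_{\hat{m},t}^{(Y)} = 1 + 1 \cdot 2^{-1} (2 - 0) = 2, \quad \boldsymbol{D}_{\hat{m},t}^{(Y)} = 3 - 1 \cdot 2^{-1} \cdot 1 = 2.5. \]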
The converted static vector sequence \(\hat{\boldsymbol{y}}\) is obtained by the maximum likelihood parameter generation algorithm under the constraint between the static and dynamic components [1].

[1] T. Toda, A. W. Black, and K. Tokuda, “Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 8, pp. 2222-2235, 2007.

Public Functions

GaussianMixtureModelBasedConversion(int num_source_order, int num_target_order, const std::vector<std::vector<double>> &window_coefficients, const std::vector<double> &weights, const std::vector<std::vector<double>> &mean_vectors, const std::vector<SymmetricMatrix> &covariance_matrices, bool use_magic_number, double magic_number = 0.0)
Parameters
  • num_source_order[in] Order of source vector, \(M_1\).

  • num_target_order[in] Order of target vector, \(M_2\).

  • window_coefficients[in] Window coefficients, e.g., { {-0.5, 0.0, 0.5}, {1.0, -2.0, 1.0} }.

  • weights[in] \(K\) mixture weights.

  • mean_vectors[in] \(K\) mean vectors. The shape is \([K, (D+1)(M_1+M_2+2)]\).

  • covariance_matrices[in] \(K\) covariance matrices. The shape is \([K, (D+1)(M_1+M_2+2), (D+1)(M_1+M_2+2)]\).

  • use_magic_number[in] Whether to use magic number.

  • magic_number[in] A magic number that represents a discrete symbol.

inline int GetNumSourceOrder() const
Returns

Order of source vector.

inline int GetNumTargetOrder() const
Returns

Order of target vector.

inline bool IsValid() const
Returns

True if this object is valid.

bool Run(const std::vector<std::vector<double>> &source_vectors, std::vector<std::vector<double>> *target_vectors) const
Parameters
  • source_vectors[in] \(M_1\)-th order source vectors containing dynamic components. The shape is \([T, (D+1)(M_1+1)]\).

  • target_vectors[out] \(M_2\)-th order target vectors. The shape is \([T, (M_2+1)]\).

Returns

True on success, false on failure.
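
The following is a minimal usage sketch, not taken from the SPTK sources: the include paths and the way the GMM parameters are obtained are assumptions, and the code only illustrates the calling convention of the constructor and Run (loading the double-type gmmfile is omitted).

#include <iostream>
#include <vector>

#include "SPTK/conversion/gaussian_mixture_model_based_conversion.h"  // assumed path
#include "SPTK/math/symmetric_matrix.h"                               // assumed path

int main() {
  const int kNumSourceOrder(4);  // M1 (cf. -m 4, i.e., -l 5)
  const int kNumTargetOrder(4);  // M2 (cf. -M 4, i.e., -L 5)

  // One delta window, D = 1 (same as -d -0.5 0.0 0.5).
  const std::vector<std::vector<double>> window_coefficients{{-0.5, 0.0, 0.5}};

  // K mixture weights, K mean vectors of length (D+1)(M1+M2+2), and
  // K covariance matrices of the same dimension. Loading them from a
  // trained GMM (gmmfile) is omitted in this sketch.
  std::vector<double> weights;
  std::vector<std::vector<double>> mean_vectors;
  std::vector<sptk::SymmetricMatrix> covariance_matrices;

  sptk::GaussianMixtureModelBasedConversion conversion(
      kNumSourceOrder, kNumTargetOrder, window_coefficients, weights,
      mean_vectors, covariance_matrices, /*use_magic_number=*/false);
  if (!conversion.IsValid()) {
    std::cerr << "invalid GMM parameters" << std::endl;
    return 1;
  }

  // T source vectors of length (D+1)(M1+1): static plus delta components.
  std::vector<std::vector<double>> source_vectors;  // filled by the caller
  std::vector<std::vector<double>> target_vectors;
  if (!conversion.Run(source_vectors, &target_vectors)) {
    std::cerr << "conversion failed" << std::endl;
    return 1;
  }
  // target_vectors now holds T static target vectors of length M2 + 1.
  return 0;
}

In practice, weights, mean_vectors, and covariance_matrices would come from a GMM trained on joint source-target vectors, such as the data.gmm used in the vc command-line example above.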