vc
Functions
-
int main(int argc, char *argv[])
vc [ option ] gmmfile [ infile ]
-l int
length of source vector \((1 \le M_1 + 1)\)
-m int
order of source vector \((0 \le M_1)\)
-L int
length of target vector \((1 \le M_2 + 1)\)
-M int
order of target vector \((0 \le M_2)\)
-k int
number of mixtures \((1 \le K)\)
-f
use full or block covariance instead of diagonal one
-d double+
delta coefficients
-D str
filename of double-type delta coefficients
-r int+
width of 1st (and 2nd) regression coefficients
-magic double
magic number
gmmfile str
double-type GMM parameters
infile str
double-type source static+dynamic vector sequence
stdout
double-type target static vector sequence
In the following example, the converted 4th-order vectors corresponding to data.source are obtained using the trained 2-mixture GMM data.gmm:
delta -l 5 -d -0.5 0.0 0.5 data.source | \
  vc -k 2 -l 5 data.gmm > data.target
- Parameters:
argc – [in] Number of arguments.
argv – [in] Argument vector.
- Returns:
0 on success, 1 on failure.
-
class GaussianMixtureModelBasedConversion
Perform GMM-based voice conversion.
The input is the \((D+1)(M_1+1)\)-length source vectors:
\[ \begin{array}{cccc} \boldsymbol{X}_0, & \boldsymbol{X}_1, & \ldots, & \boldsymbol{X}_{T-1}, \end{array} \]
where
\[ \boldsymbol{X}_t = \left[ \begin{array}{cccc} \boldsymbol{x}_t^{\mathsf{T}} & \Delta^{(1)} \boldsymbol{x}_t^{\mathsf{T}} & \cdots & \Delta^{(D)} \boldsymbol{x}_t^{\mathsf{T}} \end{array} \right]^{\mathsf{T}}. \]
The output is the \((M_2+1)\)-length target vectors:
\[ \begin{array}{cccc} \boldsymbol{y}_0, & \boldsymbol{y}_1, & \ldots, & \boldsymbol{y}_{T-1}. \end{array} \]
The optimal target vectors can be derived in a maximum likelihood sense:
\[\begin{split}\begin{eqnarray} \hat{\boldsymbol{y}} &=& \mathop{\mathrm{argmax}}_{\boldsymbol{y}} \, p(\boldsymbol{Y} \,|\, \boldsymbol{X}, \boldsymbol{\lambda}) \\ &=& \mathop{\mathrm{argmax}}_{\boldsymbol{y}} \sum_{\boldsymbol{m}} p(\boldsymbol{m} \,|\, \boldsymbol{X}, \boldsymbol{\lambda}) \, p(\boldsymbol{Y} \,|\, \boldsymbol{X}, \boldsymbol{m}, \boldsymbol{\lambda}), \end{eqnarray}\end{split}\]
where \(\boldsymbol{\lambda}\) is the GMM that models the joint vectors. The mean vector and the covariance matrix of the \(m\)-th mixture component are written as
\[\begin{split} \boldsymbol{\mu}_m = \left[ \begin{array}{c} \boldsymbol{\mu}_m^{(X)} \\ \boldsymbol{\mu}_m^{(Y)} \end{array} \right] \end{split}\]
and
\[\begin{split} \boldsymbol{\varSigma}_m = \left[ \begin{array}{cc} \boldsymbol{\varSigma}_m^{(XX)} & \boldsymbol{\varSigma}_m^{(XY)} \\ \boldsymbol{\varSigma}_m^{(YX)} & \boldsymbol{\varSigma}_m^{(YY)} \end{array} \right]. \end{split}\]
To easily compute the ML estimate, the maximum a posteriori approximation is applied:
\[\begin{split}\begin{eqnarray} \hat{\boldsymbol{y}} &=& \mathop{\mathrm{argmax}}_{\boldsymbol{y}} \, p(\hat{\boldsymbol{m}} \,|\, \boldsymbol{X}, \boldsymbol{\lambda}) \, p(\boldsymbol{Y} \,|\, \boldsymbol{X}, \hat{\boldsymbol{m}}, \boldsymbol{\lambda}) \\ &=& \mathop{\mathrm{argmax}}_{\boldsymbol{y}} \prod_{t=0}^{T-1} p(\hat{m}_t \,|\, \boldsymbol{X}_t, \boldsymbol{\lambda}) \, p(\boldsymbol{Y}_t \,|\, \boldsymbol{X}_t, \hat{m}_t, \boldsymbol{\lambda}), \end{eqnarray}\end{split}\]
where
\[ p(\hat{m}_t \,|\, \boldsymbol{X}_t, \boldsymbol{\lambda}) = \max_m \, \frac{w_m \mathcal{N} \left( \boldsymbol{X}_t \,\big|\, \boldsymbol{\mu}_m^{(X)}, \boldsymbol{\varSigma}_m^{(XX)} \right)} {\sum_{n=1}^K w_n \mathcal{N} \left( \boldsymbol{X}_t \,\big|\, \boldsymbol{\mu}_n^{(X)}, \boldsymbol{\varSigma}_n^{(XX)} \right)}, \]
\(K\) is the number of mixture components, and
\[\begin{split}\begin{eqnarray} p(\boldsymbol{Y}_t \,|\, \boldsymbol{X}_t, m, \boldsymbol{\lambda}) &=& \mathcal{N} \left( \boldsymbol{Y}_t \,\big|\, \boldsymbol{E}_{m,t}^{(Y)}, \boldsymbol{D}_{m,t}^{(Y)} \right), \\ \boldsymbol{E}_{m,t}^{(Y)} &=& \boldsymbol{\mu}_m^{(Y)} + \boldsymbol{\varSigma}_m^{(YX)} \boldsymbol{\varSigma}_m^{(XX)^{-1}} \left( \boldsymbol{X}_t - \boldsymbol{\mu}_m^{(X)} \right), \\ \boldsymbol{D}_{m,t}^{(Y)} &=& \boldsymbol{\varSigma}_m^{(YY)} - \boldsymbol{\varSigma}_m^{(YX)} \boldsymbol{\varSigma}_m^{(XX)^{-1}} \boldsymbol{\varSigma}_m^{(XY)}. \end{eqnarray}\end{split}\]
The converted static vector sequence \(\hat{\boldsymbol{y}}\) under the constraint between static and dynamic components is obtained by the maximum likelihood parameter generation algorithm.
[1] T. Toda, A. W. Black, and K. Tokuda, “Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 8, pp. 2222-2235, 2007.
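The frame-wise part of the formulas above (choosing \(\hat{m}_t\) and evaluating \(\boldsymbol{E}_{m,t}^{(Y)}\) and \(\boldsymbol{D}_{m,t}^{(Y)}\)) can be illustrated for the simplest case of a scalar source and a scalar target. The following is a minimal sketch only: the struct and function names are hypothetical and are not part of this library, and the final parameter generation step that couples static and dynamic components is not included.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One component of a joint GMM over a scalar source x and a scalar target y.
// Hypothetical type for illustration; not part of the library API.
struct JointComponent {
  double weight;  // w_m
  double mean_x;  // mu_m^(X)
  double mean_y;  // mu_m^(Y)
  double var_xx;  // Sigma_m^(XX)
  double var_xy;  // Sigma_m^(XY), equal to Sigma_m^(YX) in the scalar case
  double var_yy;  // Sigma_m^(YY)
};

// Univariate Gaussian density N(x | mean, var).
double GaussianPdf(double x, double mean, double var) {
  const double kPi = 3.14159265358979323846;
  const double diff = x - mean;
  return std::exp(-0.5 * diff * diff / var) / std::sqrt(2.0 * kPi * var);
}

// Index of the mixture that maximizes the posterior p(m | x, lambda).
// The denominator is shared by all m, so only numerators are compared.
std::size_t SelectMixture(const std::vector<JointComponent>& gmm, double x) {
  assert(!gmm.empty());
  std::size_t best = 0;
  double best_score = -1.0;
  for (std::size_t m = 0; m < gmm.size(); ++m) {
    const double score =
        gmm[m].weight * GaussianPdf(x, gmm[m].mean_x, gmm[m].var_xx);
    if (score > best_score) {
      best_score = score;
      best = m;
    }
  }
  return best;
}

// Conditional mean E = mu_y + Sigma_yx Sigma_xx^{-1} (x - mu_x).
double ConditionalMean(const JointComponent& c, double x) {
  return c.mean_y + c.var_xy / c.var_xx * (x - c.mean_x);
}

// Conditional variance D = Sigma_yy - Sigma_yx Sigma_xx^{-1} Sigma_xy.
double ConditionalVariance(const JointComponent& c) {
  return c.var_yy - c.var_xy / c.var_xx * c.var_xy;
}
```

For a component with unit source variance, cross-covariance 0.5, and means 0 and 1, a source value of 2 gives a conditional mean of 2 and a conditional variance of 0.75. In the vector case the divisions become products with the inverse of \(\boldsymbol{\varSigma}_m^{(XX)}\).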
Public Functions
-
GaussianMixtureModelBasedConversion(int num_source_order, int num_target_order, const std::vector<std::vector<double>> &window_coefficients, const std::vector<double> &weights, const std::vector<std::vector<double>> &mean_vectors, const std::vector<SymmetricMatrix> &covariance_matrices, bool use_magic_number, double magic_number = 0.0)
- Parameters:
num_source_order – [in] Order of source vector, \(M_1\).
num_target_order – [in] Order of target vector, \(M_2\).
window_coefficients – [in] Window coefficients, e.g., { {-0.5, 0.0, 0.5}, {1.0, -2.0, 1.0} }.
weights – [in] \(K\) mixture weights.
mean_vectors – [in] \(K\) mean vectors. The shape is \([K, (D+1)(M_1+M_2+2)]\).
covariance_matrices – [in] \(K\) covariance matrices. The shape is \([K, (D+1)(M_1+M_2+2), (D+1)(M_1+M_2+2)]\).
use_magic_number – [in] Whether to use magic number.
magic_number – [in] Magic number that represents a discrete symbol.
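The shapes listed above can be checked before constructing the object. The sketch below is an illustration under assumptions: it assumes window_coefficients holds only the \(D\) dynamic windows (as the example value with two windows suggests), the helper names are hypothetical and not part of the library, and the covariance matrices are left out because SymmetricMatrix is a library type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Length of one joint mean vector, (D+1)(M1+M2+2), where D is taken to be
// the number of entries in window_coefficients (an assumption).
int JointVectorLength(
    int num_source_order, int num_target_order,
    const std::vector<std::vector<double>>& window_coefficients) {
  assert(num_source_order >= 0 && num_target_order >= 0);
  const int num_windows = static_cast<int>(window_coefficients.size());
  return (num_windows + 1) * (num_source_order + num_target_order + 2);
}

// Returns true if there is one mean vector per mixture weight and every
// mean vector has the expected joint length. Hypothetical helper.
bool HasConsistentShapes(
    int num_source_order, int num_target_order,
    const std::vector<std::vector<double>>& window_coefficients,
    const std::vector<double>& weights,
    const std::vector<std::vector<double>>& mean_vectors) {
  if (weights.empty() || weights.size() != mean_vectors.size()) {
    return false;
  }
  const std::size_t expected = static_cast<std::size_t>(JointVectorLength(
      num_source_order, num_target_order, window_coefficients));
  for (const std::vector<double>& mean : mean_vectors) {
    if (mean.size() != expected) {
      return false;
    }
  }
  return true;
}
```

With \(M_1 = M_2 = 4\) and the two windows from the example value, each mean vector is expected to have \(3 \times 10 = 30\) elements.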
-
inline int GetNumSourceOrder() const
- Returns:
Order of source vector.
-
inline int GetNumTargetOrder() const
- Returns:
Order of target vector.
-
inline bool IsValid() const
- Returns:
True if this object is valid.
-
bool Run(const std::vector<std::vector<double>> &source_vectors, std::vector<std::vector<double>> *target_vectors) const
- Parameters:
source_vectors – [in] \(M_1\)-th order source vectors containing dynamic components. The shape is \([T, (D+1)(M_1+1)]\).
target_vectors – [out] \(M_2\)-th order target vectors. The shape is \([T, (M_2+1)]\).
- Returns:
True on success, false on failure.
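Run expects each source vector to already contain its dynamic components, laid out as the static part followed by one block per window. A minimal sketch of building such input from a static sequence is given below. The function name is hypothetical, windows are assumed to have odd width and to be centered on the current frame, and frames beyond the sequence boundaries are replaced by the nearest valid frame; the boundary handling of the library's own tools may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Appends dynamic components to a [T, M1+1] static sequence, producing a
// [T, (D+1)(M1+1)] sequence, where D is the number of windows.
// Hypothetical helper; edge frames are clamped (an assumption).
Matrix AppendDynamicFeatures(const Matrix& statics,
                             const Matrix& window_coefficients) {
  const int num_frames = static_cast<int>(statics.size());
  Matrix output(statics.size());
  for (int t = 0; t < num_frames; ++t) {
    const std::vector<double>& x = statics[t];
    std::vector<double>& row = output[t];
    row = x;  // Static part comes first.
    for (const std::vector<double>& window : window_coefficients) {
      assert(window.size() % 2 == 1);  // Odd width, centered window.
      const int half = static_cast<int>(window.size() / 2);
      for (std::size_t i = 0; i < x.size(); ++i) {
        double delta = 0.0;
        for (int k = -half; k <= half; ++k) {
          // Clamp the frame index to the valid range [0, T-1].
          const int s = std::min(std::max(t + k, 0), num_frames - 1);
          delta += window[k + half] * statics[s][i];
        }
        row.push_back(delta);
      }
    }
  }
  return output;
}
```

For the static sequence 1, 2, 4 and the single window {-0.5, 0.0, 0.5}, the middle frame becomes (2, 1.5), since \(-0.5 \cdot 1 + 0.5 \cdot 4 = 1.5\).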
-