vc

Functions

int main(int argc, char *argv[])

vc [ option ] gmmfile [ infile ]

  • -l int

    • length of source vector (1 ≤ M1 + 1)

  • -m int

    • order of source vector (0 ≤ M1)

  • -L int

    • length of target vector (1 ≤ M2 + 1)

  • -M int

    • order of target vector (0 ≤ M2)

  • -k int

    • number of mixtures (1 ≤ K)

  • -f

    • use full or block covariance instead of diagonal one

  • -d double+

    • delta coefficients

  • -D str

    • filename of double-type delta coefficients

  • -r int+

    • width of 1st (and 2nd) regression coefficients

  • -magic double

    • magic number

  • gmmfile str

    • double-type GMM parameters

  • infile str

    • double-type source static+dynamic vector sequence

  • stdout

    • double-type target static vector sequence

In the following example, the converted 4-th order vectors corresponding to data.source are obtained using the trained 2-mixture GMM data.gmm.

delta -l 5 -d -0.5 0.0 0.5 data.source | \
  vc -k 2 -l 5 data.gmm > data.target
Parameters:
  • argc[in] Number of arguments.

  • argv[in] Argument vector.

Returns:

0 on success, 1 on failure.

See also

gmm gmmp

class GaussianMixtureModelBasedConversion

Perform GMM-based voice conversion.

The input is the (D+1)(M1+1)-length source vectors:

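\[
  \boldsymbol{X}_0, \boldsymbol{X}_1, \ldots, \boldsymbol{X}_{T-1},
\]
where
\[
  \boldsymbol{X}_t = \left[ \boldsymbol{x}_t^{\mathsf{T}} \;\; \Delta^{(1)} \boldsymbol{x}_t^{\mathsf{T}} \;\; \cdots \;\; \Delta^{(D)} \boldsymbol{x}_t^{\mathsf{T}} \right]^{\mathsf{T}}.
\]
The output is the (M2+1)-length target vectors:
\[
  \boldsymbol{y}_0, \boldsymbol{y}_1, \ldots, \boldsymbol{y}_{T-1}.
\]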

The optimal target vectors can be derived in a maximum likelihood sense:

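\[
  \hat{\boldsymbol{y}}
  = \mathop{\mathrm{argmax}}_{\boldsymbol{y}} \, p(\boldsymbol{Y} \,|\, \boldsymbol{X}, \boldsymbol{\lambda})
  = \mathop{\mathrm{argmax}}_{\boldsymbol{y}} \sum_{\boldsymbol{m}} p(\boldsymbol{m} \,|\, \boldsymbol{X}, \boldsymbol{\lambda}) \, p(\boldsymbol{Y} \,|\, \boldsymbol{X}, \boldsymbol{m}, \boldsymbol{\lambda}),
\]
where \(\boldsymbol{\lambda}\) is the GMM that models the joint vectors. The mean vector and the covariance matrix of the \(m\)-th mixture component are written as
\[
  \boldsymbol{\mu}_m = \begin{bmatrix} \boldsymbol{\mu}_m^{(X)} \\ \boldsymbol{\mu}_m^{(Y)} \end{bmatrix}
\]
and
\[
  \boldsymbol{\Sigma}_m = \begin{bmatrix}
    \boldsymbol{\Sigma}_m^{(XX)} & \boldsymbol{\Sigma}_m^{(XY)} \\
    \boldsymbol{\Sigma}_m^{(YX)} & \boldsymbol{\Sigma}_m^{(YY)}
  \end{bmatrix}.
\]
To easily compute the ML estimate, the maximum a posteriori approximation is applied:
\[
  \hat{\boldsymbol{y}}
  = \mathop{\mathrm{argmax}}_{\boldsymbol{y}} \, p(\hat{\boldsymbol{m}} \,|\, \boldsymbol{X}, \boldsymbol{\lambda}) \, p(\boldsymbol{Y} \,|\, \boldsymbol{X}, \hat{\boldsymbol{m}}, \boldsymbol{\lambda})
  = \mathop{\mathrm{argmax}}_{\boldsymbol{y}} \prod_{t=0}^{T-1} p(\hat{m}_t \,|\, \boldsymbol{X}_t, \boldsymbol{\lambda}) \, p(\boldsymbol{Y}_t \,|\, \boldsymbol{X}_t, \hat{m}_t, \boldsymbol{\lambda}),
\]
where
\[
  p(\hat{m}_t \,|\, \boldsymbol{X}_t, \boldsymbol{\lambda})
  = \max_m \frac{w_m \, \mathcal{N}(\boldsymbol{X}_t \,|\, \boldsymbol{\mu}_m^{(X)}, \boldsymbol{\Sigma}_m^{(XX)})}
                {\sum_{n=1}^{M} w_n \, \mathcal{N}(\boldsymbol{X}_t \,|\, \boldsymbol{\mu}_n^{(X)}, \boldsymbol{\Sigma}_n^{(XX)})},
\]
and
\[
  \begin{aligned}
    p(\boldsymbol{Y}_t \,|\, \boldsymbol{X}_t, m, \boldsymbol{\lambda})
      &= \mathcal{N}(\boldsymbol{Y}_t \,|\, \boldsymbol{E}_{m,t}^{(Y)}, \boldsymbol{D}_{m,t}^{(Y)}), \\
    \boldsymbol{E}_{m,t}^{(Y)}
      &= \boldsymbol{\mu}_m^{(Y)} + \boldsymbol{\Sigma}_m^{(YX)} {\boldsymbol{\Sigma}_m^{(XX)}}^{-1} (\boldsymbol{X}_t - \boldsymbol{\mu}_m^{(X)}), \\
    \boldsymbol{D}_{m,t}^{(Y)}
      &= \boldsymbol{\Sigma}_m^{(YY)} - \boldsymbol{\Sigma}_m^{(YX)} {\boldsymbol{\Sigma}_m^{(XX)}}^{-1} \boldsymbol{\Sigma}_m^{(XY)}.
  \end{aligned}
\]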
The converted static vector sequence \(\hat{\boldsymbol{y}}\) under the constraint between static and dynamic components is obtained by the maximum likelihood parameter generation algorithm.
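
The per-frame part of the derivation (mixture selection followed by the conditional mean and covariance) can be sketched for the simplest case of scalar source and target features. This is an illustration only, not the SPTK implementation: the struct and function names are hypothetical, the features are assumed to be one-dimensional, and the final maximum likelihood parameter generation step over the whole sequence is not included.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical scalar joint Gaussian component (illustrative names).
struct JointComponent {
  double weight;  // w_m
  double mean_x;  // mu_m^(X)
  double mean_y;  // mu_m^(Y)
  double var_xx;  // Sigma_m^(XX)
  double cov_yx;  // Sigma_m^(YX), equal to Sigma_m^(XY) for scalars
  double var_yy;  // Sigma_m^(YY)
};

struct ConditionalGaussian {
  std::size_t mixture;  // Selected mixture index.
  double mean;          // E_{m,t}^(Y)
  double variance;      // D_{m,t}^(Y)
};

double GaussianPdf(double x, double mean, double variance) {
  const double kTwoPi = 6.283185307179586;
  const double diff = x - mean;
  return std::exp(-0.5 * diff * diff / variance) /
         std::sqrt(kTwoPi * variance);
}

// Assumes gmm is non-empty and every var_xx is positive.
ConditionalGaussian ConvertFrame(const std::vector<JointComponent>& gmm,
                                 double x) {
  // The posterior denominator is shared by all mixtures, so the MAP
  // mixture is the one that maximizes w_m * N(x | mu_m^(X), Sigma_m^(XX)).
  std::size_t best = 0;
  double best_score = -1.0;
  for (std::size_t m = 0; m < gmm.size(); ++m) {
    const double score =
        gmm[m].weight * GaussianPdf(x, gmm[m].mean_x, gmm[m].var_xx);
    if (best_score < score) {
      best_score = score;
      best = m;
    }
  }
  const JointComponent& c = gmm[best];
  const double gain = c.cov_yx / c.var_xx;  // Sigma^(YX) Sigma^(XX)^-1
  return {best, c.mean_y + gain * (x - c.mean_x),
          c.var_yy - gain * c.cov_yx};
}
```

In the vector case the division by `var_xx` becomes a matrix inversion, and the resulting per-frame means and covariances are the inputs to the parameter generation step.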

[1] T. Toda, A. W. Black, and K. Tokuda, “Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 8, pp. 2222-2235, 2007.

Public Functions

GaussianMixtureModelBasedConversion(int num_source_order, int num_target_order, const std::vector<std::vector<double>> &window_coefficients, const std::vector<double> &weights, const std::vector<std::vector<double>> &mean_vectors, const std::vector<SymmetricMatrix> &covariance_matrices, bool use_magic_number, double magic_number = 0.0)
Parameters:
  • num_source_order[in] Order of source vector, M1.

  • num_target_order[in] Order of target vector, M2.

  • window_coefficients[in] Window coefficients, e.g., { {-0.5, 0.0, 0.5}, {1.0, -2.0, 1.0} }.

  • weights[in] K mixture weights.

  • mean_vectors[in] K mean vectors. The shape is [K,(D+1)(M1+M2+2)].

  • covariance_matrices[in] K covariance matrices. The shape is [K,(D+1)(M1+M2+2),(D+1)(M1+M2+2)].

  • use_magic_number[in] Whether to use magic number.

  • magic_number[in] Magic number that represents a discrete symbol.
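
The shapes listed above can be checked before the arguments are passed to the constructor. The helpers below are a hypothetical sketch, not part of SPTK, and they assume that D equals the number of windows in window_coefficients, so the example with two windows corresponds to D = 2.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: length of a joint vector, (D+1)(M1+M2+2).
int JointLength(int num_source_order, int num_target_order, int num_deltas) {
  return (num_deltas + 1) * (num_source_order + num_target_order + 2);
}

// Hypothetical helper: checks that there are K weights and K mean vectors
// and that every mean vector has the expected joint length.
bool HasExpectedShape(const std::vector<double>& weights,
                      const std::vector<std::vector<double>>& mean_vectors,
                      int num_source_order, int num_target_order,
                      int num_deltas) {
  if (weights.empty() || weights.size() != mean_vectors.size()) {
    return false;
  }
  const std::size_t length = static_cast<std::size_t>(
      JointLength(num_source_order, num_target_order, num_deltas));
  for (const std::vector<double>& mean : mean_vectors) {
    if (mean.size() != length) return false;
  }
  return true;
}
```

For example, with M1 = M2 = 4 and a single delta window (D = 1), each mean vector is expected to hold 2 × 10 = 20 values.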

inline int GetNumSourceOrder() const
Returns:

Order of source vector.

inline int GetNumTargetOrder() const
Returns:

Order of target vector.

inline bool IsValid() const
Returns:

True if this object is valid.

bool Run(const std::vector<std::vector<double>> &source_vectors, std::vector<std::vector<double>> *target_vectors) const
Parameters:
  • source_vectors[in] M1-th order source vectors containing dynamic components. The shape is [T,(D+1)(M1+1)].

  • target_vectors[out] M2-th order target vectors. The shape is [T,(M2+1)].

Returns:

True on success, false on failure.