yingram#
- class diffsptk.Yingram(frame_length, sample_rate=22050, lag_min=22, lag_max=None, n_bin=20)[source]#
Pitch-related feature extraction module based on YIN.
- Parameters:
- frame_lengthint >= 1
Frame length, L.
- sample_rateint >= 1
Sample rate in Hz.
- lag_minint >= 1
Minimum lag in points.
- lag_maxint < L
Maximum lag in points.
- n_binint >= 1
Number of bins of Yingram to represent a semitone range.
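The parameters above determine the Yingram feature dimension M: the lag axis is warped onto a MIDI (semitone) scale between the frequencies corresponding to the maximum and minimum lags, with n_bin bins per semitone. The helper below is a minimal sketch of that bookkeeping, assuming lag_max defaults to the frame length and that bins cover the whole semitones inside the lag range; yingram_dim and hz_to_midi are hypothetical names, not part of the diffsptk API.

```python
import math

def hz_to_midi(f):
    # Fractional MIDI note number of a frequency in Hz (A4 = 440 Hz = note 69).
    return 12.0 * math.log2(f / 440.0) + 69.0

def yingram_dim(frame_length, sample_rate=22050, lag_min=22, lag_max=None, n_bin=20):
    # Hypothetical helper: estimate the Yingram dimension M from the
    # module parameters. Assumes lag_max defaults to the frame length.
    if lag_max is None:
        lag_max = frame_length
    # Lowest and highest representable pitches, as whole semitones
    # inside the [sample_rate/lag_max, sample_rate/lag_min] range.
    midi_min = math.ceil(hz_to_midi(sample_rate / lag_max))
    midi_max = math.floor(hz_to_midi(sample_rate / lag_min))
    return (midi_max - midi_min + 1) * n_bin

print(yingram_dim(2048))  # 1580, consistent with the shape in Examples
```

With the defaults and frame_length=2048, the lag range spans 79 whole semitones, giving 79 * 20 = 1580 bins.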
References
[1]A. de Cheveigné and H. Kawahara, “YIN, a fundamental frequency estimator for speech and music,” The Journal of the Acoustical Society of America, vol. 111, no. 4, pp. 1917-1930, 2002.
[2]H. Choi et al., “Neural analysis and synthesis: Reconstructing speech from self-supervised representations,” arXiv:2110.14513, 2021.
- forward(x)[source]#
Compute YIN derivatives.
- Parameters:
- xTensor [shape=(…, L)]
Framed waveform.
- Returns:
- outTensor [shape=(…, M)]
Yingram.
Examples
>>> x = diffsptk.nrand(22050)
>>> frame = diffsptk.Frame(2048, 441)
>>> yingram = diffsptk.Yingram(2048)
>>> y = yingram(frame(x))
>>> y.shape
torch.Size([51, 1580])
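The first output dimension in the example is the number of frames. A minimal sketch of where 51 comes from, assuming diffsptk.Frame uses centered framing so that one frame lands on every frame_period-th sample of the padded signal:

```python
# 22050 samples framed with period 441 under centered padding:
# one frame per hop, plus one for the frame centered at sample 0.
T, frame_period = 22050, 441
n_frames = T // frame_period + 1
print(n_frames)  # 51, matching y.shape[0] in the example
```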