yingram#
- class diffsptk.Yingram(frame_length: int, sample_rate: int = 22050, lag_min: int = 22, lag_max: int | None = None, n_bin: int = 20)[source]#
Pitch-related feature extraction module based on YIN.
- Parameters:
- frame_length : int >= 1
The frame length in samples, \(L\).
- sample_rate : int >= 8000
The sample rate in Hz.
- lag_min : int >= 1
The minimum lag in points.
- lag_max : int < L
The maximum lag in points.
- n_bin : int >= 1
The number of bins used to represent a semitone range.
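The lag range and semitone-bin count determine the pitch axis of the output. As a hedged illustration (not diffsptk's internal code), the standard MIDI note-number conversion shows how a lag in samples relates to the semitone scale that the bins discretize; the helper names below are hypothetical:

```python
import math

def lag_to_midi(lag: float, sample_rate: int) -> float:
    """Convert a lag in samples to a MIDI-scale pitch value.

    Uses the standard conversion m = 69 + 12 * log2(f / 440) with
    f = sample_rate / lag. Illustrative only; diffsptk may differ in
    details such as bin edges and rounding.
    """
    f0 = sample_rate / lag
    return 69.0 + 12.0 * math.log2(f0 / 440.0)

def midi_to_lag(midi: float, sample_rate: int) -> float:
    """Inverse conversion: MIDI-scale pitch back to a lag in samples."""
    f0 = 440.0 * 2.0 ** ((midi - 69.0) / 12.0)
    return sample_rate / f0
```

With `n_bin` bins per semitone, the output dimension \(M\) is then roughly `n_bin` times the semitone span covered by `[lag_min, lag_max]`.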
References
[1]A. de Cheveigné and H. Kawahara, “YIN, a fundamental frequency estimator for speech and music,” The Journal of the Acoustical Society of America, vol. 111, no. 4, pp. 1917-1930, 2002.
[2]H. Choi et al., “Neural analysis and synthesis: Reconstructing speech from self-supervised representations,” arXiv:2110.14513, 2021.
- forward(x: Tensor) Tensor [source]#
Compute the YIN derivatives from the waveform.
- Parameters:
- x : Tensor [shape=(…, L)]
The framed waveform.
- Returns:
- out : Tensor [shape=(…, M)]
The Yingram.
Examples
>>> x = diffsptk.nrand(22050)
>>> frame = diffsptk.Frame(2048, 441)
>>> yingram = diffsptk.Yingram(2048)
>>> y = yingram(frame(x))
>>> y.shape
torch.Size([51, 1580])
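At the core of YIN [1] is the cumulative mean normalized difference function (Eq. 8 in the paper), whose valleys mark candidate pitch periods. Below is a minimal plain-NumPy sketch of that function, independent of diffsptk's actual implementation:

```python
import numpy as np

def yin_cmnd(frame: np.ndarray, lag_max: int) -> np.ndarray:
    """Cumulative mean normalized difference d'(tau) for one frame.

    d(tau)  = sum_t (x[t] - x[t + tau])^2           (difference function)
    d'(0)   = 1
    d'(tau) = d(tau) / ((1 / tau) * sum_{j=1}^{tau} d(j))

    A sketch of YIN's Eq. 8, not diffsptk's implementation.
    """
    L = len(frame)
    d = np.zeros(lag_max + 1)
    for tau in range(1, lag_max + 1):
        diff = frame[: L - tau] - frame[tau:]
        d[tau] = np.dot(diff, diff)
    cmnd = np.ones(lag_max + 1)
    running_mean = np.cumsum(d[1:]) / np.arange(1, lag_max + 1)
    cmnd[1:] = d[1:] / np.maximum(running_mean, 1e-12)
    return cmnd

# A sine with a period of 100 samples should produce a deep valley
# in d'(tau) near tau = 100.
t = np.arange(1000)
frame = np.sin(2 * np.pi * t / 100)
cmnd = yin_cmnd(frame, 200)
tau_hat = 80 + int(np.argmin(cmnd[80:121]))  # estimated period in samples
```

The Yingram then resamples such lag-domain values onto a MIDI-scale axis with `n_bin` bins per semitone, rather than picking a single period as classic YIN does.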