yingram#

class diffsptk.Yingram(frame_length, sample_rate=22050, lag_min=22, lag_max=None, n_bin=20)[source]#

Pitch-related feature extraction module based on YIN.

Parameters:
frame_length : int >= 1 [scalar]

Frame length, \(L\).

sample_rate : int >= 1 [scalar]

Sample rate in Hz.

lag_min : int >= 1 [scalar]

Minimum lag in points.

lag_max : int <= \(L\) [scalar]

Maximum lag in points.

n_bin : int >= 1 [scalar]

Number of Yingram bins used to represent one semitone.
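The lag bounds determine the detectable pitch range: a lag of \(\tau\) samples corresponds to a frequency of sample_rate / \(\tau\) Hz, so lag_min sets the highest detectable pitch and lag_max the lowest. A minimal sketch of this arithmetic (illustrative only, not part of diffsptk), using the defaults above and a lag_max equal to a frame length of 2048:

```python
# Convert lag bounds (in samples) to the corresponding pitch bounds (in Hz).
sample_rate = 22050
lag_min = 22      # smallest lag -> highest detectable pitch
lag_max = 2048    # largest lag -> lowest detectable pitch

f_max = sample_rate / lag_min  # upper pitch bound
f_min = sample_rate / lag_max  # lower pitch bound
print(f"pitch range: {f_min:.1f} Hz to {f_max:.1f} Hz")
```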

References

[1]

A. de Cheveigné and H. Kawahara, “YIN, a fundamental frequency estimator for speech and music,” The Journal of the Acoustical Society of America, vol. 111, pp. 1917-1930, 2002.

[2]

H. Choi et al., “Neural analysis and synthesis: Reconstructing speech from self-supervised representations,” arXiv:2110.14513, 2021.

forward(x)[source]#

Compute Yingram, a pitch feature derived from YIN.

Parameters:
x : Tensor [shape=(…, L)]

Framed waveform.

Returns:
y : Tensor [shape=(…, M)]

Yingram.

Examples

>>> x = diffsptk.nrand(2047)
>>> yingram = diffsptk.Yingram(x.size(-1))
>>> y = yingram(x)
>>> y.shape
torch.Size([1580])
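Note that forward expects framed input of shape (…, L). For a long waveform, frames must be extracted first; a minimal plain-Python sketch of overlapping framing with a hypothetical helper (frame_signal is not part of diffsptk; in practice one would use a tensor-based framing module):

```python
# Hypothetical helper: split a 1-D signal into overlapping frames of
# length frame_length, advancing by hop samples each time.
def frame_signal(x, frame_length, hop):
    return [x[i:i + frame_length]
            for i in range(0, len(x) - frame_length + 1, hop)]

signal = list(range(10))
frames = frame_signal(signal, frame_length=4, hop=2)
print(frames)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

Each resulting frame can then be passed to the module as the last dimension of the input tensor.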