yingram

class diffsptk.Yingram(frame_length: int, sample_rate: int = 22050, lag_min: int = 22, lag_max: int | None = None, n_bin: int = 20)

Pitch-related feature extraction module based on YIN.
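At its core is the cumulative mean normalized difference function of YIN [1]: for a frame \(x\) of length \(L\) and lag \(\tau\),

\[d(\tau) = \sum_{l=1}^{L-\tau} \left(x_l - x_{l+\tau}\right)^2, \qquad d'(\tau) = \begin{cases} 1 & \tau = 0, \\ d(\tau) \Big/ \left[\dfrac{1}{\tau} \sum_{j=1}^{\tau} d(j)\right] & \text{otherwise}, \end{cases}\]

where the summation limit \(L-\tau\) is one common windowing choice. The Yingram of [2] resamples \(d'(\tau)\) on a midi (semitone) scale between lag_min and lag_max, so each output bin corresponds to a fixed musical pitch interval and lower values indicate stronger periodicity at that pitch.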

Parameters:

frame_length : int >= 1

The frame length in samples, \(L\).

sample_rate : int >= 8000

The sample rate in Hz.

lag_min : int >= 1

The minimum lag in points.

lag_max : int < L

The maximum lag in points.

n_bin : int >= 1

The number of bins per semitone.
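The output dimension \(M\) follows from these parameters. Here is a minimal sketch, assuming the NANSY-style conversion from lag to midi note number, \(69 + 12\log_2(f/440)\) with \(f = \text{sample\_rate}/\text{lag}\), and one bin per \(1/\text{n\_bin}\) of a semitone; the snippet is illustrative, not part of the diffsptk API:

>>> import math
>>> sr, lag_min, lag_max, n_bin = 22050, 22, 2048, 20  # assumed effective lag_max for Yingram(2048)
>>> lag_to_midi = lambda lag: 69 + 12 * math.log2(sr / lag / 440)
>>> midi_lo = math.ceil(lag_to_midi(lag_max))   # lowest representable note
>>> midi_hi = math.floor(lag_to_midi(lag_min))  # highest representable note
>>> (midi_hi - midi_lo + 1) * n_bin
1580

Under these assumptions, the result matches \(M = 1580\) in the example below.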

References

[1]

A. de Cheveigné and H. Kawahara, “YIN, a fundamental frequency estimator for speech and music,” The Journal of the Acoustical Society of America, vol. 111, no. 4, pp. 1917-1930, 2002.

[2]

H.-S. Choi et al., “Neural analysis and synthesis: Reconstructing speech from self-supervised representations,” arXiv preprint arXiv:2110.14513, 2021.

forward(x: Tensor) → Tensor

Compute the YIN-based pitch features from the framed waveform.

Parameters:

x : Tensor [shape=(…, L)]

The framed waveform.

Returns:

out : Tensor [shape=(…, M)]

The Yingram.

Examples

>>> import diffsptk
>>> x = diffsptk.nrand(22050)
>>> frame = diffsptk.Frame(2048, 441)
>>> yingram = diffsptk.Yingram(2048)
>>> y = yingram(frame(x))
>>> y.shape
torch.Size([51, 1580])
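Assuming the output stores the cumulative mean normalized difference values sampled per pitch bin (so smaller means more periodic, as in [1]), a crude per-frame pitch-candidate index can be read off with argmin; this post-processing is an illustration, not a diffsptk API:

>>> pitch_bin = y.argmin(dim=-1)  # most periodic pitch bin in each frame
>>> pitch_bin.shape
torch.Size([51])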