pitch#
- class diffsptk.Pitch(frame_period, sample_rate, algorithm='crepe', out_format='pitch', **kwargs)[source]#
Pitch extraction module using external neural models.
- Parameters:
- frame_period : int >= 1 [scalar]
Frame period, \(P\).
- sample_rate : int >= 1 [scalar]
Sample rate in Hz.
- algorithm : [‘crepe’]
Algorithm.
- out_format : [‘pitch’, ‘f0’, ‘log-f0’, ‘prob’, ‘embed’]
Output format.
- f_min : float >= 0 [scalar]
Minimum frequency in Hz.
- f_max : float <= sample_rate // 2 [scalar]
Maximum frequency in Hz.
- voicing_threshold : float [scalar]
Voiced/unvoiced threshold.
- silence_threshold : float [scalar]
Silence threshold in dB.
- filter_length : int >= 1 [scalar]
Window length of the median and moving-average filters.
- model : [‘tiny’, ‘full’]
Model size.
- forward(x)[source]#
Compute pitch representation.
- Parameters:
- x : Tensor [shape=(B, T) or (T,)]
Waveform.
- Returns:
- y : Tensor [shape=(B, N, C) or (N, C) or (B, N) or (N,)]
Pitch probability, embedding, or pitch, where N is the number of frames and C is the number of pitch classes or the dimension of the embedding.
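For orientation, the number of frames N can be sketched from the waveform length T and the frame period P. The exact boundary handling depends on the underlying extractor, so the ceiling form below is an assumption, not the library's documented behavior:

```python
import math

def num_frames(T: int, P: int) -> int:
    # Assumed relation: one frame every P samples, with a trailing
    # partial frame kept (ceiling division). The extractor's actual
    # padding policy may differ slightly at the edges.
    return math.ceil(T / P)

# A 160-sample waveform at frame period 80 yields 2 frames.
print(num_frames(160, 80))
```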
Examples
>>> x = diffsptk.sin(100, 10)
>>> pitch = diffsptk.Pitch(80, 16000)
>>> y = pitch(x)
>>> y
tensor([10.0860, 10.0860])
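The out_format options are related by simple conversions. The sketch below assumes ‘pitch’ denotes the F0 period in samples (sample_rate / F0) and ‘log-f0’ the natural logarithm of F0, consistent with the example output above; verify against the library before relying on these conventions:

```python
import math

sample_rate = 16000
pitch = 10.0860  # assumed F0 period in samples, as in the example above

# Convert the assumed pitch period back to F0 in Hz, then to log-F0.
f0 = sample_rate / pitch   # F0 in Hz
log_f0 = math.log(f0)      # natural log of F0

print(round(f0, 2), round(log_f0, 4))
```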
See also