pitch
- class diffsptk.Pitch(frame_period, sample_rate, algorithm='fcnf0', out_format='pitch', **kwargs)[source]
Pitch extraction module using external neural models.
- Parameters:
- frame_period : int >= 1
Frame period, \(P\).
- sample_rate : int >= 1
Sample rate in Hz.
- algorithm : ['crepe', 'fcnf0']
Pitch extraction algorithm, CREPE [1] or FCNF0 [2].
- out_format : ['pitch', 'f0', 'log-f0', 'prob', 'embed']
Output format.
- f_min : float >= 0
Minimum frequency in Hz.
- f_max : float <= sample_rate // 2
Maximum frequency in Hz.
- voicing_threshold : float
Threshold for the voiced/unvoiced decision.
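The optional parameters above are supplied via **kwargs. A minimal construction sketch; the numeric values below are illustrative, not library defaults:

>>> pitch = diffsptk.Pitch(
...     frame_period=160,
...     sample_rate=8000,
...     algorithm="fcnf0",
...     out_format="f0",
...     f_min=60,  # illustrative search floor in Hz
...     f_max=400,  # illustrative search ceiling in Hz
...     voicing_threshold=0.5,  # illustrative
... )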
References
[1] J. W. Kim et al., “CREPE: A convolutional representation for pitch estimation,” Proceedings of ICASSP, pp. 161-165, 2018.
[2] M. Morrison et al., “Cross-domain neural pitch and periodicity estimation,” arXiv preprint, arXiv:2301.12258, 2023.
- forward(x)[source]
Compute pitch representation.
- Parameters:
- x : Tensor [shape=(B, T) or (T,)]
Waveform.
- Returns:
- out : Tensor [shape=(B, N, C) or (N, C) or (B, N) or (N,)]
Pitch probability, embedding, or pitch, where N is the number of frames and C is the number of pitch classes or the dimension of the embedding.
Examples
>>> x = diffsptk.sin(1000, 80)
>>> pitch = diffsptk.Pitch(160, 8000, out_format="f0")
>>> y = pitch(x)
>>> y
tensor([  0.0000,  99.7280,  99.7676,  99.8334,  99.8162, 100.1602,   0.0000])
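The other output formats follow the shapes documented in forward() above; for instance, out_format="prob" returns a frame-by-class matrix whose class count C depends on the chosen algorithm, so only the shape is noted here:

>>> pitch_prob = diffsptk.Pitch(160, 8000, out_format="prob")
>>> p = pitch_prob(x)  # shape=(N, C), with C set by the backend model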
See also