pitch#

class diffsptk.Pitch(frame_period, sample_rate, algorithm='crepe', out_format='pitch', **kwargs)[source]#

Pitch extraction module using external neural models.

Parameters:
frame_period : int >= 1 [scalar]

Frame period, \(P\).

sample_rate : int >= 1 [scalar]

Sample rate in Hz.

algorithm : ['crepe']

Pitch extraction algorithm.

out_format : ['pitch', 'f0', 'log-f0', 'prob', 'embed']

Output format (see the sketch after this parameter list).

f_min : float >= 0 [scalar]

Minimum frequency in Hz.

f_max : float <= sample_rate // 2 [scalar]

Maximum frequency in Hz.

voicing_threshold : float [scalar]

Voiced/unvoiced threshold.

silence_threshold : float [scalar]

Silence threshold in dB.

filter_length : int >= 1 [scalar]

Window length of the median and moving-average filters.

model : ['tiny', 'full']

Model size.
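The algorithm-specific options above are passed through **kwargs. As a minimal, illustrative sketch of configuring them (the values below are assumptions chosen for demonstration, not defaults):

>>> import diffsptk
>>> # Illustrative configuration only: restrict the F0 search range and
>>> # use the smaller CREPE model; these values are assumptions.
>>> pitch = diffsptk.Pitch(
...     frame_period=80,
...     sample_rate=16000,
...     algorithm="crepe",
...     out_format="f0",
...     f_min=80,
...     f_max=400,
...     model="tiny",
... )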

forward(x)[source]#

Compute pitch representation.

Parameters:
x : Tensor [shape=(B, T) or (T,)]

Waveform.

Returns:
y : Tensor [shape=(B, N, C) or (N, C) or (B, N) or (N,)]

Pitch probability, embedding, or pitch, where N is the number of frames and C is the number of pitch classes or the dimension of the embedding.

Examples

>>> x = diffsptk.sin(100, 10)
>>> pitch = diffsptk.Pitch(80, 16000)
>>> y = pitch(x)
>>> y
tensor([10.0860, 10.0860])
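
Here the default out_format='pitch' is used, so the output is the estimated pitch period in samples; the input sine has a period of 10 samples, hence the values near 10. A hedged sketch of requesting frame-wise pitch-class probabilities instead (the shape comment assumes the 360-bin CREPE output):

>>> x = diffsptk.sin(100, 10)
>>> prob = diffsptk.Pitch(80, 16000, out_format="prob")(x)
>>> prob.shape  # (N, C): frames by pitch classes; C = 360 for CREPE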

See also

excite