ap

class diffsptk.Aperiodicity(frame_period: int, sample_rate: int, fft_length: int | None = None, algorithm: str = 'tandem', out_format: str | int = 'a', lower_bound: float = 0.001, upper_bound: float = 0.999, **kwargs)

See this page for details. Note that gradients are not propagated through F0.

Parameters:

frame_period (int >= 1): The frame period in samples, \(P\).
sample_rate (int >= 8000): The sample rate in Hz.
fft_length (int >= 16 or None): The size of double-sided aperiodicity, \(L\). If None, the band aperiodicity (uninterpolated aperiodicity) is returned as the output.
algorithm (['tandem', 'd4c']): The algorithm to estimate aperiodicity.
out_format (['a', 'p', 'a/p', 'p/a']): The output format.
lower_bound (float >= 0): The lower bound of aperiodicity.
upper_bound (float <= 1): The upper bound of aperiodicity.
device (torch.device or None): The device of this module.
dtype (torch.dtype or None): The data type of this module.
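
The interaction between out_format, lower_bound, and upper_bound can be sketched in plain Python. This is only an illustration, not the library's implementation: it assumes that 'p' denotes periodicity defined as 1 - a, that 'a/p' and 'p/a' are the corresponding ratios, and that clipping is applied before conversion. The helper name convert_aperiodicity is hypothetical.

```python
def convert_aperiodicity(a, out_format="a", lower_bound=0.001, upper_bound=0.999):
    """Clip a scalar aperiodicity value, then convert it to the requested format.

    Assumptions (not confirmed by the documentation above):
    periodicity is p = 1 - a, and clipping happens before conversion.
    """
    # Keep the value strictly inside (0, 1) so that both ratios stay finite.
    a = min(max(a, lower_bound), upper_bound)
    p = 1.0 - a  # assumed definition of periodicity

    if out_format == "a":
        return a
    if out_format == "p":
        return p
    if out_format == "a/p":
        return a / p
    if out_format == "p/a":
        return p / a
    raise ValueError(f"unsupported out_format: {out_format!r}")


for fmt in ("a", "p", "a/p", "p/a"):
    print(fmt, convert_aperiodicity(0.25, fmt))
```

Under these assumptions, the default bounds of 0.001 and 0.999 keep the ratio formats away from division by zero when the estimate saturates at 0 or 1.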

References

[1] H. Kawahara et al., “Simplification and extension of non-periodic excitation source representations for high-quality speech manipulation systems,” Proceedings of Interspeech, pp. 38-41, 2010.

[2] M. Morise, “D4C, a band-aperiodicity estimator for high-quality speech synthesis,” Speech Communication, vol. 84, pp. 57-65, 2016.

forward(x: Tensor, f0: Tensor) → Tensor

Compute the aperiodicity measure.

Parameters:

x (Tensor [shape=(B, T) or (T,)]): The input waveform.
f0 (Tensor [shape=(B, T/P) or (T/P,)]): The F0 in Hz.

Returns:

out (Tensor [shape=(B, T/P, L/2+1) or (T/P, L/2+1)]): The aperiodicity.

Examples

>>> import diffsptk
>>> x = diffsptk.sin(1000, 80)
>>> pitch = diffsptk.Pitch(160, 8000, out_format="f0")
>>> f0 = pitch(x)
>>> f0.shape
torch.Size([7])
>>> aperiodicity = diffsptk.Aperiodicity(160, 16000, 1024)
>>> ap = aperiodicity(x, f0)
>>> ap.shape
torch.Size([7, 513])
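
The shapes printed above can be checked with a little arithmetic. The sketch below is a rough illustration only: it assumes that the waveform from diffsptk.sin(1000, 80) has 1001 samples and that the frame count T/P is rounded up, which is consistent with the 7 frames shown but is not stated in the documentation. The helper name expected_output_shape is hypothetical.

```python
import math


def expected_output_shape(num_samples, frame_period, fft_length):
    """Return the (frames, bins) shape implied by T, P, and L."""
    # Assumption: the number of frames is T / P rounded up.
    num_frames = math.ceil(num_samples / frame_period)
    # One-sided spectrum of an L-point FFT has L/2 + 1 bins.
    num_bins = fft_length // 2 + 1
    return (num_frames, num_bins)


print(expected_output_shape(1001, 160, 1024))  # (7, 513)
```

The second dimension, L/2 + 1, follows directly from the documented output shape. Only the rounding of the frame count is a guess.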