plp
- diffsptk.PLP
- class diffsptk.PerceptualLinearPredictiveCoefficientsAnalysis(*, fft_length: int, plp_order: int, n_channel: int, sample_rate: int, compression_factor: float = 0.33, lifter: int = 1, f_min: float = 0, f_max: float | None = None, floor: float = 1e-05, n_fft: int = 512, out_format: str | int = 'y')
See this page for details.
- Parameters:
- fft_lengthint >= 2
The number of FFT bins, \(L\).
- plp_orderint >= 1
The order of the PLP, \(M\).
- n_channelint >= 1
The number of mel filter banks, \(C\).
- sample_rateint >= 1
The sample rate in Hz.
- compression_factorfloat > 0
The amplitude compression factor.
- lifterint >= 1
The liftering coefficient.
- f_minfloat >= 0
The minimum frequency in Hz.
- f_maxfloat <= sample_rate // 2
The maximum frequency in Hz.
- floorfloat > 0
The minimum mel filter bank output in linear scale.
- n_fftint >> M
The number of FFT bins for the conversion from LPC to cepstrum. An accurate conversion requires a large value.
- out_format[‘y’, ‘yE’, ‘yc’, ‘ycE’]
y is PLP, c is C0, and E is energy.
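The ranges listed above can be checked before the module is constructed. The following is a minimal sketch in plain Python. `check_plp_args` is a hypothetical helper, not part of diffsptk, and it only encodes the documented ranges. Two points are assumptions: this page does not say what `f_max=None` resolves to, so the upper-bound check is skipped in that case, and `n_fft >> M` is not a precise bound, so the sketch uses `n_fft > plp_order` as a loose stand-in.

```python
def check_plp_args(
    *,
    fft_length,
    plp_order,
    n_channel,
    sample_rate,
    compression_factor=0.33,
    lifter=1,
    f_min=0.0,
    f_max=None,
    floor=1e-5,
    n_fft=512,
    out_format="y",
):
    """Return a list of violated constraints (hypothetical helper, not a diffsptk API)."""
    problems = []
    if fft_length < 2:
        problems.append("fft_length must be >= 2")
    if plp_order < 1:
        problems.append("plp_order must be >= 1")
    if n_channel < 1:
        problems.append("n_channel must be >= 1")
    if sample_rate < 1:
        problems.append("sample_rate must be >= 1")
    if compression_factor <= 0:
        problems.append("compression_factor must be > 0")
    if lifter < 1:
        problems.append("lifter must be >= 1")
    if f_min < 0:
        problems.append("f_min must be >= 0")
    # Assumption: the meaning of f_max=None is not documented here, so skip the check.
    if f_max is not None and f_max > sample_rate // 2:
        problems.append("f_max must be <= sample_rate // 2")
    if floor <= 0:
        problems.append("floor must be > 0")
    # Assumption: "n_fft >> M" is approximated by a plain "greater than" test.
    if n_fft <= plp_order:
        problems.append("n_fft must be much larger than plp_order")
    # Integer codes are allowed by the signature but not described on this page,
    # so only string values are validated.
    if isinstance(out_format, str) and out_format not in ("y", "yE", "yc", "ycE"):
        problems.append("out_format must be one of 'y', 'yE', 'yc', 'ycE'")
    return problems


# The configuration used in the example on this page passes every check.
print(check_plp_args(fft_length=32, plp_order=4, n_channel=8, sample_rate=8000))
```

An empty list means no documented constraint is violated. It does not guarantee that the library accepts the arguments.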
References
[1]Young et al., “The HTK Book,” Cambridge University Press, 2006.
- forward(x: Tensor) → Tensor
Compute the PLP from the power spectrum.
- Parameters:
- xTensor [shape=(…, L/2+1)]
The power spectrum.
- Returns:
- yTensor [shape=(…, M)]
The PLP without C0.
- ETensor [shape=(…, 1)] (optional)
The energy.
- cTensor [shape=(…, 1)] (optional)
The C0.
Examples
>>> x = diffsptk.ramp(19)
>>> stft = diffsptk.STFT(frame_length=10, frame_period=10, fft_length=32)
>>> plp = diffsptk.PLP(
...     fft_length=32, plp_order=4, n_channel=8, sample_rate=8000
... )
>>> y = plp(stft(x))
>>> y
tensor([[-0.2896, -0.2356, -0.0586, -0.0387],
        [ 0.4468, -0.5820,  0.0104, -0.0505]])
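As a reading aid for `out_format`, the sketch below maps each documented string to the named outputs it contains, following the legend "y is PLP, c is C0, and E is energy". `describe_out_format` is a hypothetical helper, not a diffsptk function, and it makes no claim about whether the library returns the parts separately or joined.

```python
# Legend taken from the out_format description on this page.
LEGEND = {"y": "PLP", "c": "C0", "E": "energy"}
VALID_FORMATS = ("y", "yE", "yc", "ycE")


def describe_out_format(out_format):
    """List the outputs named by an out_format string (hypothetical helper)."""
    if out_format not in VALID_FORMATS:
        raise ValueError(f"unsupported out_format: {out_format!r}")
    # Each character of the format string names one output, in order.
    return [LEGEND[ch] for ch in out_format]


print(describe_out_format("y"))    # ['PLP']
print(describe_out_format("ycE"))  # ['PLP', 'C0', 'energy']
```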
- diffsptk.functional.plp(x: Tensor, plp_order: int, n_channel: int, sample_rate: int, compression_factor: float = 0.33, lifter: int = 1, f_min: float = 0, f_max: float | None = None, floor: float = 1e-05, n_fft: int = 512, out_format: str = 'y') → Tensor
Compute the PLP from the power spectrum.
- Parameters:
- xTensor [shape=(…, L/2+1)]
The power spectrum.
- plp_orderint >= 1
The order of the PLP, \(M\).
- n_channelint >= 1
The number of mel filter banks, \(C\).
- sample_rateint >= 1
The sample rate in Hz.
- compression_factorfloat > 0
The amplitude compression factor.
- lifterint >= 1
The liftering coefficient.
- f_minfloat >= 0
The minimum frequency in Hz.
- f_maxfloat <= sample_rate // 2
The maximum frequency in Hz.
- floorfloat > 0
The minimum mel filter bank output in linear scale.
- n_fftint >> M
The number of FFT bins for the conversion from LPC to cepstrum.
- out_format[‘y’, ‘yE’, ‘yc’, ‘ycE’]
y is PLP, c is C0, and E is energy.
- Returns:
- yTensor [shape=(…, M)]
The PLP without C0.
- ETensor [shape=(…, 1)] (optional)
The energy.
- cTensor [shape=(…, 1)] (optional)
The C0.
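The functional form takes no `fft_length` argument, but the documented input shape (…, L/2+1) already determines \(L\). A small sketch of that arithmetic follows. Both helper names are hypothetical and not part of diffsptk, and the sketch assumes \(L\) is even so that L/2 is an integer.

```python
def n_bins_from_fft_length(fft_length):
    """Size of the last axis of the power spectrum, L/2 + 1 (hypothetical helper)."""
    # Assumption: L is even, so that L/2 is an integer bin count.
    if fft_length < 2 or fft_length % 2 != 0:
        raise ValueError("fft_length is assumed to be an even integer >= 2")
    return fft_length // 2 + 1


def fft_length_from_n_bins(n_bins):
    """Invert L/2 + 1 to recover L from the last axis size (hypothetical helper)."""
    if n_bins < 2:
        raise ValueError("n_bins must be >= 2")
    return 2 * (n_bins - 1)


# With fft_length=32, as in the class example, the spectrum has 17 bins.
print(n_bins_from_fft_length(32))  # 17
print(fft_length_from_n_bins(17))  # 32
```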