plp

diffsptk.PLP

alias of PerceptualLinearPredictiveCoefficientsAnalysis

class diffsptk.PerceptualLinearPredictiveCoefficientsAnalysis(*, fft_length, plp_order, n_channel, sample_rate, compression_factor=0.33, lifter=1, f_min=0, f_max=None, floor=1e-05, n_fft=512, out_format='y')

See this page for details.

Parameters:
fft_lengthint >= 2

The number of FFT bins, \(L\).

plp_orderint >= 1

The order of the PLP, \(M\).

n_channelint >= 1

The number of mel filter banks, \(C\).

sample_rateint >= 1

The sample rate in Hz.

compression_factorfloat > 0

The amplitude compression factor.

lifterint >= 1

The liftering coefficient.

f_minfloat >= 0

The minimum frequency in Hz.

f_maxfloat <= sample_rate // 2

The maximum frequency in Hz.

floorfloat > 0

The minimum mel filter bank output in linear scale.

n_fftint >> M

The number of FFT bins used for the conversion from LPC to cepstrum. An accurate conversion requires a large value.

out_format[‘y’, ‘yE’, ‘yc’, ‘ycE’]

y is PLP, c is C0, and E is energy.
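
As a rough illustration of how n_channel, f_min, and f_max interact, the sketch below places C channel centers evenly on a mel axis between f_min and f_max. It is not taken from the diffsptk source: the HTK-style mel formula, the helper names (hz_to_mel, mel_to_hz, center_frequencies), and the fallback of f_max=None to the Nyquist frequency are all assumptions made for the example.

```python
import math


def hz_to_mel(f):
    # Assumed HTK-style mel scale; the library may use a different variant.
    return 1127.0 * math.log1p(f / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * math.expm1(m / 1127.0)


def center_frequencies(n_channel, sample_rate, f_min=0.0, f_max=None):
    """Return n_channel center frequencies (Hz), evenly spaced in mel."""
    if f_max is None:
        f_max = sample_rate / 2  # assumption: default to the Nyquist frequency
    if not 0 <= f_min < f_max <= sample_rate / 2:
        raise ValueError("require 0 <= f_min < f_max <= sample_rate / 2")
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_channel + 1)
    return [mel_to_hz(lo + step * (k + 1)) for k in range(n_channel)]


centers = center_frequencies(n_channel=8, sample_rate=8000)
```

Under these assumptions, narrowing the range with f_min and f_max packs the same number of channels into a smaller band, so each channel covers fewer FFT bins.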

References

[1] S. Young et al., “The HTK Book,” Cambridge University Press, 2006.

forward(x)

Compute the PLP from the power spectrum.

Parameters:
xTensor [shape=(…, L/2+1)]

The power spectrum.

Returns:
yTensor [shape=(…, M)]

The PLP without C0.

ETensor [shape=(…, 1)] (optional)

The energy.

cTensor [shape=(…, 1)] (optional)

The C0.
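
The shapes above can be checked with a small bookkeeping helper. This is only a sketch: plp_shapes is a hypothetical function, not part of diffsptk, and it assumes that each letter of out_format corresponds to one returned tensor, in the order the letters are written.

```python
def plp_shapes(fft_length, plp_order, out_format="y", batch_shape=()):
    """Return the expected input shape and (name, shape) pairs for the outputs."""
    if out_format not in ("y", "yE", "yc", "ycE"):
        raise ValueError(f"unsupported out_format: {out_format!r}")
    batch = tuple(batch_shape)
    # The power spectrum holds L/2+1 bins for an FFT length of L.
    input_shape = batch + (fft_length // 2 + 1,)
    # y has M coefficients; C0 and the energy are single values per frame.
    last_dim = {"y": plp_order, "c": 1, "E": 1}
    outputs = [(name, batch + (last_dim[name],)) for name in out_format]
    return input_shape, outputs


in_shape, out_shapes = plp_shapes(32, 4, out_format="ycE", batch_shape=(2,))
```

With fft_length=32 and plp_order=4 on two frames, the input shape is (2, 17) and y has shape (2, 4).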

Examples

>>> x = diffsptk.ramp(19)
>>> stft = diffsptk.STFT(frame_length=10, frame_period=10, fft_length=32)
>>> plp = diffsptk.PLP(fft_length=32, plp_order=4, n_channel=8, sample_rate=8000)
>>> y = plp(stft(x))
>>> y
tensor([[-0.2896, -0.2356, -0.0586, -0.0387],
        [ 0.4468, -0.5820,  0.0104, -0.0505]])

diffsptk.functional.plp(x, plp_order, n_channel, sample_rate, compression_factor=0.33, lifter=1, f_min=0, f_max=None, floor=1e-05, n_fft=512, out_format='y')

Compute the PLP from the power spectrum.

Parameters:
xTensor [shape=(…, L/2+1)]

The power spectrum.

plp_orderint >= 1

The order of the PLP, \(M\).

n_channelint >= 1

The number of mel filter banks, \(C\).

sample_rateint >= 1

The sample rate in Hz.

compression_factorfloat > 0

The amplitude compression factor.

lifterint >= 1

The liftering coefficient.

f_minfloat >= 0

The minimum frequency in Hz.

f_maxfloat <= sample_rate // 2

The maximum frequency in Hz.

floorfloat > 0

The minimum mel filter bank output in linear scale.

n_fftint >> M

The number of FFT bins for the conversion from LPC to cepstrum.

out_format[‘y’, ‘yE’, ‘yc’, ‘ycE’]

y is PLP, c is C0, and E is energy.
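
To make the lifter parameter more concrete, the sketch below computes per-coefficient weights with the sinusoidal lifter described in the HTK Book. Whether diffsptk applies exactly this formula is an assumption, and lifter_weights is a hypothetical helper written only for this illustration.

```python
import math


def lifter_weights(plp_order, lifter=1):
    """Assumed HTK-style lifter: w_m = 1 + (Q / 2) * sin(pi * m / Q), m = 1..M."""
    if plp_order < 1 or lifter < 1:
        raise ValueError("plp_order and lifter must be >= 1")
    return [
        1.0 + (lifter / 2.0) * math.sin(math.pi * m / lifter)
        for m in range(1, plp_order + 1)
    ]


w_default = lifter_weights(4, lifter=1)   # sin(pi * m) vanishes for integer m
w_htk = lifter_weights(4, lifter=22)      # 22 is the customary HTK setting
```

Under this assumption the default lifter=1 leaves the coefficients effectively unscaled, while larger values boost the higher-order coefficients.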

Returns:
yTensor [shape=(…, M)]

The PLP without C0.

ETensor [shape=(…, 1)] (optional)

The energy.

cTensor [shape=(…, 1)] (optional)

The C0.

See also

stft, fbank, mfcc