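Question: I get completely different MFCC coefficients for identical or nearly identical sounds, such as several finger snaps or several taps on a desk. What am I missing? I expected that, since the timbre doesn't really change, the coefficients wouldn't differ much.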
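I'm using the aubio library to extract MFCCs from microphone input, with a sampling rate of 44.1 kHz, a buffer size of 1028, and a hop size of 512. When an onset is detected in the 512-sample buffer named `in`, I pass `in` on to the MFCC extraction function.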
Here is the callback that gets called whenever a buffer full of audio is ready to be processed:
int record( void *outputBuffer, void *inputBuffer, unsigned int nBufferFrames,
            double streamTime, RtAudioStreamStatus status, void *userData )
{
    if (status)
        std::cout << "Stream overflow detected!" << std::endl;

    // Do something with the data in the "inputBuffer" buffer.
    smpl_t * input = (smpl_t *) inputBuffer;
    InputData * data = (InputData *) userData;

    // Only hop_size samples fit in in->data, so loop to feed the buffer hop by hop.
    while (data->offset < 1024)
    {
        std::copy(input + data->offset, input + data->offset + 511, in->data);
        aubio_onset_do(o, in, out);

        // Do something with the onsets.
        if (out->data[0] != 0)
        {
            fprintf(stderr, "ONSET DETECTED! \n");
            // Compute the magnitude spectrum
            // (pv: phase vocoder object; in: hop_size input; fftgrain: spectrum output).
            aubio_pvoc_do(pv, in, fftgrain);
            // Compute the MFCCs (mfcc: MFCC object; mfcc_out: 13 MFCC coefficients).
            aubio_mfcc_do(mfcc, fftgrain, mfcc_out);
            fvec_print(mfcc_out);
        }
        data->offset += 512;
    }
    data->offset = 0;

    if (streamTime > 4)
        return 1; // abort the stream at 4 seconds
    return 0;     // continue normal stream operation
}
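
To make the hop-by-hop buffering easier to check on its own, here is a minimal, library-free sketch of just that step. `for_each_hop` is a hypothetical helper (it is not part of aubio or RtAudio), and it assumes mono `float` samples and an input length that is a multiple of the hop size, as with two hops of 512 in a 1024-frame buffer. Note that `std::copy` takes a half-open range, so the end iterator has to be `offset + hop_size` for all `hop_size` samples to be copied.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper, not part of aubio or RtAudio: walks a mono input
// buffer in hop_size steps and hands each complete hop to `process`.
// Returns the number of hops that were delivered.
std::size_t for_each_hop(
    const float* input, std::size_t n_frames, std::size_t hop_size,
    const std::function<void(const std::vector<float>&)>& process)
{
    assert(hop_size > 0);  // a zero hop would never advance

    std::vector<float> hop(hop_size);
    std::size_t count = 0;

    // Stop as soon as a full hop no longer fits in the remaining input.
    for (std::size_t offset = 0; offset + hop_size <= n_frames;
         offset += hop_size)
    {
        // [first, last) is half-open: `last` is one past the final sample,
        // so this copies exactly hop_size samples.
        std::copy(input + offset, input + offset + hop_size, hop.begin());
        process(hop);
        ++count;
    }
    return count;
}
```

In the callback above, `process` would stand in for the per-hop work (onset detection, then the spectrum and MFCC steps), with `hop` playing the role of `in->data`.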