MFCC help: same sound, different coefficients?

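Question: I get completely different MFCC coefficients for identical and nearly identical sounds, such as several finger snaps or several taps on a desk. What am I missing? I assumed that since the timbre doesn't really change, the coefficients wouldn't differ much.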

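I'm using the aubio library to extract MFCCs from microphone input, with a sample rate of 44.1 kHz, a buffer length of 1024, and a hop size of 512. For example, when an onset is detected in a 512-sample buffer named in, I send in to the MFCC extraction function.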
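The framing arithmetic described here can be sketched in isolation. The following is a minimal, aubio-free sketch, assuming a 1024-sample callback buffer (the loop bound used in the callback below) and a hop of 512. `split_into_hops` is a hypothetical helper, not an aubio or RtAudio function; it only illustrates how one buffer divides into hop-sized frames.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of aubio or RtAudio): split one callback
// buffer into consecutive, non-overlapping hop-sized frames.
std::vector<std::vector<float>> split_into_hops(const float* input,
                                                std::size_t buffer_size,
                                                std::size_t hop_size)
{
    // Assumption: the buffer length is a whole multiple of the hop size.
    assert(hop_size > 0 && buffer_size % hop_size == 0);

    std::vector<std::vector<float>> hops;
    for (std::size_t offset = 0; offset < buffer_size; offset += hop_size) {
        std::vector<float> hop(hop_size);
        // std::copy excludes the end iterator, so the range
        // [offset, offset + hop_size) covers exactly hop_size samples.
        std::copy(input + offset, input + offset + hop_size, hop.begin());
        hops.push_back(std::move(hop));
    }
    return hops;
}
```

With these sizes each callback yields two hops, covering samples 0..511 and 512..1023.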

This is the callback that is called when a buffer full of audio is ready to be processed:

int record( void *outputBuffer, void *inputBuffer, unsigned int nBufferFrames,
         double streamTime, RtAudioStreamStatus status, void *userData )
{
   if (status)
      std::cout << "Stream overflow detected!" << std::endl;

   // Do something with the data in the "inputBuffer" buffer.
   smpl_t * input = (smpl_t *) inputBuffer;

   InputData * data = (InputData *) userData;

   //only hop_size length samples allowed in in->data; must loop to fill bit by bit.
   while (data->offset < 1024) 
   {
      std::copy(input + data->offset, input + data->offset + 511, in->data);

      aubio_onset_do(o,in,out);       

      //do something with the onsets  

      if (out->data[0] != 0) 
      { 
         fprintf(stderr, "ONSET DETECTED! \n");

         //compute mag spectrum (pv- phase vocoder obj; in- takes hop_size input; fftgrain- spectrum output.
         aubio_pvoc_do (pv, in, fftgrain); 

         //compute mfccs (mfcc-mfcc object, mfcc_out- 13 MFCC coefficients)
         aubio_mfcc_do(mfcc, fftgrain, mfcc_out);

         fvec_print(mfcc_out);


      }
      data->offset += 512;
   }

   data->offset = 0;
   if (streamTime > 4)
      return 1; //abort the stream at 4 seconds

   return 0; //continue normal stream operation
}
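
Since fvec_print only dumps the raw coefficients, one way to put a number on how different two onsets are is a distance between their coefficient vectors. The sketch below is an assumption for illustration, not part of aubio: `mfcc_distance` is a hypothetical helper that operates on coefficients copied into `std::vector<float>`, and its `first` parameter allows skipping leading coefficients, which may be useful if the first coefficient mostly tracks overall loudness rather than timbre.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an aubio function): Euclidean distance between
// two MFCC vectors, starting at index `first`.
double mfcc_distance(const std::vector<float>& a,
                     const std::vector<float>& b,
                     std::size_t first = 0)
{
    // Both vectors must hold the same number of coefficients.
    assert(a.size() == b.size());
    assert(first <= a.size());

    double sum_sq = 0.0;
    for (std::size_t i = first; i < a.size(); ++i) {
        const double diff = static_cast<double>(a[i]) - static_cast<double>(b[i]);
        sum_sq += diff * diff;
    }
    return std::sqrt(sum_sq);
}
```

Comparing the distance between two snaps against the distance between a snap and a desk tap would show whether the coefficients really fail to separate the two kinds of sound, or only look different when printed.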