指定在 Python 中录制音频的最小触发频率

信息处理 声音的 Python 频率
2022-01-30 00:00:31

我正在使用 pyaudio 在 Python 中编写用于声音激活录制的脚本。我想在声音超过预先指定的音量和频率后触发 5 秒录音。我已经设法让音量部分正常工作,但不知道如何指定最小触发频率(例如,我希望它以高于 10kHz 的频率触发):

import pyaudio
import wave
from array import array
import time
 
FORMAT=pyaudio.paInt16
CHANNELS=1
RATE=44100
CHUNK=1024
RECORD_SECONDS=5

audio=pyaudio.PyAudio() 

stream=audio.open(format=FORMAT,channels=CHANNELS, 
                  rate=RATE,
                  input=True,
                  frames_per_buffer=CHUNK)

nighttime=True

while nighttime:
     data=stream.read(CHUNK)
     data_chunk=array('h',data)
     vol=max(data_chunk)
     if(vol>=3000):
         print("recording triggered")
         frames=[]
         for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
             data = stream.read(CHUNK)
             frames.append(data)
         print("recording saved")
         # write to file
         words = ["RECORDING-", time.strftime("%Y%m%d-%H%M%S"), ".wav"]
         FILE_NAME= "".join(words) 
         wavfile=wave.open(FILE_NAME,'wb')
         wavfile.setnchannels(CHANNELS)
         wavfile.setsampwidth(audio.get_sample_size(FORMAT))
         wavfile.setframerate(RATE)
         wavfile.writeframes(b''.join(frames))
         wavfile.close()
     # check if still nighttime
     nighttime=True 
 
 stream.stop_stream()
 stream.close()
 audio.terminate()

我想在行中添加if(vol>=3000):类似的内容if(vol>=3000 and frequency>10000):,但我不知道如何设置frequency. 这该怎么做?

2个回答

您需要对 执行 FFT data_chunk,然后取结果复数数组的大小。结果将是一个向量,其中包含 中所有频率分量的幅度data_chunk这就是你要这样做的方式:

from scipy.fft import fft
import numpy as np

Y = fft(data_chunk)
Y = Y[0:round(len(Y)/2)]
YMag = np.abs(Y)

现在 YMag 的频率幅度从 0 Hz 到 Hz 不等,其中是采样频率,即每秒接收多少个样本。现在要在 YMag 中找到特定频率,假设您将使用以下公式,其中 N 是元素的数量,n 是频率分量在 中的位置Fs/2Fsfn=2fN/FsYMagfYMag

您需要的是滤除较低的频率:您可以使用高通滤波器来做到这一点。

由于您实际上并不需要在非常窄的频带中从“非常阻塞”到“完全让所有东西通过”的过滤器,因此该过滤器将非常短(这意味着每个样本只需要很少的乘法即可计算其输出。

我已经安装pyfda了一个快速而肮脏的设计¹,使用与您相同的采样率、12 kHz 的截止频率和 4 kHz 的过渡宽度:

幅度频率响应

就“这个过滤器实际上看起来如何”而言:

b = [0.04889, -0.1332, 0.2142, -0.2142, 0.1332, -0.04889]
a = [1,        1.217,  1.996,   1.56,   0.9704,  0.3965 ]

就是这样。那就是过滤器。

你可以这样应用它:注意这段代码是徒手编写的,甚至没有通过 Python 运行来检查语法上的非法拼写错误。换句话说,您必须对其进行测试。

from scipy import signal
import numpy
THRESHOLD = 3000
# … the rest of your code up to the while-loop:


"""
Your filter: these are the coefficients for the feedforward part (b) and feedback part (a) of an IIR filter. I got these from pyfda.
"""
b = [0.04889, -0.1332, 0.2142, -0.2142, 0.1332, -0.04889]
a = [1,        1.217,  1.996,   1.56,   0.9704,  0.3965 ]

"""
This is yet another filter, a low-pass filter used for smoothing.
Design specs were fs=44100 Hz, f_bp = 60 Hz, f_sb = 100 Hz,
A_bp = 3 dB, A_sb = 30 db, it's also an elliptic IIR.

The 60 Hz cutoff should get rid of anything that's shorter than 
ca 1/60 of a second – we don't care about things shorter than that,
was my assumption.
"""
b_lpf = [0.0005067106574837601, -0.000506612969002681, -0.0005066129690026811, 0.00050671065748376]
a_lpf = [1.0                  , -2.9949235962214953,    2.9899182657208225, -0.9949944741223649]

"""
Initialize the filter state so that it starts "eingeschwungen" (in steady state)
This is optional, but without it, chances are high you always trigger when you first enter the while loop.
"""
filter_state = signal.lfilter_zi(b, a)
filter_state_lpf = signal.lfilter_zi(b_lpf, a_lpf)

left_to_record = 0
wavfile = None
leftover_data = []

while nighttime:
    data=stream.read(CHUNK)
    data_chunk=array('h',data)
    """
    Filter your signal!
    Afterwards, it doesn't contain lower frequencies anymore
    """
    signal_filtered, filter_state = signal.lfilter(b, a, data_chunk, zi=filter_state)

    """
    Convert amplitude to square amplitude, which is proportional to power,
    which is proportional to volume.
    """
    power = numpy.multiply(signal_filtered, signal_filtered)
    
    """
    Now, we'll also need to filter this – but this time with 
    a low-pass filter. We'll use the second filter we've designed:
    """
    power_filtered, filter_state = signal.lfilter(b_lpf, a_lpf, power, zi=filter_state_lpf)

    """
    Fun fact! We've now reduced the *actually occupied spectrum*
    significantly at the output of this rather hefty filter.
    That means of the 22.050 kHz that this signal originally could
    represent (says Nyquist), only maybe 1/200 actually still 
    contain 'change'. I.e. we can now more than safely simply ignore
    99 out of 100 samples - the others are definitely just nice and 
    smoothly interpolated between these.
    Now. That saves *a lot* of calculation. We only need to find the
    maximum on that subset! The real maximum power can't be much
    higher than what we see there.
    """
    maximum_power = max(power_filtered[::100])
    if maximum_power >= THRESHOLD:
        left_to_record = RECORD_SECONDS * RATE // CHUNK * CHUNK
        # I'm **REALLY** not versed with wavfile, and
        # I think you shouldn't be using that, but pysndfile
        # instead to write losslessly compressed (FLAC) audio
        # instead. Anyways, I haven't tried this, so debugging
        # is on you, sorry :)
        if not wavfile:
          timestring = time.strftime("%Y%m%d-%H%M%S")
          filename = f"RECORDING-{timestring}.wav"
          wavfile=wave.open(filename,'wb')
          wavfile.setnchannels(CHANNELS)
          wavfile.setsampwidth(audio.get_sample_size(FORMAT))
          wavfile.setframerate(RATE)
    
    if left_to_record > 0:
        """
        Write available full chunks!
        the // CHUNK * CHUNK trick just rounds down to the
        next multiple of CHUNK
        """
        
        write_now = (len(leftover_data) + len(data)) // CHUNK * CHUNK
        write_now = min(write_now, left_to_record)
        write_from_leftover = min(write_now, len(leftover_data))
        if write_from_leftover:
          wavfile.writeframes(bytes(leftover_data[:write_from_leftover]))
          leftover_data = [write_from_leftover]
          left_to_record -= write_from_leftover
          write_now -= write_from_leftover
        wavfile.writeframes(bytes(data[:write_now]))
        left_to_record -= write_now
        if left_to_record < CHUNK:
          wavfile.close()
          wavfile = None
        else:
          leftover_data = data[:write_now]

明显的错误:

高通和低通滤波器都有我们所说的群延迟所以,触发器是在声音开始之后才出现的。那不好吗?不,我认为延迟时间少于 50 毫秒,您可能会接受。而且,我们实际上并没有在触发时捕捉到 5 秒,而是在帧开始时捕捉到 5 秒,所以平均而言,我们实际上会早一点。

事实上,如果在一个记录时间内出现新的超阈值音量,这实际上是为了“保持记录活动”另外 5 秒。我不确定这是你的意图,但我认为它是。


¹ 设计参数:

  • 高通
  • IIR(因为我们不关心相位或精度,这只是为了检测活动,所以它应该是省力的!IIR 需要更少的计算来获得相同的陡度,但它们不如 FIR 好信号)
  • 椭圆(在快速抑制阻带的能力和仍然具有可靠通带之间的良好权衡)
  • 最小阶数(我们想要通过这些规范获得的“最便宜”的过滤器,而不是我们为计算复杂度支付的“价格”获得的最佳过滤器)
  • 频率单位 kHz, ,fs=44.1
  • fSB(阻带结束的频率)8
  • ASB阻带的最小衰减(以分贝为单位):50
  • fPB(通带开始的频率)11
  • APB我们在通带中允许的纹波,即通带允许的不平坦程度(分贝):2.85
  • fC 11

² 第二个滤波器,低通滤波器,如下所示:

低通滤波器幅度响应