
How can I identify the useful parts of a recording and ignore the gaps?

Tags: signal processing, audio, signal analysis, Python

2022-01-26 11:41:11

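I think this is a relatively simple DSP task, but I haven't been able to find any information on how to solve my problem. Hopefully someone with more experience and brainpower can help me:

I have many (thousands of) audio recordings of bird calls. I'm looking for a technique to identify the timestamps in each audio file at which a bird is calling. Here is an example of a high-quality audio file:

[Figure: example of a clean audio file with very little background noise]

Ultimately, I'd like to export a table of the frequencies produced by the bird in each file for further analysis. I have already written a Python program that lets me select the data points of interest manually, but I'd like to automate the process.

The simplest approach I can think of is to set a minimum power threshold for data points of interest, but the audio files have varying levels of background noise. Some audio files may be unusable because of background noise, which is fine, but they need to be flagged as such.

So the two questions are:
1) How do I identify low-quality audio files?
2) How do I identify the useful parts of high-quality audio files?

Thanks!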

2 Answers

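Edit: I don't know why I assumed this question was about MATLAB, but my solution is written in MATLAB. A quick search shows that scikit-image for Python has Otsu's automatic threshold method (which is what MATLAB uses). Other than that, I think most of the code is fairly Python-friendly and can easily be translated. I apologize for my mistake.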

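import numpy as np
from skimage import filters  # this module was called `filter` in old scikit-image releases
val = filters.threshold_otsu(np.abs(sound))  # `sound`: 1-D array of audio samples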

End of edit

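Could you threshold the amplitude of the recording? Say this is our recording:

[Figure: amplitude plot of the recording]

where we are plotting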

plot(abs(sound))

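Can we say that the chirps are probably above 2, and assume that everything below an amplitude of 2 is just noise? If so, and assuming you have the Image Processing Toolbox, you can: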

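1) Normalize the audio file's magnitude to between 0 and 1, which the next step requires
2) Get an automatic threshold for the file using the Image Processing Toolbox function
3) Find the indices that exceed the threshold and say these are bird chirps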
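
For Python users, the three steps above could be sketched roughly as follows. This is only a sketch under assumptions: `otsu_threshold` and `find_loud_samples` are hypothetical helper names, Otsu's method is re-implemented with NumPy so that no image library is required, and `sound` is assumed to be a 1-D array of samples.

```python
import numpy as np


def otsu_threshold(values, nbins=256):
    """Otsu threshold for a 1-D array of values that lie in [0, 1]."""
    hist, edges = np.histogram(values, bins=nbins, range=(0.0, 1.0))
    centers = (edges[:-1] + edges[1:]) / 2
    # weight_low[k]: number of samples in bins 0..k; weight_high[k]: in bins k..end
    weight_low = np.cumsum(hist)
    weight_high = np.cumsum(hist[::-1])[::-1]
    # Mean value of each class; maximum() avoids dividing by zero for empty classes
    mean_low = np.cumsum(hist * centers) / np.maximum(weight_low, 1)
    mean_high = (np.cumsum((hist * centers)[::-1])
                 / np.maximum(weight_high[::-1], 1))[::-1]
    # Between-class variance when splitting between bin k and bin k+1
    variance = weight_low[:-1] * weight_high[1:] * (mean_low[:-1] - mean_high[1:]) ** 2
    return centers[np.argmax(variance)]


def find_loud_samples(sound):
    """Return (indices above the automatic threshold, threshold)."""
    magnitude = np.abs(np.asarray(sound, dtype=float))
    span = magnitude.max() - magnitude.min()
    if span == 0:
        # Constant signal: nothing stands out
        return np.array([], dtype=int), 0.0
    normalized = (magnitude - magnitude.min()) / span      # step 1
    threshold = otsu_threshold(normalized)                  # step 2
    return np.flatnonzero(normalized > threshold), threshold  # step 3
```

The returned indices refer to positions in the original array, so dividing them by the sample rate gives timestamps in seconds.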

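The only problem is that sound, even with a very strong signal, regularly crosses 0, so discontinuities will appear even within an actual chirp. Maybe you could add some kind of hysteresis to the masking operation. For example: if the current sample is below the threshold but a large fraction of the previous few samples were above it, we assume this sample is also a point of interest.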

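Roughly, your code would look like this (I'm not near a computer with MATLAB, so it may contain slight syntax errors). It is also not very optimized, but maybe it can be a springboard for you or someone else.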

%% step 1
%we are using magnitude only
my_min = min(abs(soundfile));
my_max = max(abs(soundfile));

%gets sound file magnitude between 0-1 only
normalized_sound_mag = 1/(my_max-my_min) * (abs(soundfile) - my_min);

%% step 2
%get a threshold, we needed the magnitude between 0-1 for this function to work
sound_thresh = graythresh(normalized_sound_mag);

%% step 3
%this vector will store all sample indexes that are interesting
idx_sound = [];                 %start empty; 0 is not a valid MATLAB index

num_for_hysteresis = 10;        %the number of samples to use for hysteresis
hysteresis_percentage = 0.8;    %fraction of samples that must be above threshold
                                %so say in our set of 10, 2 samples are below threshold
                                %since .8 are above it, we say they are all of interest

%because of the way we implement hysteresis we have to skip the first few samples
for i = num_for_hysteresis+1:length(normalized_sound_mag)
    %if we are above thresh, without a doubt save the index
    if (normalized_sound_mag(i) > sound_thresh)
        idx_sound = [idx_sound, i];
    else
        %hysteresis: check the previous samples, creates a boolean vector. 1 means value above thresh, 0 means it was below
        samples_above_thresh = normalized_sound_mag(i-num_for_hysteresis:i-1) > sound_thresh;

        %nnz gets the number of nonzero elements in a matrix
        num_prev_samples_above_thresh = nnz(samples_above_thresh);

        %if the number of samples in the prev window met our criteria, this current sample
        %is probably of interest as well
        if (num_prev_samples_above_thresh >= hysteresis_percentage * num_for_hysteresis)
            idx_sound = [idx_sound, i];
        end
    end
end

%idx_sound should now have all the indices of interest, these can now be used
%directly on the original soundclip
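
Since the question is about Python, here is a rough NumPy sketch of the same hysteresis idea. It is an illustration, not a tested port: `hysteresis_mask` is a hypothetical name, the window counts the samples strictly before the current one, and, unlike the loop above, above-threshold samples inside the first window are kept rather than skipped.

```python
import numpy as np


def hysteresis_mask(normalized_mag, threshold, window=10, fraction=0.8):
    """Boolean mask that is True for every sample considered of interest."""
    above = np.asarray(normalized_mag) > threshold
    n = above.size
    # cumulative[k] = number of above-threshold samples among the first k samples
    cumulative = np.concatenate(([0], np.cumsum(above)))
    # counts[i] = number of above-threshold samples in above[i - window:i]
    counts = np.zeros(n, dtype=int)
    if n > window:
        counts[window:] = cumulative[window:n] - cumulative[:n - window]
    # Keep a sample if it is above the threshold itself, or if enough of the
    # preceding `window` samples were above it
    return above | (counts >= fraction * window)
```

`np.flatnonzero(hysteresis_mask(...))` then gives the sample indices, playing the role of `idx_sound` above.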

We could also threshold on frequency power instead of sound amplitude; the basic outline stays the same.
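
As a hedged illustration of thresholding on frequency power, the sketch below computes the power inside one frequency band for each short frame of the recording. The function name `band_power_per_frame`, the band limits, and the frame and hop sizes are all assumptions to be tuned for the species and recordings at hand.

```python
import numpy as np


def band_power_per_frame(sound, sample_rate, low_hz, high_hz,
                         frame_len=1024, hop=512):
    """Return (frame centre times in seconds, power in [low_hz, high_hz] per frame)."""
    sound = np.asarray(sound, dtype=float)
    window = np.hanning(frame_len)                        # reduces spectral leakage
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    in_band = (freqs >= low_hz) & (freqs <= high_hz)
    starts = np.arange(0, len(sound) - frame_len + 1, hop)
    power = np.empty(len(starts))
    for k, start in enumerate(starts):
        spectrum = np.fft.rfft(sound[start:start + frame_len] * window)
        power[k] = np.sum(np.abs(spectrum[in_band]) ** 2)
    times = (starts + frame_len / 2) / sample_rate
    return times, power
```

The resulting `power` vector can be normalized and thresholded exactly like the magnitude vector, with each entry standing for one frame instead of one sample.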

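It looks like good files contain power within a certain range of frequencies, so files that lack power in that range could be flagged as low quality, as you suggest.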
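
One possible way to turn this into a check is to compare the power inside the expected band with the total power of the file. This is a sketch only: `band_power_ratio` and `is_low_quality` are hypothetical names, and the default band (2000 to 8000 Hz) and the cutoff `min_ratio` are placeholder values, not numbers taken from the recordings.

```python
import numpy as np


def band_power_ratio(sound, sample_rate, low_hz, high_hz):
    """Fraction of the file's total power that lies in [low_hz, high_hz]."""
    sound = np.asarray(sound, dtype=float)
    sound = sound - sound.mean()                  # ignore any DC offset
    power = np.abs(np.fft.rfft(sound)) ** 2
    freqs = np.fft.rfftfreq(len(sound), d=1.0 / sample_rate)
    in_band = (freqs >= low_hz) & (freqs <= high_hz)
    total = power.sum()
    return float(power[in_band].sum() / total) if total > 0 else 0.0


def is_low_quality(sound, sample_rate, low_hz=2000.0, high_hz=8000.0, min_ratio=0.5):
    """Flag a file when too little of its power falls in the expected band."""
    return band_power_ratio(sound, sample_rate, low_hz, high_hz) < min_ratio
```

A sensible `min_ratio` would probably have to be chosen by looking at the ratio for a handful of files already labelled good or bad by hand.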

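To identify the interesting points in time, simply create a time window that starts a few moments before each frequency-power peak and continues for a while after the peak.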
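
A minimal sketch of that windowing step, assuming the peak times (in seconds) have already been found some other way. `windows_around_peaks` is a hypothetical helper, and the `pre` and `post` margins are placeholder values. Windows that overlap are merged so that a burst of nearby peaks yields one segment.

```python
def windows_around_peaks(peak_times, pre=0.25, post=0.5):
    """Return merged (start, end) windows, in seconds, around each peak time."""
    windows = []
    for t in sorted(peak_times):
        start, end = max(0.0, t - pre), t + post
        if windows and start <= windows[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one
            windows[-1][1] = max(windows[-1][1], end)
        else:
            windows.append([start, end])
    return [tuple(w) for w in windows]
```

For example, `windows_around_peaks([1.0, 1.5, 5.0])` merges the first two peaks into a single window and keeps the third one separate.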

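Animals are known to repeat their chirps, so if a file contains repeated frequency-power spikes with similar characteristics, you can be more confident that you are looking at a high-quality file.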
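
One way to look for such repetition, offered as an assumption rather than as the method meant above, is the autocorrelation of a per-frame power envelope: a strong peak at a non-zero lag suggests that similar spikes recur at that spacing. `repetition_score` is a hypothetical name, and `min_lag` must be chosen longer than a single chirp (in frames) so that the score does not simply reflect the width of one chirp.

```python
import numpy as np


def repetition_score(envelope, min_lag=1):
    """Return (highest normalized autocorrelation at lags >= min_lag, that lag)."""
    x = np.asarray(envelope, dtype=float)
    x = x - x.mean()
    energy = float(np.dot(x, x))
    if energy == 0.0 or x.size <= min_lag:
        return 0.0, 0
    # Keep lags 0..n-1 and scale so that the value at lag 0 equals 1
    acf = np.correlate(x, x, mode="full")[x.size - 1:] / energy
    lag = min_lag + int(np.argmax(acf[min_lag:]))
    return float(acf[lag]), lag
```

A score close to 1 points to regularly repeated spikes, while a score near 0 points to an isolated event; where to draw the line would need to be checked against known recordings.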
