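Edit: I don't know why I assumed this question was MATLAB-based, but my solution is written in MATLAB. A quick search shows that Python's scikit-image has an automatic threshold function based on Otsu's method (which is what MATLAB uses). Other than that, I think most of the code is fairly Python-friendly and can be translated easily. Apologies for my mistake.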
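from skimage import data, filters  # the module was named `filter` in old releases
camera = data.camera()  # sample image from the scikit-image docs; pass your own array instead
val = filters.threshold_otsu(camera)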
End edit
Could you threshold the magnitude of the recording? Say this is our recording, plotted with
plot(abs(sound))
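Can we say the chirps are probably above 2, and assume that everything below magnitude = 2 is just noise? If so, and assuming you have the Image Processing Toolbox, you can:
- normalize the audio file's magnitude to between 0 and 1 (needed for the next step)
- get an automatic threshold for the file using the Image Processing Toolbox function (graythresh)
- find the indices that exceed the threshold and call those bird chirps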
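Since the question is apparently about Python, here is a rough NumPy sketch of those three steps. Treat it as an illustration under assumptions rather than a tested solution: `otsu_threshold` is a small hand-rolled stand-in for `graythresh` / `threshold_otsu`, the helper names are hypothetical, and the `sound` array is synthetic (a quiet background with a loud burst at samples 400 to 599).

```python
import numpy as np

def normalize_magnitude(signal):
    """Scale |signal| into the range [0, 1]."""
    mag = np.abs(np.asarray(signal, dtype=float))
    lo, hi = mag.min(), mag.max()
    if hi == lo:  # constant signal: avoid dividing by zero
        return np.zeros_like(mag)
    return (mag - lo) / (hi - lo)

def otsu_threshold(values, nbins=256):
    """Otsu's method: choose the cut that maximizes between-class variance.
    Hand-rolled stand-in for graythresh / skimage's threshold_otsu."""
    hist, edges = np.histogram(values, bins=nbins, range=(0.0, 1.0))
    centers = (edges[:-1] + edges[1:]) / 2
    weighted = hist * centers
    w0 = np.cumsum(hist)                  # sample count in bins 0..k
    w1 = np.cumsum(hist[::-1])[::-1]      # sample count in bins k..end
    m0 = np.cumsum(weighted) / np.maximum(w0, 1)
    m1 = (np.cumsum(weighted[::-1]) / np.maximum(w1[::-1], 1))[::-1]
    between = w0[:-1] * w1[1:] * (m0[:-1] - m1[1:]) ** 2
    return centers[np.argmax(between)]

# Synthetic recording (made up for illustration): quiet background, loud burst
n = np.arange(1000)
sound = 0.05 * np.sin(1.7 * n)
sound[400:600] += np.sin(0.3 * n[400:600])

norm_mag = normalize_magnitude(sound)            # step 1
thresh = otsu_threshold(norm_mag)                # step 2
chirp_idx = np.flatnonzero(norm_mag > thresh)    # step 3
```

With real audio you would load the samples into `sound` instead, and if scikit-image is installed you could swap `otsu_threshold` for `filters.threshold_otsu`.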
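The only problem is that sound, even when the signal is strong, crosses 0 frequently, so you will get discontinuities even inside an actual chirp. Maybe you could add some kind of hysteresis to the masking operation. Something like:
If the current sample is below the threshold but a large fraction of the previous few samples were above it, we assume this sample is also a point of interest.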
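Roughly, your code would look something like this (I'm not near a computer with MATLAB, so there may be slight syntax errors). It's also not very optimized, but maybe it's a springboard for you or someone else.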
%% step 1
%we are using magnitude only
my_min = min(abs(soundfile));
my_max = max(abs(soundfile));
%gets sound file magnitude between 0-1 only
normalized_sound_mag = 1/(my_max-my_min) * (abs(soundfile) - my_min);
%% step 2
%get a threshold, we needed the magnitude between 0-1 for this function to work
sound_thresh = graythresh(normalized_sound_mag);
%% step 3
%this vector will store all sample indexes that are interesting
idx_sound = []; %start empty, initializing with 0 would store an invalid index
num_for_hysteresis = 10; %the number of samples to use for hysteresis
hysteresis_percentage = 0.8; %percentage of samples that must be above threshold
%so say in our set of 10, 2 samples are below threshold
%since .8 are above it, we say they are all of interest
%because the way we implement hysteresis we have to skip the first few samples
for i = (num_for_hysteresis+1):length(normalized_sound_mag)
    %if we are above thresh, without a doubt save the index
    if normalized_sound_mag(i) > sound_thresh
        idx_sound = [idx_sound, i];
    else
        %hysteresis: check the previous samples, creating a boolean vector
        %(1 means the value was above thresh, 0 means it was below)
        samples_above_thresh = normalized_sound_mag(i-num_for_hysteresis:i-1) > sound_thresh;
        %nnz gets the number of nonzero elements in a matrix
        num_prev_samples_above_thresh = nnz(samples_above_thresh);
        %if the number of samples in the prev window met our criteria, this current sample
        %is probably of interest as well
        if num_prev_samples_above_thresh >= hysteresis_percentage * num_for_hysteresis
            idx_sound = [idx_sound, i];
        end
    end
end
%idx_sound should now have all the indices of interest, these can now be used
%directly on the original soundclip
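For anyone working in Python, the hysteresis step might be ported along these lines. This is only a sketch with a hypothetical helper name (`hysteresis_mask`). It makes two choices of its own: it uses `>=` so that exactly 8 of 10 qualifies, as the comments in the code describe, and it keeps above-threshold samples in the first window instead of skipping them. The counting is done with a cumulative sum rather than a loop.

```python
import numpy as np

def hysteresis_mask(above, window=10, fraction=0.8):
    """Keep a sample if it is above threshold, or if at least `fraction`
    of the `window` samples just before it were above threshold."""
    above = np.asarray(above, dtype=bool)
    # csum[k] holds the number of True values in above[:k]
    csum = np.concatenate(([0], np.cumsum(above)))
    keep = above.copy()
    i = np.arange(window, above.size)
    prev_count = csum[i] - csum[i - window]   # True count in above[i-window:i]
    keep[i] |= prev_count >= fraction * window
    return keep

# Toy mask: a "chirp" with a one-sample dropout at index 10, then silence
above = np.array([1] * 10 + [0] + [1] * 5 + [0] * 20, dtype=bool)
keep = hysteresis_mask(above)
idx_sound = np.flatnonzero(keep)   # indices of interest
```

In practice `above` would be the comparison of the normalized magnitude against the threshold, and `idx_sound` can then index the original clip directly. Note that the mask also extends a little past the end of the chirp, which is the price of bridging the gaps.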
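We could also compute the power in the frequency domain and threshold that instead of the sound magnitude; the basic outline stays the same.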
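As a rough idea of what the frequency-power variant could look like in Python, the sketch below splits the signal into short frames and sums the spectral power inside a band. Everything specific here is an assumption for illustration: the helper name `band_power`, the 256-sample frames, the 2 to 8 kHz band, the synthetic signal, and the fixed 0.5 cutoff (which could be replaced by an automatic threshold such as Otsu's).

```python
import numpy as np

def band_power(signal, fs, frame_len=256, band=(2000.0, 8000.0)):
    """Spectral power inside `band` (Hz) for each non-overlapping frame."""
    signal = np.asarray(signal, dtype=float)
    n_frames = signal.size // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    # Hann window reduces leakage from strong out-of-band components
    spectrum = np.fft.rfft(frames * np.hanning(frame_len), axis=1)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return (np.abs(spectrum[:, in_band]) ** 2).sum(axis=1)

# Synthetic example: 100 Hz hum throughout, a 4 kHz tone in frames 32..47
fs = 22050
t = np.arange(fs) / fs                        # one second of samples
sound = 0.5 * np.sin(2 * np.pi * 100 * t)
chirp = slice(32 * 256, 48 * 256)
sound[chirp] += 0.2 * np.sin(2 * np.pi * 4000 * t[chirp])

power = band_power(sound, fs)
norm_power = (power - power.min()) / (power.max() - power.min())
chirp_frames = np.flatnonzero(norm_power > 0.5)   # assumed fixed cutoff
```

The appeal of this version is that a loud but low-pitched background (the hum here is louder than the tone) does not trip the detector, because only power inside the chosen band counts. The indices are frame numbers, so multiply by `frame_len` to get back to sample positions.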