
Inverse short-time Fourier transform algorithm described in words

Tags: signal processing, Fourier transform, algorithms, stft

2021-12-28 00:16:32

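I am trying to understand conceptually what happens when the forward and inverse short-time Fourier transforms (STFT) are applied to a discrete time-domain signal. I have found the classic paper by Allen and Rabiner (1977), as well as the Wikipedia article on the STFT. I believe there is another good article on the topic as well.

I am interested in computing the Gabor transform, which is nothing more than an STFT with a Gaussian window.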

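This is my understanding of the forward STFT:

Select a subsequence of time-domain elements from the signal.
Multiply the subsequence by a window function, using point-by-point multiplication in the time domain.
Take the windowed subsequence into the frequency domain using an FFT.
By selecting successive overlapping subsequences and repeating the above procedure, we obtain a matrix with m rows and n columns. Each column is the FFT of the subsequence taken at a given time. This matrix can be used to compute a spectrogram.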
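The forward procedure described above can be sketched in Matlab as follows. This is only a minimal illustration: the test signal, the window length N, the step hop, and the Gaussian width are assumed values, and all variable names are made up for the example.

```matlab
% Minimal forward STFT sketch (all names and parameter values are assumptions)
x   = sin(2*pi*0.1*(0:1023)');  % any column-vector test signal
N   = 64;                       % window and FFT length (assumed)
hop = 16;                       % step between successive frames (assumed)
n   = (0:N-1)';
w   = exp(-0.5 * ((n - (N-1)/2) / (N/8)).^2);   % Gaussian window (assumed width)

nFrames = floor((length(x) - N) / hop) + 1;
S = zeros(N, nFrames);          % N rows, one column per frame
for k = 1:nFrames
    first   = (k-1)*hop + 1;    % select the next overlapping subsequence
    seg     = x(first:first+N-1);
    S(:, k) = fft(seg .* w);    % window in the time domain, then FFT
end

% Spectrogram: squared magnitude of the non-negative frequencies
P = abs(S(1:N/2+1, :)).^2;
```

With these values the matrix S has 64 rows and 61 columns, one column per frame position.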

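For the inverse STFT, however, the papers talk about a summation over the overlapping analysis sections. I find it very challenging to visualize what is really going on here. What do I have to do to compute the inverse STFT (as a step-by-step sequence like the one above)?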

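Forward STFT

I have created a diagram showing what I think is happening in the forward STFT. What I do not understand is how to assemble the subsequences so that I recover the original time sequence. Could someone modify this diagram or give an equation that shows how the subsequences are added?

[Figure: forward transform]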

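Inverse Transform

This is my understanding of the inverse transform. Each successive window is taken back into the time domain using the IFFT. Each window is then shifted by the step size and added to the result of the previous shift. The following diagram shows this process. The summed output is the time-domain signal.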

[Figure: inverse transform]
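The shift-and-add procedure can also be written as an equation. The following is a sketch for a step size of one sample; the symbols ($u$ for the signal, $w$ for a window that is nonzero only for $|k| \le h$, $\tau$ for the frame centre) are notation chosen here for illustration and are not taken from the paper.

```latex
\begin{align}
u_\tau(n) &= u(n)\, w(n - \tau)
  && \text{(time-domain frame returned by the IFFT of column } \tau\text{)} \\
\sum_{\tau} u_\tau(n) &= u(n) \sum_{\tau} w(n - \tau)
  = u(n) \sum_{k=-h}^{h} w(k)
  && \text{(add the shifted frames)} \\
u(n) &= \frac{\sum_{\tau} u_\tau(n)}{\sum_{k=-h}^{h} w(k)}
  && \text{(divide by the sum of the window)}
\end{align}
```

The second line assumes that every frame overlapping sample $n$ is present, which holds for samples at least $h$ points away from either end of the signal. For a step size $M > 1$ the same argument works only if $\sum_m w(n - mM)$ is the same constant for every $n$.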

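Code Example

The following Matlab code generates a synthetic time-domain signal and then tests the STFT process, demonstrating that the inverse is the dual of the forward transform to within numerical round-off error. The beginning and end of the signal are zero-padded so that the center of the window can be placed on the first and last elements of the time-domain signal.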

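Note that, according to Allen and Rabiner (1977), if a multiplication takes place in the frequency domain to alter the frequency response, the length of the analysis window must be equal to or greater than $N + N_0 - 1$ points, where $N$ is the number of windowed samples and $N_0$ is the length of the filter response. The length is extended by zero-padding, which is needed to prevent circular convolution. The test code simply shows that the inverse is the dual of the forward transform, so it applies no such padding.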
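The zero-padding requirement can be checked numerically with a small sketch. The sample values and the three-tap filter below are made up for illustration; only the length rule $N + N_0 - 1$ comes from the text.

```matlab
x  = [1 2 3 4 5 6 7 8]';        % N = 8 windowed samples (made-up values)
h  = [1 -1 0.5]';               % filter response, N0 = 3 (made-up values)
N  = length(x);
N0 = length(h);
L  = N + N0 - 1;                % minimum transform length

yLinear  = conv(x, h);                          % reference linear convolution
yPadded  = real(ifft(fft(x, L) .* fft(h, L)));  % both zero-padded to L points
yWrapped = real(ifft(fft(x, N) .* fft(h, N)));  % too short: the tail wraps around

errPadded  = max(abs(yPadded - yLinear));         % round-off level
errWrapped = max(abs(yWrapped - yLinear(1:N)));   % clearly nonzero
```

With the padded length the frequency-domain product matches the linear convolution. With only N points the last N0 - 1 output samples fold back onto the start of the frame.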

% The code computes the STFT (Gabor transform) with step size = 1
% This is most useful when modifications of the signal are required in
% the frequency domain

% The Gabor transform is a STFT with a Gaussian window (w_t in the code)

% written by Nicholas Kinar

% Reference:
% [1] J. B. Allen and L. R. Rabiner, 
% “A unified approach to short-time Fourier analysis and synthesis,” 
% Proceedings of the IEEE, vol. 65, no. 11, pp. 1558 – 1564, Nov. 1977.

% generate the signal
mm = 8192;                  % signal points
t = linspace(0,1,mm);       % time axis

dt = t(2) - t(1);           % timestep t
wSize = 101;                % window size


% generate time-domain test function
% See pg. 156
% J. S. Walker, A Primer on Wavelets and Their Scientific Applications, 
% 2nd ed., Updated and fully rev. Boca Raton: Chapman & Hall/CRC, 2008.
% http://www.uwec.edu/walkerjs/primer/Ch5extract.pdf
term1 = exp(-400 .* (t - 0.2).^2);
term2 = sin(1024 .* pi .* t);
term3 = exp(-400.*(t- 0.5).^2);
term4 = cos(2048 .* pi .* t);
term5 = exp(-400 .* (t-0.7).^2);
term6 = sin(512.*pi.*t) - cos(3072.*pi.*t);
u = term1.*term2  + term3.*term4 + term5.*term6; % time domain signal
u = u';

figure;
plot(u)

Nmid = (wSize - 1) / 2 + 1;    % midway point in the window
hN = Nmid - 1;                 % number on each side of center point       


% stores the output of the Gabor transform in the frequency domain
% each column is the FFT output
Umat = zeros(wSize, mm);     


% generate the Gaussian window 
% [1] Y. Wang, Seismic inverse Q filtering. Blackwell Pub., 2008.
% pg. 123.
T = dt * hN;                    % half-width
sp = linspace(dt, T, hN); 
targ = [-sp(end:-1:1) 0 sp];    % this is t - tau
term1 = -((2 .* targ) ./ T).^2;
term2 = exp(term1);
term3 = 2 / (T * sqrt(pi));
w_t = term3 .* term2;
wt_sum = sum ( w_t ); % sum of the window samples


% sliding window code
% NOTE that the beginning and end of the sequence
% are padded with zeros 
for Ntau = 1:mm

    % case #1: pad the beginning with zeros
    % (nPad is used instead of diff to avoid shadowing the built-in diff)
    if( Ntau <= Nmid )
        nPad = Nmid - Ntau;
        u_sub = [zeros(nPad,1); u(1:hN+Ntau)];
    end

    % case #2: simply extract the window in the middle
    if (Ntau < mm-hN+1 && Ntau > Nmid)
        u_sub = u(Ntau-hN:Ntau+hN);
    end

    % case #3: pad the end with zeros
    if(Ntau >= mm-hN+1)
        nLeft = mm - Ntau;      % samples remaining after the center point
        adiff = hN - nLeft;     % zeros needed to fill the window
        u_sub = [ u(Ntau-hN:Ntau+nLeft);  zeros(adiff,1)]; 
    end   

    % windowed trace segment
    % multiplication in time domain with
    % Gaussian window  function
    u_tau_omega = u_sub .* w_t';

    % segment in Fourier domain
    % NOTE that this must be padded to prevent
    % circular convolution if some sort of multiplication
    % occurs in the frequency domain
    U = fft( u_tau_omega );

    % make an assignment to each trace
    % in the output matrix
    Umat(:,Ntau) = U;

end

% By here, Umat contains the STFT (Gabor transform)

% Notice how the Fourier transform is symmetrical 
% (for a real signal only the first (N+1)/2 points are unique when the
% transform length N is odd, as here, but I've plotted the full transform)
figure;
imagesc( (abs(Umat)).^2 )


% now let's try to get back the original signal from the transformed
% signal

% use IFFT on matrix along the cols
us = zeros(wSize,mm);
for i = 1:mm 
    us(:,i) = ifft(Umat(:,i));
end

figure;
imagesc( us );

% create a vector that is the same size as the original signal,
% but allows for the zero padding at the beginning and the end of the time
% domain sequence
Nuu = hN + mm + hN;
uu = zeros(1, Nuu);

% add each one of the windows to each other, progressively shifting the
% sequence forward 
cc = 1; 
for i = 1:mm
   uu(cc:cc+wSize-1) = us(:,i) + uu(cc:cc+wSize-1)';
   cc = cc + 1;
end

% trim the beginning and end of uu 
% NOTE that this could probably be done in a more efficient manner
% but it is easiest to do here

% Divide by the sum of the window 
% see Equation 4.4 of paper by Allen and Rabiner (1977)
% We don't need to divide by L, the FFT transform size since 
% Matlab has already taken care of it 
uu2 = uu(hN+1:end-hN) ./ (wt_sum); 

figure;
plot(uu2)

% Compare the differences between the original and the reconstructed
% signals.  There will be some small difference due to round-off error
% since floating point numbers are not exact
dd = u - uu2';

figure;
plot(dd);

2 Answers

An STFT transform pair can be characterized by 4 different parameters:

FFT size (N)
Step size (M)
Analysis window (size N)
Synthesis window (size N)

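The process is as follows:

Grab N (FFT size) samples from the current input position
Apply the analysis window
Do the FFT
Do whatever you want to do in the frequency domain
Inverse FFT
Apply the synthesis window
Add to the output at the current output position
Advance the input and output positions by M (step size) samples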
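The loop described above can be sketched in Matlab as follows. The parameter values, the choice of a periodic Hann window for both analysis and synthesis, and the final normalization by the accumulated window product are assumptions made for this illustration, not something the answer prescribes.

```matlab
N  = 256;                        % FFT size (assumed)
M  = 64;                         % step size (assumed)
n  = (0:N-1)';
wa = 0.5 - 0.5*cos(2*pi*n/N);    % analysis window: periodic Hann (assumed)
ws = wa;                         % synthesis window: same shape (assumed)
x  = sin(2*pi*0.01*(0:4095)');   % test input
y  = zeros(size(x));             % output accumulator
wss = zeros(size(x));            % accumulated wa.*ws, used to normalize

for pos = 1:M:(length(x) - N + 1)    % advance by M samples per frame
    idx  = pos:pos+N-1;
    seg  = x(idx);                   % grab N samples
    X    = fft(seg .* wa);           % analysis window, then FFT
    % frequency-domain processing would go here (none in this sketch)
    seg2 = real(ifft(X)) .* ws;      % inverse FFT, then synthesis window
    y(idx)   = y(idx) + seg2;        % add to the output at this position
    wss(idx) = wss(idx) + wa .* ws;
end

% Away from the edges every sample is covered by N/M = 4 frames and the
% accumulated window product is constant, so dividing by it recovers x.
interior = N+1:length(x)-N;
err = max(abs(y(interior) ./ wss(interior) - x(interior)));
```

For this window and step the accumulated product is 1.5 everywhere in the interior, so a fixed scale factor would work equally well there.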

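The overlap-add algorithm is a good example of this. In that case the step size is N, the FFT size is 2*N, the analysis window is rectangular with N ones followed by N zeros, and the synthesis window is all ones.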
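As a sketch of that special case, the following Matlab fragment filters a signal by overlap-add and compares the result with direct convolution. The filter taps and the test signal are made up for illustration; the filter must be no longer than N + 1 taps so that each filtered frame fits in 2*N points.

```matlab
N = 64;                           % step size and number of live samples per frame
h = [0.25 0.5 0.25]';             % made-up FIR filter, at most N+1 taps
x = cos(2*pi*0.03*(0:1023)');     % test input, length a multiple of N
H = fft(h, 2*N);                  % filter response at FFT size 2*N
y = zeros(length(x) + N, 1);      % room for the tail of the last frame

for pos = 1:N:length(x)
    % Rectangular analysis window: N samples followed by N zeros
    seg = [x(pos:pos+N-1); zeros(N, 1)];
    out = real(ifft(fft(seg) .* H));   % multiply in the frequency domain
    idx = pos:pos+2*N-1;
    y(idx) = y(idx) + out;             % synthesis window of all ones
end

yRef = conv(x, h);                     % direct time-domain reference
err  = max(abs(y(1:length(yRef)) - yRef));
```

Each output frame is 2*N samples long but the position advances by only N, so the filter tail of one frame is added onto the start of the next.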

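There are many other choices for these parameters, and under certain conditions the forward/inverse transform pair is perfectly reconstructing (i.e. you can get the original signal back).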

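The key point here is that each output sample typically receives additive contributions from more than one inverse FFT. The output needs to be accumulated over multiple frames. The number of contributing frames is simply given by the FFT size divided by the step size (rounded up, if necessary).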
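The frame count can be checked with a few lines of Matlab. The sizes below are assumed values, chosen so that the step size does not divide the FFT size evenly.

```matlab
N   = 256;                        % FFT size (assumed)
M   = 96;                         % step size (assumed, does not divide N)
len = 4096;                       % signal length (assumed)
count = zeros(len, 1);            % number of frames touching each sample
for pos = 1:M:(len - N + 1)
    idx = pos:pos+N-1;
    count(idx) = count(idx) + 1;
end
maxFrames = max(count);           % largest number of contributions observed
expected  = ceil(N / M);          % FFT size / step size, rounded up
```

When M does not divide N, the rounded-up value is the maximum: some samples receive one contribution fewer, which is one reason step sizes that divide the FFT size are convenient.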

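Seven years after this question was first asked, I ran into a confusion similar to that of @Nicholas Kinar. Here I would like to offer some "unofficial", "correctness not fully guaranteed" personal intuitions and explanations.

The headings of the statements below are exaggerated to make them easier to understand.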

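The forward STFT is not really meant to preserve the original signal.
- When an STFT is used with a non-trivial window (not all ones), the input to the FFT is a skewed/rescaled version of the original signal fragment.
- This is good for feature extraction, where useless or redundant data is filtered out. As in syllable detection, not all of the time data is needed to detect certain specific tones in speech.
- The peak of the window vector marks the few positions in the audio signal that the algorithm should pay attention to.
Therefore the raw result of the inverse STFT may be something we would not intuitively expect.
- The ifft of an STFT feature should be a windowed signal fragment.
To obtain the original unwindowed signal fragment, an inverse window can be applied to the raw output of the ifft.
- It is easy to design a mapping function that undoes the effect of a Hamming window, since that window is never zero and can simply be divided out.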
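A minimal sketch of such an inverse window in Matlab, assuming a symmetric Hamming window and a made-up test fragment:

```matlab
N   = 64;                                  % fragment length (assumed)
n   = (0:N-1)';
w   = 0.54 - 0.46*cos(2*pi*n/(N-1));       % Hamming window, minimum value 0.08
seg = sin(2*pi*0.05*n) + 0.3;              % made-up signal fragment
F   = fft(seg .* w);                       % forward: window, then FFT
rec = real(ifft(F)) ./ w;                  % inverse: ifft, then divide the window out
err = max(abs(rec - seg));                 % round-off level
```

This only works for windows that stay away from zero. A Hann window, for example, reaches zero at its ends, so dividing by it is not possible there.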
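A synthesis window is then used to handle the overlap between time fragments.
- Since the original unwindowed signal fragments can be regarded as already recovered, any "transition weights" can be used to interpolate the overlapping parts.
If you want to take into account that the fft of windowed speech may respect weak signals less and favor strong ones, there may be a way to design a corresponding synthesis window.
In addition, a direct algorithm for generating the synthesis window follows from these principles:
- If the analysis window value at a position is high, give that position a higher weight than the other fragments that overlap it.
- If the analysis window value at a position is low, give that position a lower weight in the synthesis window, and rely more on the other overlapping fragments whose analysis window values are larger there.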
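One way to turn these two principles into a concrete window is to weight each fragment by its analysis window and normalize by the sum of squared analysis windows over all fragments that overlap a position. This is a common weighted overlap-add normalization, swapped in here as an illustration and not necessarily what the answer had in mind; the sizes and the Hamming window are assumed values.

```matlab
N  = 128;                                 % frame length (assumed)
M  = 32;                                  % step size (assumed)
n  = (0:N-1)';
wa = 0.54 - 0.46*cos(2*pi*n/(N-1));       % Hamming analysis window (assumed)

% Sum of squared analysis windows over every frame overlapping each position
den = zeros(N, 1);
for shift = -(N-M):M:(N-M)
    k = (1:N)' - shift;                   % index inside the shifted frame
    valid = k >= 1 & k <= N;
    den(valid) = den(valid) + wa(k(valid)).^2;
end
ws = wa ./ den;                           % synthesis window: large where wa is large

% Check by reconstructing a test signal from its windowed fragments
x = sin(2*pi*0.02*(0:2047)');
y = zeros(size(x));
for pos = 1:M:(length(x) - N + 1)
    idx   = pos:pos+N-1;
    frame = real(ifft(fft(x(idx) .* wa)));   % windowed fragment from the ifft
    y(idx) = y(idx) + frame .* ws;           % weight and overlap-add
end
interior = N+1:length(x)-N;               % samples covered by all N/M frames
err = max(abs(y(interior) - x(interior)));
```

The product of the analysis and synthesis windows, summed over the overlapping frames, equals one at every interior sample, so no separate inverse-window step is needed.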
