我正在尝试在 MATLAB 中实现一个与此类似的音高移位器,以进行快速原型设计。
我的目标是做相反的事情,但我认为它是相似的。
缺少实现细节。所以我试图将它们拼凑在一起。
看起来相位声码器和 SOLA 方法是首选方法。所以我在那里做了一些研究。
我发现这个带有 MATLAB 代码和一个示例章节来解释它是如何工作的。
我了解大部分代码,除了在相位重建期间,有一个奇怪的选择或 0 取决于 bin 索引是否为奇数。
pizero=pi*ones(1,length(freqind));
pcent=peaksort(tk,2)-peaksort(tk,1)+1;
indpc=(2-mod(pcent,2)):2:length(freqind);
pizero(indpc)=zeros(1,length(indpc));
phout(freqind)=phadvance(sk,peaksort(tk,2))+pizero;
书中的文字描述同样奇怪,参考文献只指向一个数字
If these are chosen to be ... the burst generated by the IFFT will be windowed with tapered ends
我们知道我们为什么要做出这样的选择吗?什么是锥形末端?为什么会导致这些选择呢?为什么锥形末端更可取?
我还找到了这个,它似乎在做相位声码器,但在其他方面有一个相当不同的实现,特别是,我找不到在该代码中完成的任何峰值查找工作。
这让我想知道,什么是相位声码器,是第一个环节,还是第二个环节,还是一类算法?
当作者说音高转换时,实际上,他似乎在进行音高缩放,并且看起来他并不关心某些频率内容是否超出范围,他只是忽略了它们。
/* ***************** PROCESSING ******************* */
/* this does the actual pitch shifting */
memset(gSynMagn, 0, fftFrameSize*sizeof(float));
memset(gSynFreq, 0, fftFrameSize*sizeof(float));
for (k = 0; k <= fftFrameSize2; k++) {
index = k*pitchShift;
if (index <= fftFrameSize2) {
gSynMagn[index] += gAnaMagn[k];
gSynFreq[index] = gAnaFreq[k] * pitchShift;
}
}
如果我将此应用于男性到女性的转变。我们知道男性声音的频率范围是 85 到 180Hz,而女性声音的频率范围是 165 到 255Hz,只是进行缩放可能会产生一些女性声音不存在的 300Hz 部分。
我想知道这是否有意义?
- 将 0 到 85Hz 缩放到 0 到 165Hz。
- 将 85 Hz 缩放到 180Hz 到 165 到 255Hz
- 将 180Hz 缩放到最大值到 255Hz 到最大值。
以便最小限度地改变带宽。
我也想做第一个链接中的共振峰技巧。
共振峰分析似乎主要使用 LPC 或倒谱来完成。我选择了 LPC。
我发现这有助于我使用 LPC 估计共振峰。我可以毫无问题地重现实验。但我想知道
如何在相位声码器中实现共振峰偏移?
对于如此冗长而令人费解的问题,我深表歉意——这对我来说太难了,我才刚刚开始进行音频信号处理。如果我对这一切了解得更清楚,我就可以将它们分别提出清晰的问题,在这一点上,它们在我看来都是一团糟。
免责声明,这不是家庭作业,这只是一个辅助项目学习练习。
更新:
这是我现在拥有的代码。正是基于此。它通常以我想要的方式改变频谱,但它与我想要重现的实验中的音高偏移版本相去甚远。我怀疑我在这里观察到的高频噪声是罪魁祸首。
% This implements a phase vocoder for time-stretching/compression
% Put the file name of the input file in "infile"
% the file to be created is "outfile"
% The amount of time-stretch is determined by the ratio of the hopin
% and hopout variables. For example, hopin=242 and hopout=161.3333
% (integers are not required) increases the tempo by
% hopin/hopout = 1.5. To slow down a comparable amount,
% choose hopin = 161.3333, hopout = 242.
% 5/2005 Bill Sethares
clf
infile='male.wav';
outfile='female.wav';
time=0; % total time to process
hopin=121; % hop length for input
hopout=121; % hop length for output
all2pi=2*pi*(0:100); % all multiples of 2 pi (used in PV-style freq search)
max_peak=50; % parameters for peak finding: number of peaks
eps_peak=0.005; % height of peaks
nfft=2^12; nfft2=nfft/2; % fft length
win=hanning(nfft)'; % windows and windowing variables
% y are the samples, sr is the sampling rate.
[y,sr]=wavread(infile); % read song file
siz=wavread(infile,'size'); % length of song in samples
stmo=min(siz); leny=max(siz); % stereo (stmo=2) or mono (stmo=1)
if siz(2)<siz(1), y=y'; end
if time==0, time = 100000; end
% time is in unit seconds, and therefore time * sr equals to the number
% of samples for time seconds.
% in any case, we will not be dealing with 100,000 seconds anyway.
% it looks like a precaution to avoid very long loop only.
tt=zeros(stmo,ceil(hopout/hopin)*min(leny,time*sr)); % place for output
% Suppose hopin is an integer and we are advancing window by hopin
% every time, the first window is [1..nfft],
% the next window is [hopin + 1..hopin + nfft]
% and so on, so the last window should be
% [k * hopin + 1 .. k * hopin + nfft]
% and k * hopin + nfft should be closest to leny
% That is probably how lenseg is determined.
lenseg=floor((min(leny,time*sr)-nfft)/hopin) % number of nfft segments to process
% The fastest wave in discrete time takes two samples. Therefore, the
% period is 2 * sampling period. The associated frequency is sampling rate / 2
% which is also equal to sr * nfft2/nfft
% the remaining frequencies are scaled linearly.
ssf=sr*(0:nfft2)/nfft; % frequency vector
% ph stands for phase
phold=zeros(stmo,nfft2+1); phadvance=zeros(stmo,nfft2+1);
outbeat=zeros(stmo,nfft); pold1=[]; pold2=[];
% hopin is the number of samples to advance, therefore dtin is the
% amount of time to advance
dtin=hopin/sr; % time advances dt per hop for input
dtout=hopout/sr; % time advances dt per hop for output
% for k=1239:1239
for k=1:lenseg-1 % main loop - process each beat separately
% for k=1:3 % main loop - process each beat separately
if k/1000==round(k/1000), disp(k), end % for display so I know where we are
% These are the indexes of the frame
indin=round(((k-1)*hopin+1):((k-1)*hopin+nfft));
for sk=1:stmo % do L/R channels separately
s=win.*y(sk,indin); % get this frame and take FFT
ffts=fft(s);
mag=abs(ffts(1:nfft2+1));
ph=angle(ffts(1:nfft2+1));
% find peaks to define spectral mapping
% peaks is going to be a matrix with 3 columns, one row for
% each peak, the first column and the last column represents
% the bin for the peak, and the second column is index of the
% peak
peaks=findPeaks4(mag, max_peak, eps_peak, ssf);
% inds(1) is the index of the maximum peak, the mag is a
% vector, not a function
[dummy,inds]=sort(mag(peaks(:,2)));
% peaksort are the peaks, the columns are interpreted just like
% peaks, and the rows are sorted in ascending order
peaksort=peaks(inds,:);
% pc is the second column, that correspond to the indices of
% the peak, sorted in ascending order
pc=peaksort(:,2);
bestf=zeros(size(pc));
for tk=1:length(pc) % estimate frequency using PV strategy
% tk is which peak
% pc(tk) is the index of the peak
% ph(pc(tk)) is the phase of the peak
%
% the phold term means the same thing for the last frame
% MATLAB allows adding a scalar to a vector, it means
% applying to all elements
%
% so dtheta is a set of phase differences
dtheta=(ph(pc(tk))-phold(sk,pc(tk)))+all2pi;
% In a period, the phasor runs 2pi radian
% In time dt, the phasor runs 2 pi dt/T = 2pi dt f radian
% And we know the phase differences are dtheta
% so fest are all the possible frequencies causing that
% peak.
fest=dtheta./(2*pi*dtin); % see pvanalysis.m for same idea
% ssf(pc(tk)) correspond to the frequency of the coefficient.
% This implies we wanted the frequency that is closest
[er,indf]=min(abs(ssf(pc(tk))-fest));
% Now we have the best frequency found for the peak
bestf(tk)=fest(indf); % find best freq estimate for each row
end
% generate output mag and phase
magout=mag; phout=ph;
for tk=1:length(pc)
% We have only one best frequency for a peak
fin=bestf(tk);
% Here are all the input spectral indexes
inbinfrom = peaksort(tk,1);
inbinto = peaksort(tk,3);
freqindin=(inbinfrom:inbinto);
fdes = map_frequency(fin, sr);
inbinfromfreq = inbinfrom * sr/nfft;
inbintofreq = inbinto * sr/nfft;
outbinfromfreq = map_frequency(inbinfromfreq, sr);
outbintofreq = map_frequency(inbintofreq, sr);
outbinfrom = round(outbinfromfreq * nfft / sr);
outbinto = round(outbintofreq * nfft / sr);
freqindout=(outbinfrom:outbinto);
% specify magnitude and phase of each partial
% The output magnitude is interpolated
inputx = (0:length(freqindin)-1)/(length(freqindin));
inputy = mag(freqindin);
outputx = (0:length(freqindout)-1)/(length(freqindout));
outputy = interp1(inputx, inputy, outputx,'linear','extrap');
magout(freqindout)= outputy;
% We know the real frequency, so that phase should increase by
% 2 * pi * fdes * dtout
% We are estimating only a single phadvance for the whole bin
phadvance(sk,peaksort(tk,2))=phadvance(sk,peaksort(tk,2))+2*pi*fdes*dtout;
% These weird expression zero outs some entries in pizero. This
% is a mechanism that is not totally understood around the idea
% of tapered end as in the notes
pizero=pi*ones(1,length(freqindout));
pcent=peaksort(tk,2)-peaksort(tk,1)+1;
indpc=(2-mod(pcent,2)):2:length(freqindout);
pizero(indpc)=zeros(1,length(indpc));
% Now we can set the phase for the bin
phout(freqindout)=phadvance(sk,peaksort(tk,2))+pizero;
end
% reconstruct time signal (stretched or compressed)
compl=magout.*exp(sqrt(-1)*phout);
compl(nfft2+1)=ffts(nfft2+1);
compl=[compl,fliplr(conj(compl(2:(nfft2))))];
wave=real(ifft(compl));
outbeat(sk,:)=wave;
phold(sk,:)=ph;
end % end stereo
indout=round(((k-1)*hopout+1):((k-1)*hopout+nfft));
tt(:,indout)=tt(:,indout)+outbeat;
end
tt=0.8*tt/max(max(abs(tt)));
[rtt,ctt]=size(tt); if rtt==2, tt=tt'; end
wavwrite(tt,sr,16,outfile);
fclose('all');
给出了 findPeaks4 函数,我所做的只是写一些注释来记录我的理解。
function peaks = findPeaks4( Amp, MAX_PEAK, EPS_PEAK, SSF )
% This version modified from findPeaks.m by P. Moller-Nielson
% 28.3.03, pm-n. ( see http://www.daimi.au.dk/~pmn/sound/ )
SPECTRUM_SIZE=length(Amp);
% The very first entry of the spectrum is the DC value, we do not consider
% that. Therefore only [3..SPECTRUM_SIZE-1] can be a local maximum
% The first two comparison make sure it is a local maximum, and the last
% multiplication extract the value of that local maximum.
%
% The resulting vector index 1 correspond to index 3 of the Amp vector (*)
peakAmp = ( Amp(3:SPECTRUM_SIZE-1) > Amp(2:SPECTRUM_SIZE-2) ) .* ...
( Amp(3:SPECTRUM_SIZE-1) > Amp(4:SPECTRUM_SIZE) ) .* ...
Amp(3:SPECTRUM_SIZE-1);
peakPos = zeros( MAX_PEAK, 1);
maxAmp = max( peakAmp );
nPeaks = 0;
for p = 1 : MAX_PEAK
% The maximum value is m, the index of that maximum value is b
[m, b] = max( peakAmp );
if m <= ( EPS_PEAK * maxAmp )
break;
end;
% The + 2 is due to (*)
peakPos(p) = b+2;
% By setting peakAmp(b) to 0, the max in the next iteration will find the
% next max
peakAmp(b) = 0;
nPeaks = p;
end;
peakPos = sort( peakPos );
peaks = zeros( nPeaks, 3 );
% b stands for bin. WTF! Saving just 2 characters to let me guess?
last_b = 1;
for p = 1 : nPeaks
b = peakPos(MAX_PEAK-nPeaks+p);
first_b = last_b+1;
if p == nPeaks
last_b = SPECTRUM_SIZE;
else
next_b = peakPos(MAX_PEAK-nPeaks+p+1);
% Between the peaks, find the minimum
% note that rel_min is the index starting from b
% rel_min = 1 meaning Amp(b) is minimal
[dummy, rel_min] = min( Amp(b:next_b));
% Therefore last_b is the index of the minimum
last_b = b+rel_min-1;
end;
peaks(p,1) = first_b;
peaks(p,2) = b;
peaks(p,3) = last_b;
end;
map_frequency 函数是您所期望的
function fdes = map_frequency(fin, sr)
fmax = sr/2;
if fin < 85
fdes = fin / 85 * 165;
else
if fin < 180
fdes = (fin - 85)/(180-85) * (255 - 165) + 165;
else
fdes = (fin - 180)/(fmax-180) * (fmax - 255) + 255;
end
end