应用 PCA 之前的一种常见技术是从样本中减去平均值。如果您不这样做,则第一个特征向量将是均值。我不确定你是否做过,但让我谈谈。如果我们用 MATLAB 代码说话:这是
clear, clf
clc
%% Let us draw a line
scale = 1;
x = scale .* (1:0.25:5);
y = 1/2*x + 1;
%% and add some noise
y = y + rand(size(y));
%% plot and see
subplot(1,2,1), plot(x, y, '*k')
axis equal
%% Put the data in columns and see what SVD gives
A = [x;y];
[U, S, V] = svd(A);
hold on
plot([mean(x)-U(1,1)*S(1,1) mean(x)+U(1,1)*S(1,1)], ...
[mean(y)-U(2,1)*S(1,1) mean(y)+U(2,1)*S(1,1)], ...
':k');
plot([mean(x)-U(1,2)*S(2,2) mean(x)+U(1,2)*S(2,2)], ...
[mean(y)-U(2,2)*S(2,2) mean(y)+U(2,2)*S(2,2)], ...
'-.k');
title('The left singular vectors found directly')
%% Now, subtract the mean and see its effect
A(1,:) = A(1,:) - mean(A(1,:));
A(2,:) = A(2,:) - mean(A(2,:));
[U, S, V] = svd(A);
subplot(1,2,2)
plot(x, y, '*k')
axis equal
hold on
plot([mean(x)-U(1,1)*S(1,1) mean(x)+U(1,1)*S(1,1)], ...
[mean(y)-U(2,1)*S(1,1) mean(y)+U(2,1)*S(1,1)], ...
':k');
plot([mean(x)-U(1,2)*S(2,2) mean(x)+U(1,2)*S(2,2)], ...
[mean(y)-U(2,2)*S(2,2) mean(y)+U(2,2)*S(2,2)], ...
'-.k');
title('The left singular vectors found after subtracting mean')
从图中可以看出,如果你想更好地分析(协)方差,我认为你应该从数据中减去平均值。那么这些值将不在 10-100 和 0.1-1 之间,但它们的平均值都将为零。方差将作为特征值(或奇异值的平方)找到。当我们减去平均值的情况与不减去平均值的情况一样,找到的特征向量不受维度尺度的影响。例如,我已经测试并观察了以下内容,告诉您减去平均值可能对您的情况很重要。所以问题可能不是由方差引起的,而是由翻译差异引起的。
% scale = 0.5, without subtracting mean
U =
-0.5504 -0.8349
-0.8349 0.5504
% scale = 0.5, with subtracting mean
U =
-0.8311 -0.5561
-0.5561 0.8311
% scale = 1, without subtracting mean
U =
-0.7327 -0.6806
-0.6806 0.7327
% scale = 1, with subtracting mean
U =
-0.8464 -0.5325
-0.5325 0.8464
% scale = 100, without subtracting mean
U =
-0.8930 -0.4501
-0.4501 0.8930
% scale = 100, with subtracting mean
U =
-0.8943 -0.4474
-0.4474 0.8943
