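Is it valid to normalise a dataset, reduce its dimensionality with PCA, and then normalise the reduced data again? Assuming this is done on the training data, should the same PCA coefficients be used to reduce the dimensionality of the test data? And should the test and training data be normalised with the same max and min values? I have included a simplified example of the code I am using, which may describe what I mean better than I can. Thanks in advance.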
%% Prepare Training Data
% Normalise training data
mindata=min(TRAINDATA); maxdata=max(TRAINDATA);
TRAINDATA = 2*bsxfun(@rdivide, bsxfun(@minus, TRAINDATA, mindata), maxdata - mindata) - 1;
% Perform PCA
mTRAINDATA = mean(mean(TRAINDATA));
TRAINDATA = TRAINDATA - mTRAINDATA;
% princomp has at most four outputs (so [Cpca,~,~,~,~] errors) and is superseded by pca
Cpca = pca(TRAINDATA, 'Economy', true);
EigenRange = 1:2;
Cpca = Cpca(:,EigenRange);
TRAINDATA = TRAINDATA*Cpca;
TRAINDATA = TRAINDATA + mTRAINDATA;
% Normalise training data second time
mindata2=min(TRAINDATA); maxdata2=max(TRAINDATA);
TRAINDATA = 2*bsxfun(@rdivide, bsxfun(@minus, TRAINDATA, mindata2), maxdata2 - mindata2) - 1;
%% Prepare Test Data
% Normalise using first normalisation values from training data
TESTDATA = 2*bsxfun(@rdivide, bsxfun(@minus, TESTDATA, mindata), maxdata - mindata) - 1;
% Perform PCA
% Centre with the training mean (not a test mean) so the test data gets the same shift as the training data
TESTDATA = TESTDATA - mTRAINDATA;
TESTDATA = TESTDATA*Cpca;
TESTDATA = TESTDATA + mTRAINDATA;
% Normalise using second normalisation values from training data
TESTDATA = 2*bsxfun(@rdivide, bsxfun(@minus, TESTDATA, mindata2), maxdata2 - mindata2) - 1;
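For comparison, here is a minimal sketch of the same pipeline (min-max scale, PCA to two components, min-max scale again) that stores every parameter fitted on the training data in one struct and then applies it unchanged to any data set. Assumptions: the small matrices are made-up illustration data; `scale`, `transform` and the struct `p` are hypothetical names; PCA is computed with `svd` on per-column-centred data instead of `princomp`/`pca`, so no toolbox is needed; and every training column is assumed to have max > min.

```matlab
% Illustration data (hypothetical): 6 training rows, 4 test rows, 3 features
TRAINDATA = [1 2 3; 4 0 6; 7 8 1; 2 5 4; 9 1 7; 3 6 2];
TESTDATA  = [5 5 5; 0 9 3; 8 2 6; 10 4 0];

% Hypothetical helper: scale columns of X to [-1, 1] using the given per-column min/max
scale = @(X, lo, hi) 2 * bsxfun(@rdivide, bsxfun(@minus, X, lo), hi - lo) - 1;

% ---- Fit every parameter on the training data only ----
p.min1 = min(TRAINDATA);  p.max1 = max(TRAINDATA);   % first normalisation
Z = scale(TRAINDATA, p.min1, p.max1);
p.mu = mean(Z);                                      % per-column training mean
[~, ~, V] = svd(bsxfun(@minus, Z, p.mu), 'econ');    % columns of V = principal directions
p.C = V(:, 1:2);                                     % keep the first two components
S = bsxfun(@minus, Z, p.mu) * p.C;                   % training scores
p.min2 = min(S);  p.max2 = max(S);                   % second normalisation

% ---- Apply the stored training parameters to any data set ----
transform = @(X) scale(bsxfun(@minus, scale(X, p.min1, p.max1), p.mu) * p.C, p.min2, p.max2);

TRAINOUT = transform(TRAINDATA);   % lies exactly in [-1, 1] by construction
TESTOUT  = transform(TESTDATA);    % same coefficients, mean, min and max as training
```

Because the test rows are scaled with the training min and max, their values are not guaranteed to stay inside [-1, 1]: here the last test row has 10 in the first column, above the training maximum of 9, so it scales to 1.25 in the first step.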