Computational Science - Principal component analysis not behaving as I expected - 吾爱随笔录

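I have many points in $\mathbb{R}^3$ that I want to translate and rotate so that they are centered at the origin and have maximal spread along the $x$ and $y$ axes (greedily, and in that order). To do this, I tried using a Python principal component analysis routine. It does not behave as I expected, most likely because I misunderstand something about what PCA actually does.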

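The problem: when I center and then rotate the data, the variance along the third component is greater than the variance along the second. This means that, once centered and rotated, the data are more spread out along the $z$ axis than along the $y$ axis. In other words, the rotation is not the right one.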

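What I'm doing: matplotlib's PCA routine (`matplotlib.mlab.PCA`) returns an object (say `myPCA`) with several attributes. `myPCA.Y` is the data array after it has been centered, scaled, and rotated, in that order. I don't want the data to be scaled. I only want a translation and a rotation.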
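`matplotlib.mlab.PCA` is not available in current matplotlib releases. Below is a NumPy-only sketch of the three steps just described (center, scale by the per-column standard deviation, rotate). It assumes that the rotation comes from an SVD of the centered and scaled data. It reimplements the behavior described in this post and is not matplotlib's own code. It reproduces the symptom: the scaled output has decreasing variances, but applying the same rotation to the unscaled centered data does not.

```python
import numpy as np

# The manufactured data from the question.
data_raw = np.array([
    [80.0, 50.0, 30.0],
    [50.0, 90.0, 60.0],
    [70.0, 20.0, 40.0],
    [60.0, 30.0, 45.0],
    [45.0, 60.0, 20.0],
])

# Assumption: the PCA is fitted to *standardized* data (centered, then
# divided elementwise by the population standard deviation of each column).
mu = data_raw.mean(axis=0)
sigma = data_raw.std(axis=0)
standardized = (data_raw - mu) / sigma

# Rows of Wt are the principal axes of the standardized data, ordered by
# decreasing singular value; they play the role of myPCA.Wt here.
_, s, Wt = np.linalg.svd(standardized, full_matrices=False)

Y = standardized @ Wt.T                # centered, scaled, rotated (like myPCA.Y)
rotated_only = (data_raw - mu) @ Wt.T  # centered and rotated, NOT scaled

var_Y = Y.var(axis=0)
var_rotated = rotated_only.var(axis=0)
print(var_Y)        # decreasing, as expected
print(var_rotated)  # second entry is smaller than the third
```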

```python
import numpy as np
# Note: matplotlib.mlab.PCA was deprecated in matplotlib 2.2 and removed in 3.1,
# so this import only works with an older matplotlib.
from matplotlib.mlab import PCA

# manufactured data producing the problem
data_raw = np.array([
    [80.0, 50.0, 30.0],
    [50.0, 90.0, 60.0],
    [70.0, 20.0, 40.0],
    [60.0, 30.0, 45.0],
    [45.0, 60.0, 20.0],
])

# obtain the PCA
myPCA = PCA(data_raw)

# center the raw data (myPCA.mu is the vector of column means)
centered = data_raw - myPCA.mu
# rotate the centered data: each row becomes myPCA.Wt @ point
centered_and_rotated = centered @ myPCA.Wt.T
# the variance along axis 0 should now be greater than along axis 1, and so on
variances = centered_and_rotated.var(axis=0)
# they are not:
print(variances[1] > variances[2])  # False; I want this to be True

# Now look at the PCA output, Y. This is centered, scaled, and rotated.
# The variances decrease in magnitude, as I want them to:
variances2 = myPCA.Y.var(axis=0)
# This looks good, but the coordinates have been scaled.
# Let's try to get from the raw coordinates to the PCA output Y.
# mu is the vector of means of the raw data, and sigma is the vector of
# standard deviations of the raw data along each coordinate direction.
guess = ((data_raw - myPCA.mu) / myPCA.sigma) @ myPCA.Wt.T
# compare floats with a tolerance rather than exact equality
print(np.allclose(guess, myPCA.Y))  # True
```

The last two lines above show that we can take a point $\mathbf{x}$ from its representation in the raw input data to its representation $\mathbf{x}'$ in terms of the PCA axes via

$$\mathbf{x}' = \mathrm{R}\cdot\left((\mathbf{x}-\boldsymbol{\mu}) / \boldsymbol{\sigma} \right)$$

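where $\mathrm{R}$ is `myPCA.Wt`, the weight matrix; $\boldsymbol{\mu}$ is the vector of means of the raw data along each coordinate axis; $\boldsymbol{\sigma}$ is the vector of standard deviations of the raw data along each coordinate axis; and the division is elementwise. To write this in standard mathematical notation, let's replace the division with a multiplication: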

$$\mathbf{x}' = \mathrm{R}\cdot\left(\mathrm{D}\cdot(\mathbf{x}-\boldsymbol{\mu}) \right)$$

where $\mathrm{D}$ is a diagonal matrix whose diagonal entries are $1/\sigma_i$.
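Here is a quick numerical check, on the question's data, that dividing elementwise by $\boldsymbol{\sigma}$ is the same as multiplying by $\mathrm{D} = \operatorname{diag}(1/\sigma_i)$. As a stand-in for `myPCA.Wt`, $\mathrm{R}$ is taken from an SVD of the standardized data; this is an assumption, and the identity holds for any $\mathrm{R}$. Points are stored as rows, so $\mathrm{R}\mathrm{D}(\mathbf{x}-\boldsymbol{\mu})$ becomes $(\mathbf{x}-\boldsymbol{\mu})^{\mathsf T}\mathrm{D}\mathrm{R}^{\mathsf T}$ (using $\mathrm{D}^{\mathsf T} = \mathrm{D}$).

```python
import numpy as np

data_raw = np.array([
    [80.0, 50.0, 30.0],
    [50.0, 90.0, 60.0],
    [70.0, 20.0, 40.0],
    [60.0, 30.0, 45.0],
    [45.0, 60.0, 20.0],
])

mu = data_raw.mean(axis=0)
sigma = data_raw.std(axis=0)
D = np.diag(1.0 / sigma)  # diagonal matrix with entries 1/sigma_i

# Stand-in for myPCA.Wt (assumption): principal axes of the standardized data.
_, _, R = np.linalg.svd((data_raw - mu) / sigma, full_matrices=False)

# Row-vector form of x' = R ((x - mu) / sigma), applied to every point at once.
lhs = ((data_raw - mu) / sigma) @ R.T
# Row-vector form of x' = R (D (x - mu)).
rhs = (data_raw - mu) @ D @ R.T
print(np.allclose(lhs, rhs))  # True
```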

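This notation makes the problem clear: to undo the scaling, I need to act on the right-hand side, $\mathrm{R}\mathrm{D}(\mathbf{x}-\boldsymbol{\mu})$, from the left with $\mathrm{R}\mathrm{D}^{-1}\mathrm{R}^{-1}$. That brings me back to the problem situation, in which the variance along the $z$ axis is greater than the variance along the $y$ axis.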
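Here is that step spelled out. It assumes the rows of `myPCA.Wt` are orthonormal, so $\mathrm{R}^{-1} = \mathrm{R}^{\mathsf T}$. It also writes $\Sigma$ for the covariance matrix of the raw data; $\Sigma$ is a symbol introduced here, not in the original post.

$$\mathrm{R}\mathrm{D}^{-1}\mathrm{R}^{-1}\,\mathbf{x}' = \mathrm{R}\mathrm{D}^{-1}\mathrm{R}^{-1}\mathrm{R}\mathrm{D}(\mathbf{x}-\boldsymbol{\mu}) = \mathrm{R}(\mathbf{x}-\boldsymbol{\mu})$$

$$\operatorname{Cov}\left[\mathrm{R}\mathrm{D}(\mathbf{x}-\boldsymbol{\mu})\right] = \mathrm{R}\,(\mathrm{D}\Sigma\mathrm{D})\,\mathrm{R}^{\mathsf T} = \operatorname{diag}(\lambda_1,\lambda_2,\lambda_3), \qquad \lambda_1 \ge \lambda_2 \ge \lambda_3$$

$$\operatorname{Var}\left[\left(\mathrm{R}(\mathbf{x}-\boldsymbol{\mu})\right)_i\right] = \mathbf{r}_i^{\mathsf T}\,\Sigma\,\mathbf{r}_i$$

Here $\mathbf{r}_i$ is the $i$-th row of $\mathrm{R}$. The first line says that undoing the scaling lands exactly on the centered-and-rotated data from the code. The second line says that $\mathrm{R}$ diagonalizes $\mathrm{D}\Sigma\mathrm{D}$, the correlation matrix of the data, rather than $\Sigma$ itself. So nothing forces the unscaled variances in the third line to come out in decreasing order.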

Is there a way to use PCA to get what I want, or do I need to use some other method?
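For comparison, here is a minimal sketch that computes the principal axes from the centered but *unscaled* data, using NumPy's SVD instead of `matplotlib.mlab.PCA`. This is a swapped-in technique and not the mlab routine. An SVD of the centered data yields the eigenvectors of the covariance matrix, ordered by decreasing variance. The function name `center_and_rotate` is hypothetical. The sketch assumes at least as many points as dimensions, so that `R` is square. It also flips one axis if needed so that the result is a proper rotation (determinant $+1$) rather than a reflection.

```python
import numpy as np

def center_and_rotate(points):
    """Hypothetical helper: translate points to their centroid and rotate
    them onto the principal axes of the unscaled data (no rescaling).

    Returns (rotated, mu, R) with rotated = (points - mu) @ R.T, where the
    rows of R are ordered by decreasing variance and det(R) = +1.
    Assumes points has at least as many rows as columns.
    """
    mu = points.mean(axis=0)
    centered = points - mu
    # Rows of Vt are the eigenvectors of the covariance matrix of `centered`,
    # sorted by decreasing singular value (i.e. decreasing variance).
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    R = Vt.copy()
    # SVD may return a reflection; flipping one axis makes it a rotation
    # without changing any of the variances.
    if np.linalg.det(R) < 0:
        R[-1] *= -1
    return centered @ R.T, mu, R

data_raw = np.array([
    [80.0, 50.0, 30.0],
    [50.0, 90.0, 60.0],
    [70.0, 20.0, 40.0],
    [60.0, 30.0, 45.0],
    [45.0, 60.0, 20.0],
])

rotated, mu, R = center_and_rotate(data_raw)
variances = rotated.var(axis=0)
print(variances[0] >= variances[1] >= variances[2])  # True
```

Because no scaling is applied, the transformation is just a translation by $-\boldsymbol{\mu}$ followed by the rotation $\mathrm{R}$. The variances along the new axes are the covariance eigenvalues, so they come out in decreasing order.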