Extracting the most important variables from a PCA

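I want to extract the most important variables from a PCA result. I can see two clusters in the plot. I realize that probably more than one variable is responsible for this separation, so I may need to retrieve several variables.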

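I am using the "adegenet" R package. My raw data is a matrix in which rows are PubMed papers and columns are MeSH keywords. The data has been converted to a SNP-like encoding so that the method can be applied to this new kind of input. If you think I am misusing this package, please point me to a more suitable R package; I chose this one only because I already know how it works.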
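As a point of comparison, a presence/absence matrix like the one described can also be analysed without any SNP-like conversion. The following is only a minimal sketch using base R's `prcomp` on simulated data; the matrix, the group structure and all names (`x`, `group`, `term1`, ...) are made up for illustration and are not the original PubMed data. It ranks terms by their share of the squared loadings on the first component.

```r
# Toy data: 200 "papers" x 12 "MeSH terms", coded 0/1 (term absent/present).
# Everything here is simulated; it only mimics the layout described above.
set.seed(1)
n_papers <- 200
n_terms  <- 12
group <- rep(c("A", "B"), each = n_papers / 2)

# Background: each term occurs in a paper with probability 0.1
x <- matrix(rbinom(n_papers * n_terms, 1, 0.1), nrow = n_papers,
            dimnames = list(paste0("paper", seq_len(n_papers)),
                            paste0("term", seq_len(n_terms))))

# Terms 1-3 are made frequent in group A only, so they should drive the split
x[group == "A", 1:3] <- rbinom(sum(group == "A") * 3, 1, 0.9)

# PCA on the centred, unscaled 0/1 matrix
pca <- prcomp(x, center = TRUE, scale. = FALSE)

# Contribution of each term to PC1 in percent
# (the squared loadings of one component sum to 1)
contrib_pc1 <- 100 * pca$rotation[, "PC1"]^2
top_terms <- names(sort(contrib_pc1, decreasing = TRUE))[1:3]
print(round(sort(contrib_pc1, decreasing = TRUE), 1))
```

Whether to scale the columns is a judgment call: without scaling, frequent terms carry more variance and therefore weigh more in the components.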

Figure: first two dimensions of the PCA, based on 1359 metagenomics papers and 3459 MeSH terms

#R code
library(adegenet)
# Load the SNP-like encoding of the PubMed paper x MeSH term matrix (FASTA)
myPath <- "pubmed_result_metagenomics_ALL_parsed.fasta"
# chunkSize is the full name of the argument abbreviated as `chunk`;
# newer adegenet releases may name the multicore switch `parallel` instead
core_SNPs_genlight <- fasta2genlight(myPath, chunkSize = 1000, multicore = FALSE)
# Plain matrix copy for inspection; glPca() itself needs the genlight object
core_SNPs_matrix <- as.matrix(core_SNPs_genlight)

# Principal Component Analysis (PCA)
pca1 <- glPca(core_SNPs_genlight, nf = 10) # 10 components saved
pca1

# Draw PCA colorplot
myCol <- colorplot(pca1$scores,pca1$scores, transp=TRUE, cex=4)
abline(h=0,v=0, col="grey")
add.scatter.eig(pca1$eig[1:40],2,1,2, posi="topright", inset=.05, ratio=.3)
title("First two dimensions of PCA \n based on 1359 metagenomics papers \n and 3459 MeSH terms")
dev.copy2pdf(file = "Figure_12.pdf") # Save as .pdf
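
One possible way to rank variables once a PCA has been fitted is to look at the squared loadings. The helper below is a hedged sketch, not part of adegenet: `top_contributors` is a hypothetical name, and the loadings, eigenvalues and term labels in the example are invented. It computes each variable's percentage contribution per axis and combines the chosen axes, weighting them by their eigenvalues.

```r
# Rank variables by their contribution to selected PCA axes.
# loadings: numeric matrix, one row per variable, one column per axis.
# eig:      eigenvalues of the axes, used as weights when combining axes.
top_contributors <- function(loadings, eig, axes = 1:2, n = 10) {
  loadings <- as.matrix(loadings)[, axes, drop = FALSE]
  # Per-axis contribution in percent: squared loading / column sum of squares
  contrib <- 100 * sweep(loadings^2, 2, colSums(loadings^2), "/")
  # Combined contribution: per-axis contributions weighted by eigenvalue share
  w <- eig[axes] / sum(eig[axes])
  combined <- as.vector(contrib %*% w)
  names(combined) <- if (is.null(rownames(loadings))) {
    paste0("var", seq_len(nrow(loadings)))
  } else {
    rownames(loadings)
  }
  head(sort(combined, decreasing = TRUE), n)
}

# Made-up loadings for five hypothetical MeSH terms on two axes
toy_loadings <- matrix(c(0.8, 0.1,
                         0.5, 0.2,
                         0.1, 0.9,
                         0.3, 0.3,
                         0.1, 0.2),
                       ncol = 2, byrow = TRUE,
                       dimnames = list(c("Metagenome", "Humans",
                                         "Soil Microbiology", "Bacteria",
                                         "Animals"),
                                       c("Axis1", "Axis2")))
toy_eig <- c(3, 1)

res <- top_contributors(toy_loadings, toy_eig, n = 3)
print(round(res, 1))
```

With the script above, the inputs would presumably be `pca1$loadings` and `pca1$eig`, assuming `glPca` was run with its default `loadings = TRUE`. adegenet also provides `loadingplot`, which plots the loadings of one axis and labels the variables above a threshold.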