机器算法验证 - 如何在 R 中使用块设计执行 CCA - 吾爱随笔录

如何在 R 中使用块设计执行 CCA

机器算法验证 r 多元分析实验设计对应分析素食主义者

2022-04-11 22:55:24

以下示例和数据是完全捏造的：

假设我正在研究猿的梳理行为。我有四个笼子，每个笼子里有 8 只猿猴（4 只雌性 + 雄性）。在 24 小时内，我进行了一次观察，目的是记录女性为男性梳理毛发时的事件数量。

我想知道，我是否可以用雄性的这 5 个特征（x1、x2、...x5）来解释梳理事件的可变性。[grooming][m_char]

我的方法是执行 CCA（又名规范对应分析），如下所示：

library(vegan)
my.cca <- cca(grooming ~ x1 + x2 + x3 + x4 + x5,
              data = m_char,
              scale = TRUE)

并通过 anova 测试该模型以获得因子的 p 值

anova(my.cca, by="terms", permutations=1000)

# Permutation test for cca under reduced model
# Terms added sequentially (first to last)
# Permutation: free
# Number of permutations: 1000
# 
# Model: cca(formula = grooming ~ x1 + x2 + x3 + x4 + x5,
#            data = m_char, scale = TRUE)
#          Df ChiSquare      F   Pr(>F)   
# x1        1    0.5327 1.1054 0.718282   
# x2        1    0.6216 1.2899 0.359640   
# x3        1    0.6341 1.3159 0.314685   
# x4        1    0.6739 1.3984 0.191808   
# x5        1    0.8782 1.8225 0.008991 **
# Residual 10    4.8188                   
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

虽然，x5 似乎对修饰有显着影响，但当我看情节时......

plot(my.cca)

……一个笼子里的猿类似乎紧紧地聚集在一起。

我认为我的数据结构会导致 cca 分析出现问题，因为当您查看具有交互次数的表格时......

...只有黄色方块中的零是“真正的”零（即猿之间没有相互作用），黄色方块之外的其余零只是处于不同笼子的结果。

是否有可能以某种方式告诉 CCA忽略“假零”？如何适当地合并这样的数据结构？我的主要目的是找出雄性的一个或多个特征是否可以用来解释修饰率。

非常感谢您的任何建议！

“修饰”示例数据：

AF1 <- c(3,7,2,0,rep(0, times = 12))
AF2 <- c(8,0,0,3,rep(0, times = 12))
AF3 <- c(0,2,0,0,rep(0, times = 12))
AF4 <- c(4,0,1,0,rep(0, times = 12))
AF4 <- c(4,0,1,0,rep(0, times = 12))
BF1 <- c(rep(0, times = 4),4,1,5,0,rep(0, times = 8))
BF2 <- c(rep(0, times = 4),3,0,0,1,rep(0, times = 8))
BF3 <- c(rep(0, times = 4),0,2,0,0,rep(0, times = 8))
BF4 <- c(rep(0, times = 4),0,0,7,0,rep(0, times = 8))
CF1 <- c(rep(0, times = 8),4,0,4,0,rep(0, times = 4))
CF2 <- c(rep(0, times = 8),2,0,0,0,rep(0, times = 4))
CF3 <- c(rep(0, times = 8),0,3,0,4,rep(0, times = 4))
CF4 <- c(rep(0, times = 8),0,0,9,0,rep(0, times = 4))
DF1 <- c(rep(0, times = 12),0,2,0,0)
DF2 <- c(rep(0, times = 12),6,0,0,1)
DF3 <- c(rep(0, times = 12),0,0,1,0)
DF4 <- c(rep(0, times = 12),4,0,1,0)

male_id <- c("AM1", "AM2", "AM3", "AM4",
             "BM1", "BM2", "BM3", "BM4",
             "CM1", "CM2", "CM3", "CM4",
             "DM1", "DM2", "DM3", "DM4")

grooming <- data.frame(AF1,AF2,AF3,AF4,
                       BF1,BF2,BF3,BF4,
                       CF1,CF2,CF3,CF4,
                       DF1,DF2,DF3,DF4)
rownames(grooming) <- male_id

“m_char”示例数据：

x1 <- c(2,5,3,1,8,1,1,6,2,3,5,1,1,6,6,7)
x2 <- c(4,9,1,3,2,1,9,4,3,1,9,2,9,1,4,3)
x3 <- c(1,2,5,4,4,4,5,2,2,1,1,1,2,5,5,9)
x4 <- c(4,1,1,8,8,6,6,6,6,8,1,6,2,2,1,1)
x5 <- c(1,7,3,3,4,1,5,1,3,8,4,8,8,5,9,9)

m_char <- data.frame(x1, x2, x3, x4, x5)
rownames(m_char) <- male_id

1个回答

我认为问题在于您使用的置换测试过于宽松；它假设一个零假设，其中所有观察结果都是可交换的。从您所说的情况来看，笼子内的观察是可以交换的，但笼子之间是不可互换的。

要在置换测试中使用更具限制性的零值，我们可以使用受限置换设计，在该设计中，我们要求vegan仅在笼子内置换观察结果，而从不在笼子之间置换。使用阻塞因子最容易做到这一点。

创建阻塞因子：

m_char <- transform(m_char,
                    cage = factor(substring(rownames(m_char), 1, 1)))

接下来，定义受限排列设计（平均值的默认值我们在传递给的因子水平内how()获得随机化）blocks

ctrl <- how(nperm = 1000, blocks = m_char$cage)

现在拟合模型并移除笼间变异，这也是这些块分析所必需的

my.cca <- cca(grooming ~ x1 + x2 + x3 + x4 + x5 + Condition(cage),
              data = m_char)

现在进行受限置换测试

set.seed(10)
anova(my.cca, by = "terms", permutations = ctrl)

这会产生：

> anova(my.cca, by = "terms", permutations = ctrl)
Permutation test for cca under reduced model
Terms added sequentially (first to last)
Blocks:  m_char$cage 
Permutation: free
Number of permutations: 1000

Model: cca(formula = grooming ~ x1 + x2 + x3 + x4 + x5 + Condition(cage), data = m_char)
         Df ChiSquare      F  Pr(>F)  
x1        1   0.52470 1.4119 0.15185  
x2        1   0.60901 1.6388 0.03397 *
x3        1   0.21703 0.5840 0.96503  
x4        1   0.36739 0.9886 0.36563  
x5        1   0.83984 2.2600 0.03497 *
Residual  7   2.60129                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

三图显示与您获得的不同的行为

现在这有意义吗？

其它你可能感兴趣的问题