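I am trying to build an (undirected) social network based on the co-occurrence of individuals. A clustering algorithm will later be applied to this network to find distinct subgroups. The problem is that the animal species under study has a short lifespan (or a high mortality rate due to predators). As a result, not all relationships in my network could have existed at the same time. If you look at the figure below, the "red" individuals had almost died out* after 3-4 years, but they had the "longest" time to "meet" other individuals, whereas the "blue" individuals had only two years to "meet" others.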

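* In theory, I can assume that every individual has a life expectancy of less than 10 years. So failing to capture a "red" individual 5 or 6 years after marking does not necessarily mean that it is dead.
How can this temporal effect be incorporated into the social network?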
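The specific questions I want to answer:
First question: do the observed social ties differ from those explained by shared space use alone? That is, how can I test whether associations are random or preferred?
If the answer to the first question is that associations between individuals are not random, then I have a second question...
Is social structure related to genetic relatedness? That is, are closely related individuals together more often? (DNA profiles of all individuals are given below.)
Here I created some data with a structure similar to my database: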
data <- data.frame(
  obs_date = c("C1","C2","C3","C4","C5","C6","C1","C2",
               "C3","C4","C1","C2","C3","C1","C2","C3",
               "C4","C5","C6","C7","C1","C3","C4","C5",
               "C6","C7","C8","C3","C4","C5","C6","C7",
               "C3","C4","C5","C6","C3","C4","C5","C3",
               "C4","C5","C6","C5","C6","C7","C8","C5",
               "C5","C6","C7","C8","C5","C6","C7","C7",
               "C7","C8","C7","C8","C7","C8","C7","C8"),
  ind_id = rep(LETTERS[1:20], times = c(6,4,3,7,1,6,5,4,
                                        3,2,2,4,1,4,3,1,2,2,2,2)),
  obs = rep(c("seen","not_seen","seen","not_seen","seen",
              "not_seen","seen","not_seen","seen"),
            times = c(3,1,4,1,9,1,9,3,33)))
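Here I add the genetic structure. The data are completely made up, but they should reflect close genetic relatedness among individuals of the same color. In addition, the "purple" individuals are descendants of "blue", "blue" are descendants of "green", and "green" are descendants of "red".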
gen.raw <- matrix(c("a","g","g","g","c","g","a","a","g","g","g","g","t","c","t","c","t","t","a","a","t","t","a","a",
"a","g","g","g","c","g","a","a","g","g","g","g","c","c","t","c","t","t","a","a","t","c","a","a",
"a","g","g","g","c","g","g","a","g","g","g","g","c","c","t","t","c","t","a","a","t","c","a","a",
"a","g","t","t","t","g","g","a","g","g","g","g","c","c","t","t","c","t","a","a","a","c","a","a",
"a","g","t","t","t","g","g","a","g","g","g","g","c","c","t","t","c","t","t","a","a","c","a","a",
"a","g","t","t","t","g","g","a","g","g","g","g","c","c","t","t","c","t","t","a","a","c","a","a",
"a","g","t","t","t","g","g","g","g","g","c","g","c","c","t","t","c","t","t","a","a","c","a","a",
"a","g","t","t","t","g","a","c","g","t","c","g","c","c","t","t","c","t","t","a","a","c","a","a",
"a","g","t","t","t","g","a","c","g","t","c","g","c","c","t","t","c","t","t","a","a","c","a","a",
"a","g","t","t","t","g","a","c","g","t","c","g","c","c","t","t","c","t","t","a","a","c","a","a",
"a","g","t","t","t","g","a","c","g","t","c","g","c","c","t","t","c","t","t","a","a","c","a","a",
"a","g","t","t","t","g","a","c","g","t","c","g","c","c","t","t","c","t","t","a","a","c","a","a",
"a","g","t","t","t","g","a","c","g","t","c","g","c","c","t","t","c","t","t","a","a","c","a","a",
"a","g","t","t","t","g","a","c","g","t","c","g","c","c","t","t","c","t","t","a","t","c","a","a",
"a","g","t","t","t","g","a","c","g","t","c","g","c","c","t","t","c","t","t","a","t","c","a","a",
"a","g","t","t","t","g","a","c","g","t","c","g","c","c","t","t","c","t","t","a","t","c","a","a",
"a","g","t","c","t","g","a","c","g","g","c","g","c","c","t","t","c","t","t","a","t","c","a","a",
"a","g","t","c","t","g","a","c","g","g","c","g","c","c","t","t","c","t","t","a","t","c","a","a",
"a","g","t","c","t","g","a","c","g","g","c","g","c","c","t","t","c","t","t","a","t","c","a","a",
"a","g","t","c","t","g","a","c","g","c","c","g","t","c","t","t","c","t","t","a","t","c","a","a"),
byrow = TRUE, ncol = 24)
rownames(gen.raw) <- LETTERS[1:20]
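OK, the source data are given above. Now I will create two distance matrices. The first is an association matrix derived from the co-occurrence data, expressed as the OR-SP index. The observed roost-sharing proportion is calculated by dividing the number of days on which two individuals were found together by the number of all possible days on which they could have been together (the overlap between the first and last records of both individuals).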
# matrix of days roosting together
EG <- expand.grid(unique(data$ind_id), unique(data$ind_id))
data_seen <- subset(data, obs == "seen")
my.length.dt <- numeric(nrow(EG))
for (i in seq_len(nrow(EG))) {
  # number of dates on which both individuals of pair i were seen
  my.length.dt[i] <- length(intersect(as.vector(data_seen$obs_date[data_seen$ind_id == EG[i, 1]]),
                                      as.vector(data_seen$obs_date[data_seen$ind_id == EG[i, 2]])))
}
# build the matrix once, after the loop has filled in all pair counts
days.together <- matrix(my.length.dt, byrow = TRUE, ncol = length(unique(data$ind_id)))
colnames(days.together) <- rownames(days.together) <- unique(data$ind_id)
days.together
# matrix of all possible potential roosting days
EG <- expand.grid(unique(data$ind_id), unique(data$ind_id))
my.length.rdp <- numeric(nrow(EG))
for (i in seq_len(nrow(EG))) {
  # number of dates on which both individuals of pair i have a record (seen or not_seen)
  my.length.rdp[i] <- length(intersect(as.vector(data$obs_date[data$ind_id == EG[i, 1]]),
                                       as.vector(data$obs_date[data$ind_id == EG[i, 2]])))
}
# build the matrix once, after the loop has filled in all pair counts
roosting_days_possible <- matrix(my.length.rdp, byrow = TRUE, ncol = length(unique(data$ind_id)))
colnames(roosting_days_possible) <- rownames(roosting_days_possible) <- unique(data$ind_id)
roosting_days_possible
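The description above defines the denominator as the overlap between the first and last records of two individuals, whereas the loop counts shared recorded occasions. As a cross-check, here is a minimal sketch of the window-based version on made-up toy records. It assumes that the occasion labels (C1, C2, ...) are consecutive and equally spaced, which is an assumption; `toy`, `first_rec`, `last_rec` and `overlap` are illustrative names only.

```r
# Made-up toy records (not the data above): one row per individual per occasion
toy <- data.frame(
  obs_date = c("C1", "C2", "C4", "C3", "C4", "C5", "C6", "C7"),
  ind_id   = c("A",  "A",  "A",  "B",  "B",  "B",  "C",  "C")
)

# Assumption: labels "C1", "C2", ... are consecutive, equally spaced occasions
toy$occasion <- as.integer(sub("^C", "", toy$obs_date))

# First and last recorded occasion of each individual
first_rec <- sapply(split(toy$occasion, toy$ind_id), min)
last_rec  <- sapply(split(toy$occasion, toy$ind_id), max)

# Occasions inside both windows: min(last) - max(first) + 1, floored at 0
overlap <- outer(last_rec, last_rec, pmin) - outer(first_rec, first_rec, pmax) + 1
overlap[overlap < 0] <- 0
dimnames(overlap) <- list(names(first_rec), names(first_rec))
overlap
```

Pairs whose windows never overlap (A and C in this toy example) get 0 possible days, so they cannot contribute evidence for or against an association.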
# OBSERVED ROOST-SHARING PROPORTION
OSP <- days.together/roosting_days_possible
OSP[ is.nan(OSP) ] <- 0
diag(OSP) <- 0
# So here is the association matrix derived from co-occurrence data
round(OSP,2)
# social distance matrix
# (as.dist() only changes the storage format; the values are still association proportions)
soc_dist <- as.dist(OSP)
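The two loops above can also be written as matrix products on an individual-by-date incidence matrix. This is a minimal sketch on made-up toy records, assuming the same long format as `data`; `incidence()` is a hypothetical helper, not a function from any package.

```r
# Made-up toy records in the same long format as `data`
toy <- data.frame(
  obs_date = c("C1", "C2", "C3", "C1", "C2", "C2", "C3"),
  ind_id   = c("A",  "A",  "A",  "B",  "B",  "C",  "C"),
  obs      = c("seen", "seen", "not_seen", "seen", "not_seen", "seen", "seen")
)

ids   <- sort(unique(toy$ind_id))
dates <- sort(unique(toy$obs_date))

# Hypothetical helper: 0/1 matrix, rows = individuals, columns = dates
incidence <- function(d) {
  tab <- table(factor(d$ind_id, levels = ids),
               factor(d$obs_date, levels = dates))
  (unclass(tab) > 0) * 1
}

recorded <- incidence(toy)                       # any record on that date
seen     <- incidence(subset(toy, obs == "seen")) # seen on that date

# Row-by-row products count the dates shared by each pair of individuals
days_together <- tcrossprod(seen)
days_possible <- tcrossprod(recorded)

osp <- days_together / days_possible
osp[is.nan(osp)] <- 0   # pairs that never overlapped
diag(osp) <- 0
round(osp, 2)
```

On the toy records, A and B share two recorded dates and were seen together on one of them, so their proportion is 0.5.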
The next step is to take the DNA sequences and build a genetic relatedness matrix.
# creating matrix of relatedness
library(ape)
gen.str <- as.DNAbin(gen.raw)
my.gen.dist <- dist.dna(gen.str)
fit <- hclust(my.gen.dist, method="ward.D") # "ward" was renamed "ward.D" in R >= 3.1.0
plot(fit) # display dendrogram
Finally, here I compare social distance and genetic distance with a Mantel test.
library(ade4)
mantel.rtest(soc_dist, my.gen.dist, nrepet = 9999)
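To make explicit what such a test compares, here is a rough base R sketch of a Mantel-type permutation test. It is not ade4's implementation: `mantel_sketch()` is a hypothetical helper, the data are made up, and the one-sided alternative (a large positive correlation) is an assumption of the sketch. It also illustrates why the sign matters: an association index is a similarity, so it is entered once as it is and once as 1 minus the association.

```r
set.seed(1)

# Hypothetical helper: correlation of lower triangles plus a permutation p-value
mantel_sketch <- function(d1, d2, nperm = 999) {
  m1 <- as.matrix(d1)
  m2 <- as.matrix(d2)
  lower <- lower.tri(m1)
  r_obs <- cor(m1[lower], m2[lower])
  r_perm <- replicate(nperm, {
    p <- sample(nrow(m1))            # relabel the individuals of one matrix
    cor(m1[p, p][lower], m2[lower])
  })
  # One-sided: how often is a permuted r at least as large as the observed r?
  list(r = r_obs, p = (sum(r_perm >= r_obs) + 1) / (nperm + 1))
}

# Made-up example: genetic distance and an association that mirrors it exactly
gen_d <- dist(c(0, 1, 2, 10, 11, 12))
assoc <- 1 - as.matrix(gen_d) / max(gen_d)  # high association = genetically close
diag(assoc) <- 0

res_similarity    <- mantel_sketch(as.dist(assoc), gen_d)      # similarity used as is
res_dissimilarity <- mantel_sketch(as.dist(1 - assoc), gen_d)  # converted to distance
res_similarity
res_dissimilarity
```

In this toy case the similarity version gives a correlation of about -1 and a large one-sided p-value, while the dissimilarity version gives a correlation of about 1. Whether the same effect is behind the result of `mantel.rtest` depends on which direction that function tests, which should be checked in its documentation.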
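Does its result (p > 0.05) mean that there is no correlation between social structure and genetic structure?
Is this an appropriate approach for answering my questions? Any ideas?
I also found that this type of graph may show social structure better than a dendrogram does. It is well suited to finding distinct social groups.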
# Show social structure
library(igraph)
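# graph_from_adjacency_matrix() is the current igraph name for graph.adjacency()
g <- graph_from_adjacency_matrix(OSP, weighted=TRUE, mode ="undirected")
g <- simplify(g)
# set labels and degrees of vertices
V(g)$label <- V(g)$name
V(g)$degree <- degree(g)
# cluster_walktrap() is the current igraph name for walktrap.community()
wc <- cluster_walktrap(g)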
plot(wc, g)