You don't need multiple merge() steps; a single aggregate() call over both variables of interest is enough:
> aggregate(dx[, -1], by = list(ID = dx$ID), head, 1)
  ID AGE FEM
1  1  30   1
2  2  40   0
3  3  35   1
> system.time(replicate(1000, aggregate(dx[, -1], by = list(ID = dx$ID),
+ head, 1)))
user system elapsed
2.531 0.007 2.547
> system.time(replicate(1000, {ag <- data.frame(ID=levels(dx$ID))
+ ag <- merge(ag, aggregate(AGE ~ ID, data=dx, function(x) x[1]), "ID")
+ ag <- merge(ag, aggregate(FEM ~ ID, data=dx, function(x) x[1]), "ID")
+ }))
user system elapsed
9.264 0.009 9.301
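For readers who want to try the aggregate() call, here is a minimal sketch. The `dx` built below is an assumption (the original data frame is not shown in this answer); it is only shaped to match the output printed above.

```r
# Hypothetical sample data: two rows per ID, with AGE and FEM constant within ID.
dx <- data.frame(
  ID  = factor(c(1, 1, 2, 2, 3, 3)),
  AGE = c(30, 30, 40, 40, 35, 35),
  FEM = c(1, 1, 0, 0, 1, 1)
)

# One aggregate() call handles every non-ID column at once:
# head(x, 1) keeps the first value of each column within each ID group.
first_per_id <- aggregate(dx[, -1], by = list(ID = dx$ID), FUN = head, 1)
print(first_per_id)
```

Because every column other than ID goes through the same call, adding more variables does not require extra merge() steps.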
Comparing timings:
1) Matt's solution:
> system.time(replicate(1000, {
+ agg <- by(dx, dx$ID, FUN = function(x) x[1, ])
+ # Which returns a list that you can then convert into a data.frame thusly:
+ do.call(rbind, agg)
+ }))
user system elapsed
3.759 0.007 3.785
2) Zach's reshape2 solution:
> system.time(replicate(1000, {
+ dx <- melt(dx,id=c('ID','FEM'))
+ dcast(dx,ID+FEM~variable,fun.aggregate=mean)
+ }))
user system elapsed
12.804 0.032 13.019
3) Steve's data.table solution:
> system.time(replicate(1000, {
+ dxt <- data.table(dx, key='ID')
+ dxt[, .SD[1,], by=ID]
+ }))
user system elapsed
5.484 0.020 5.608
> dxt <- data.table(dx, key='ID') ## one time step
> system.time(replicate(1000, {
+ dxt[, .SD[1,], by=ID] ## try this one line on own
+ }))
user system elapsed
3.743 0.006 3.784
4) Chase's fast solution using a numeric, not factor, ID:
> dx2 <- within(dx, ID <- as.numeric(ID))
> system.time(replicate(1000, {
+ dy <- dx[order(dx$ID),]
+ dy[ diff(c(0,dy$ID)) != 0, ]
+ }))
user system elapsed
0.663 0.000 0.663
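The idea behind option 4 can be sketched on a small made-up data frame (the `dx` below is an assumption, not the original data). After sorting by a numeric ID, diff() is non-zero exactly where a new ID begins, so those rows are the first of each group. The leading 0 used as padding assumes the IDs are never 0, which holds for factor codes.

```r
# Hypothetical sample data, deliberately unsorted.
dx <- data.frame(
  ID  = factor(c(2, 1, 3, 1, 2, 3)),
  AGE = c(40, 30, 35, 30, 40, 35),
  FEM = c(0, 1, 1, 1, 0, 1)
)

# Convert the factor ID to its numeric codes (1, 2, 3 here).
dx2 <- within(dx, ID <- as.numeric(ID))

# Sort by ID, then keep rows where the ID differs from the previous row.
dy <- dx2[order(dx2$ID), ]
first_rows <- dy[diff(c(0, dy$ID)) != 0, ]
print(first_rows)
```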
and 5) Matt Parker's alternative to Chase's solution, for a character or factor ID, which is slightly faster than Chase's numeric-ID version:
> system.time(replicate(1000, {
+ dx[c(TRUE, dx$ID[-1] != dx$ID[-length(dx$ID)]), ]
+ }))
user system elapsed
0.513 0.000 0.516
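A minimal sketch of how the comparison in option 5 picks out the first row per ID, again on hypothetical data. It only looks at adjacent rows, so it assumes the rows are already grouped by ID; on ungrouped data it would also flag every later re-appearance of an ID. The check against base R's duplicated() is added here for illustration and is not part of the timed solutions.

```r
# Hypothetical sample data with a character ID, already grouped by ID.
dx <- data.frame(
  ID  = c("a", "a", "b", "b", "c", "c"),
  AGE = c(30, 30, 40, 40, 35, 35),
  FEM = c(1, 1, 0, 0, 1, 1),
  stringsAsFactors = FALSE
)

n <- nrow(dx)

# Row 1 always starts a group; every other row starts a group
# when its ID differs from the ID of the row before it.
is_first <- c(TRUE, dx$ID[-1] != dx$ID[-n])
first_rows <- dx[is_first, ]
print(first_rows)

# On grouped data this matches the duplicated()-based selection.
matches_duplicated <- identical(is_first, !duplicated(dx$ID))
print(matches_duplicated)
```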