R summarise with a condition
data mining
r
dplyr
2022-02-21 18:22:36
3 answers
Here is an example using the iris dataset:
t(sapply(by(iris$Sepal.Length,iris$Species,function(x){x[1:2]}),as.numeric))
Here Species plays the role of your customer, and Sepal.Length plays the role of your fruit.
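The same idea can be sketched directly on customer/fruit data. This is a minimal base R sketch, not the answerer's code: it uses `split()` instead of `by()`, the data frame name `purchases` is made up for illustration, and its values are copied from the sample data in the data.table answer on this page.

```r
# Illustrative data frame (name assumed); values taken from the sample data
# shown in the data.table answer.
purchases <- data.frame(
  customer = c("A", "A", "C", "C", "B", "B", "C", "B", "A"),
  fruit    = c("orange", "apple", "banana", "orange", "apple",
               "banana", "banana", "apple", "banana"),
  date     = as.Date(c("2018-05-04", "2018-07-09", "2018-01-02",
                       "2018-01-03", "2018-01-02", "2018-04-05",
                       "2018-01-02", "2018-01-06", "2018-06-01")),
  stringsAsFactors = FALSE
)

# Sort by customer, then date, so "first two" means "earliest two".
purchases <- purchases[order(purchases$customer, purchases$date), ]

# Split the fruits by customer and keep the first two per customer.
# x[1:2] pads with NA when a customer has fewer than two rows.
first_two <- t(sapply(split(purchases$fruit, purchases$customer),
                      function(x) x[1:2]))
colnames(first_two) <- c("fruit1", "fruit2")
first_two
```

The result is a character matrix with one row per customer, which can be wrapped in `as.data.frame()` if a data frame is preferred.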
If you want a dplyr solution, you can try this:
library(dplyr)
library(tidyr)

yourdata %>%
  mutate(date = paste(date, "-2018", sep = ""),          # add year to date
         date = as.Date(date, format = "%d-%b-%Y")) %>%  # convert to Date class (%b is locale-dependent)
  arrange(date) %>%                                      # sort by date
  group_by(customer) %>%
  slice(1:2) %>%                                         # keep only first two rows (fruits) per customer
  mutate(date = c("fruit1", "fruit2")) %>%               # relabel date as fruit1/fruit2 (needs two rows per customer)
  spread(key = date, value = fruit)                      # spread to one column per fruit
A shorter version of the code (collapsing the mutate step):
yourdata %>%
  mutate(date = as.Date(paste(date, "-2018", sep = ""),
                        format = "%d-%b-%Y")) %>%
  arrange(date) %>%                          # sort by date
  group_by(customer) %>%
  slice(1:2) %>%                             # keep only first two rows (fruits) per customer
  mutate(date = c("fruit1", "fruit2")) %>%   # change date variable to fruit1/fruit2
  spread(key = date, value = fruit)          # spread data
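The pipeline above assigns `c("fruit1", "fruit2")` inside `mutate()`, which errors for any customer with only one row. A possible variant, offered as a sketch rather than as the answerer's method, numbers the rows with `row_number()` and reshapes with `pivot_wider()` (the successor to `spread()` in tidyr 1.0.0 and later). The tibble `purchases`, its values, and the helper column `slot` are made up for illustration, and `date` is assumed to already be of class Date.

```r
library(dplyr)
library(tidyr)

# Hypothetical input (assumed): customer B has only one purchase.
purchases <- tibble(
  customer = c("A", "A", "B", "C", "C", "C"),
  fruit    = c("orange", "apple", "apple", "banana", "orange", "banana"),
  date     = as.Date(c("2018-05-04", "2018-07-09", "2018-01-02",
                       "2018-01-02", "2018-01-03", "2018-01-05"))
)

first_two <- purchases %>%
  arrange(customer, date) %>%                      # earliest purchases first
  group_by(customer) %>%
  slice_head(n = 2) %>%                            # at most two rows per customer
  mutate(slot = paste0("fruit", row_number())) %>% # fruit1, fruit2 within each customer
  ungroup() %>%
  select(-date) %>%                                # drop date so it is not used as an id column
  pivot_wider(names_from = slot, values_from = fruit)

first_two
```

Customers with a single purchase get `NA` in `fruit2` instead of raising an error.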
Here is a solution using data.table.
First, sort the data.table by customer and date.
Then group by customer and select the first two fruits.
> df[order(customer,date)][,.(fruit1=fruit[1],fruit2=fruit[2]),by=customer]
   customer fruit1 fruit2
1:        A orange banana
2:        B  apple  apple
3:        C banana banana
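If the number of fruits to keep may change, writing out `fruit[1]`, `fruit[2]`, and so on gets tedious. One way to generalise, sketched here under the assumption that a long-then-wide reshape is acceptable, is to take `head(.SD, n)` per customer and reshape with `dcast()`. The names `dt`, `n_keep`, and `slot` are made up for illustration; the data repeats this answer's sample data.

```r
library(data.table)

# Same values as the sample data in this answer (object name dt is assumed).
dt <- data.table(
  customer = c("A", "A", "C", "C", "B", "B", "C", "B", "A"),
  fruit    = c("orange", "apple", "banana", "orange", "apple",
               "banana", "banana", "apple", "banana"),
  date     = as.Date(c("2018-05-04", "2018-07-09", "2018-01-02",
                       "2018-01-03", "2018-01-02", "2018-04-05",
                       "2018-01-02", "2018-01-06", "2018-06-01"))
)

n_keep <- 3                       # hypothetical parameter: fruits to keep per customer

setorder(dt, customer, date)      # sort by reference: customer, then date
top <- dt[, head(.SD, n_keep), by = customer]     # first n_keep rows per customer
top[, slot := paste0("fruit", rowid(customer))]   # fruit1, fruit2, ... within each customer

# Reshape to one row per customer and one column per slot.
wide <- dcast(top, customer ~ slot, value.var = "fruit")
wide
```

With `n_keep <- 2` the first two fruit columns match the output above. Note that `paste0("fruit", ...)` column names sort alphabetically in `dcast()`, so for ten or more slots the columns would need reordering.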
Sample data
> library(data.table)
> df <- data.table(
+   customer = c('A','A','C','C','B','B','C','B','A'),
+   fruit = c('orange','apple','banana','orange','apple','banana','banana','apple','banana'),
+   date = c(as.Date('2018-05-04'),as.Date('2018-07-09'),as.Date('2018-01-02'),as.Date('2018-01-03'),as.Date('2018-01-02'),
+            as.Date('2018-04-05'),as.Date('2018-01-02'),as.Date('2018-01-06'),as.Date('2018-06-01'))
+ )
> df
   customer  fruit       date
1:        A orange 2018-05-04
2:        A  apple 2018-07-09
3:        C banana 2018-01-02
4:        C orange 2018-01-03
5:        B  apple 2018-01-02
6:        B banana 2018-04-05
7:        C banana 2018-01-02
8:        B  apple 2018-01-06
9:        A banana 2018-06-01
