按组标记连续日期

数据挖掘 r 数据框 dplyr
2022-03-05 01:43:44

下面是我的数据示例(RoomDate)。我想生成变量Goal1Goal2Goal3每次Date变量中有一个间隙就意味着房间被关闭了。我的目标是按房间识别连续日期。

  Room    Date         Goal1     Goal2       Goal3
1 Upper A 2021-01-01   1         2021-01-01  2021-01-02
2 Upper A 2021-01-02   1         2021-01-01  2021-01-02
3 Upper A 2021-01-05   2         2021-01-05  2021-01-05
4 Upper A 2021-01-10   3         2021-01-10  2021-01-10
5 Upper B 2021-01-01   1         2021-01-01  2021-01-01
6 Upper B 2021-02-05   2         2021-02-05  2021-02-07
7 Upper B 2021-02-06   2         2021-02-05  2021-02-07
8 Upper B 2021-02-07   2         2021-02-05  2021-02-07
df <- data.frame("Area" = c("Upper A", "Upper A", "Upper A", "Upper A",
                            "Upper B", "Upper B", "Upper B", "Upper B"),
                "Date" = c("1/1/2021", "1/2/2021", "1/5/2021", "1/10/2021",
                           "1/1/2021", "2/5/2021", "2/6/2021", "2/7/2021"))
df$Date <- as.Date(df$Date, format = "%m/%d/%Y")
1个回答

使用dplyr

R> df %>%
  group_by(Area) %>%
  mutate(Goal = cumsum(c(1, diff(Date) > 1))) %>%
  group_by(Area, Goal) %>%
  mutate(Goal2 = min(Date), Goal3 = max(Date))
# # A tibble: 8 x 5
# # Groups:   Area, Goal [5]
#   Area    Date        Goal Goal2      Goal3     
#   <chr>   <date>     <dbl> <date>     <date>    
# 1 Upper A 2021-01-01     1 2021-01-01 2021-01-02
# 2 Upper A 2021-01-02     1 2021-01-01 2021-01-02
# 3 Upper A 2021-01-05     2 2021-01-05 2021-01-05
# 4 Upper A 2021-01-10     3 2021-01-10 2021-01-10
# 5 Upper B 2021-01-01     1 2021-01-01 2021-01-01
# 6 Upper B 2021-02-05     2 2021-02-05 2021-02-07
# 7 Upper B 2021-02-06     2 2021-02-05 2021-02-07
# 8 Upper B 2021-02-07     2 2021-02-05 2021-02-07

Goal列的计算方式如下:diffDate列上使用来识别差距(差异> 1),然后应用cumsum以获得连续值。