I have a dataset like this:
df <- structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L), stock = c("stockA",
"stockB", "stockC", "stockD", "stockA", "stockB", "stockD"),
var1_before = c(-0.3957, -0.0201, -0.3957, -0.3729, 0.0498000000000001,
-0.5075, -0.3242), var2_before = c(-1.3106, -1.4492, -1.3106,
-1.6134, -1.3222, -1.5452, -1.3168), var3_before = c(3.3374,
2.6408, 3.3374, 3.4728, -0.2173, 3.6311, 3.0884), var4_before = c(-1533,
-1.3378, -1533, -1.5256, -1.6596, -1.7272, -1.4142), var5_before = c(0.3841,
0.1647, 0.3841, 551, 3.5372, 0.3317, 0.4339), var1_after = c(-0.4975,
-0.4107, -0.3557, -0.5223, -0.2173, -0.2003, -0.4473), var2_after = c(-1.6707,
-1.5982, -1.4963, -1.6512, -1.6596, -1.7075, -1.6361), var3_after = c(3.9367,
3.7744, 3398, 3.9537, 3.5372, 3.4673, 3.7018), var4_after = c(-1.6377,
-1.5513, -1.6543, -1.6823, -1.5497, -1.3507, -1.8195), var5_after = c(0.6483,
0.5484, 0.4024, 0.3634, 0.4352, 0.3594, 0.3441)), class = "data.frame", row.names = c(NA,
-7L))
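The variables are:
id: the user.
stock: the name of a stock that the user (id) has an impression of or sentiment about. Each stock appears at most once per user.
var1_before to var5_before and var1_after to var5_after: sentiment scores. For example, var1_before is the user's sentiment score before a specific event, and var1_after is the same user's score after that event. The same applies to variables 2, 3, 4 and 5.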
I know that some users move between states around the event, for example from var1_before to var3_after.
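For each stock, how can I find which "before" state most users are in? For example, for stockA, var1_before and var4_before appear to be the most influential, whereas for stockB, var2_before seems to be the most common.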
Is there a machine learning method that can do this?
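To make the question concrete, here is a rough sketch of the kind of result I am after, done with plain tabulation rather than machine learning. It assumes that a user's "state" for a stock is simply the variable with the highest score in that row, which is my own working definition and may not be the right one (the scores may need rescaling first). The data frame `toy`, its values, and the helper `dominant()` are made up for illustration; only the column naming follows my real data.

```r
# Toy data with the same layout as the real data (values are made up).
toy <- data.frame(
  id    = c(1L, 1L, 2L, 2L, 3L, 3L),
  stock = c("stockA", "stockB", "stockA", "stockB", "stockA", "stockB"),
  var1_before = c(0.9, 0.1, 0.8, 0.2, 0.1, 0.3),
  var2_before = c(0.2, 0.7, 0.1, 0.9, 0.6, 0.8),
  var1_after  = c(0.3, 0.2, 0.9, 0.1, 0.2, 0.4),
  var2_after  = c(0.8, 0.6, 0.2, 0.7, 0.9, 0.5)
)

before_cols <- grep("_before$", names(toy), value = TRUE)
after_cols  <- grep("_after$",  names(toy), value = TRUE)

# Assumption: the "state" of a row is the variable with the highest score.
# Ties are resolved in favour of the first column.
dominant <- function(d, cols) {
  m <- as.matrix(d[, cols, drop = FALSE])
  sub("_(before|after)$", "", cols[max.col(m, ties.method = "first")])
}

toy$state_before <- dominant(toy, before_cols)
toy$state_after  <- dominant(toy, after_cols)

# Number of users per stock and "before" state.
counts <- table(toy$stock, toy$state_before)

# Most common "before" state for each stock.
top_before <- apply(counts, 1, function(x) names(x)[which.max(x)])

# How users move from their "before" state to their "after" state.
transitions <- table(before = toy$state_before, after = toy$state_after)

print(counts)
print(top_before)
print(transitions)
```

Here `top_before` gives one state per stock and `transitions` is a before/after cross-tabulation. On my real data the variables are on different scales, so taking the raw maximum is probably too naive, which is part of why I am asking about a more principled method.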