I have a question about NDCG, an evaluation metric for recommendations.
I am using the following code to compute it:
import numpy as np


def dcg_score(y_true, y_score, k=20, gains="exponential"):
    """Discounted cumulative gain (DCG) at rank k

    Parameters
    ----------
    y_true: array-like, shape = [n_samples]
        Ground truth (true relevance labels).
    y_score: array-like, shape = [n_samples]
        Predicted scores.
    k: int
        Rank.
    gains: str
        Whether gains should be "exponential" (default) or "linear".

    Returns
    -------
    DCG @k: float
    """
    # Indices of the items sorted by predicted score, highest first
    order = np.argsort(y_score)[::-1]
    # True relevance of the top-k items in that order
    y_true = np.take(y_true, order[:k])

    if gains == "exponential":
        gains = 2 ** y_true - 1
    elif gains == "linear":
        gains = y_true
    else:
        raise ValueError("Invalid gains option.")

    # highest rank is 1 so +2 instead of +1
    discounts = np.log2(np.arange(len(y_true)) + 2)
    return np.sum(gains / discounts)


def ndcg_score(y_true, y_score, k=20, gains="exponential"):
    """Normalized discounted cumulative gain (NDCG) at rank k

    Parameters
    ----------
    y_true: array-like, shape = [n_samples]
        Ground truth (true relevance labels).
    y_score: array-like, shape = [n_samples]
        Predicted scores.
    k: int
        Rank.
    gains: str
        Whether gains should be "exponential" (default) or "linear".

    Returns
    -------
    NDCG @k: float
    """
    best = dcg_score(y_true, y_true, k, gains)
    actual = dcg_score(y_true, y_score, k, gains)
    return actual / best

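For reference, the quantities computed by these two functions can be written out as below. The symbols are my own notation and do not appear in the code: rel_i is the true relevance of the item placed at rank i by the predicted scores, n is the number of items, and g is the gain function selected by the gains argument.

```latex
\mathrm{DCG@}k = \sum_{i=1}^{\min(k,\,n)} \frac{g(\mathrm{rel}_i)}{\log_2(i + 1)},
\qquad
g(r) =
\begin{cases}
2^{r} - 1 & \text{(exponential)} \\
r         & \text{(linear)}
\end{cases}

\mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k}
```

Here IDCG@k is the DCG@k of the ranking obtained by sorting the items by their true relevance, which is what the call dcg_score(y_true, y_true, k, gains) computes.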
Suppose k = 5.
In that case, how should NDCG be computed for items that could not be recommended within the top k?
For example:
y_true = [5,4,3,2,1]
y_score = [0,0,0,0,0] # 0 means we could not recommend within the top 5
In this case,
>>> np.argsort([0,0,0,0])[::-1]
array([3, 2, 1, 0])
So, according to the code above,
NDCG@5 = 1.0
which seems strange.
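When every score is tied, the ranking is decided by how the sort breaks ties, not by the recommender. The sketch below illustrates this under a few assumptions: dcg_at_k is a hypothetical helper that mirrors the exponential-gain DCG above, and kind="stable" is passed to np.argsort so that the tie order is reproducible (the default sort kind does not guarantee any particular order for ties).

```python
import numpy as np


def dcg_at_k(relevances_in_ranked_order, k):
    """Exponential-gain DCG for relevances already listed in ranked order.

    Hypothetical helper for illustration only.
    """
    rel = np.asarray(relevances_in_ranked_order, dtype=float)[:k]
    discounts = np.log2(np.arange(len(rel)) + 2)
    return float(np.sum((2 ** rel - 1) / discounts))


y_true = np.array([5, 4, 3, 2, 1])
y_score = np.zeros(5)  # every item is tied at score 0

# A stable sort keeps tied items in their input order.
order_a = np.argsort(y_score, kind="stable")  # indices 0..4
# Reversing it yields the opposite tie order.
order_b = order_a[::-1]                       # indices 4..0

ideal = dcg_at_k(np.sort(y_true)[::-1], k=5)
ndcg_a = dcg_at_k(y_true[order_a], k=5) / ideal
ndcg_b = dcg_at_k(y_true[order_b], k=5) / ideal

# Same scores, two different NDCG values: one is perfect, the other is not.
print(ndcg_a, ndcg_b)
```

Both orderings are equally valid for identical scores, so neither value says anything about the quality of the recommendations.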
In a case like this, should the score be 0, or should the case be excluded from the NDCG calculation altogether?
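One possible reading of the "score 0" option is sketched below. It rests on assumptions that are mine, not an established convention: ndcg_with_missing is a hypothetical helper, and a predicted score equal to the missing sentinel (0, as in the example) is taken to mean "not recommended", so that item earns no gain. The ideal DCG is still computed from all true relevances.

```python
import numpy as np


def ndcg_with_missing(y_true, y_score, k=5, missing=0):
    """NDCG@k where items scored `missing` count as not recommended.

    Hypothetical helper: the `missing` sentinel is an assumption taken
    from the example, not part of any library API.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_score = np.asarray(y_score, dtype=float)

    # Keep only the items that were actually recommended.
    idx = np.flatnonzero(y_score != missing)
    # Rank them by predicted score, highest first, and cut at k.
    order = idx[np.argsort(-y_score[idx], kind="stable")][:k]

    gains = 2 ** y_true[order] - 1
    discounts = np.log2(np.arange(len(order)) + 2)
    actual = np.sum(gains / discounts)  # 0.0 when nothing was recommended

    # The ideal ranking uses every item, sorted by true relevance.
    ideal_rel = np.sort(y_true)[::-1][:k]
    ideal_discounts = np.log2(np.arange(len(ideal_rel)) + 2)
    ideal = np.sum((2 ** ideal_rel - 1) / ideal_discounts)

    # Guard against a list with no relevant items at all.
    return float(actual / ideal) if ideal > 0 else 0.0


nothing_recommended = ndcg_with_missing([5, 4, 3, 2, 1], [0, 0, 0, 0, 0])
perfect_ranking = ndcg_with_missing([5, 4, 3, 2, 1], [0.9, 0.8, 0.7, 0.6, 0.5])
print(nothing_recommended, perfect_ranking)
```

Whether such a case should count as 0 or be dropped from the average is a separate design choice; this sketch only shows the first option.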
If you know of any references on this, simply pointing me to them would be enough.
Thank you.