I found the solution in a 1972 paper (George R. Price, Ann. Hum. Genet., Lond, pp485-490, Extension of covariance selection mathematics, 1972).
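The biased weighted sample covariance:
$$\Sigma = \frac{1}{\sum_{i=1}^{N} w_i} \sum_{i=1}^{N} w_i \,(x_i - \mu^*)^T (x_i - \mu^*)$$
And the unbiased weighted sample covariance, obtained by applying Bessel's correction:
$$\Sigma = \frac{1}{\sum_{i=1}^{N} w_i - 1} \sum_{i=1}^{N} w_i \,(x_i - \mu^*)^T (x_i - \mu^*)$$
where $\mu^*$ is the (unbiased) weighted sample mean:
$$\mu^* = \frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i}$$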
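Important note: this is valid only if the weights are "repeat"-type weights, meaning that each weight represents the number of occurrences of one observation, and that $\sum_{i=1}^{N} w_i = N^*$, where $N^*$ is the real sample size (the real total number of samples, accounting for the weights).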
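To make the "repeat"-type interpretation concrete, here is a minimal sketch with made-up toy data (the arrays `x` and `w` are illustrative assumptions, not taken from any real data set). It expands the weighted observations into the equivalent unweighted data set and checks that the sum of the weights is the real sample size and that the weighted mean matches the plain mean of the expanded data.

```python
import numpy as np

# Hypothetical toy data: 3 distinct observations of 2 variables.
x = np.array([[1.0, 2.0],
              [3.0, 1.0],
              [5.0, 4.0]])
# Repeat-type weights: how many times each observation occurred.
w = np.array([2, 1, 3])

# Expand the weighted data set into the equivalent unweighted one
# by repeating each row w[i] times.
x_expanded = np.repeat(x, w, axis=0)

# Real sample size N*: the sum of the weights.
n_star = w.sum()

# Weighted sample mean, computed directly from the formula.
weighted_mean = (w[:, None] * x).sum(axis=0) / n_star

print(n_star, len(x_expanded))
print(weighted_mean, x_expanded.mean(axis=0))
```

If the weights were instead reliability or importance weights (non-integer, or normalized to sum to 1), this expansion would not make sense and the `sum(w) - 1` correction would not apply.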
I have updated the Wikipedia article, where you will also find the equation for the unbiased weighted sample variance:
https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Weighted_sample_covariance
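Practical note: first multiply $w_i$ and $(x_i - \mu^*)$ column by column, then do a matrix multiplication with $(x_i - \mu^*)$ to wrap things up and perform the summation automatically. For example, in Python Pandas/Numpy code: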
import pandas as pd
import numpy as np
# X is the dataset, as a Pandas DataFrame; w holds the weights, one per row of X
mean = np.ma.average(X, axis=0, weights=w)  # Compute the weighted sample mean
mean = pd.Series(mean, index=list(X.keys()))  # Convert to a Pandas Series (purely for convenience, no difference in computed values)
xm = X - mean  # xm = deviations of X from the weighted mean
xm = xm.fillna(0)  # Fill NaN with 0 so missing entries contribute nothing and the other covariance values are still computed correctly
sigma2 = 1. / (w.sum() - 1) * xm.mul(w, axis=0).T.dot(xm)  # Compute the unbiased weighted sample covariance
I did a few sanity checks using a non-weighted dataset and an equivalent weighted dataset, and it works correctly.
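A sanity check of this kind could look like the sketch below. It is not the check that was originally run: the helper name `weighted_cov`, the random data and the weights are all assumptions made for illustration. It compares the weighted formula against the ordinary covariance of the row-expanded dataset, and against NumPy's `np.cov` with integer `fweights`.

```python
import numpy as np
import pandas as pd

def weighted_cov(X, w):
    """Unbiased weighted sample covariance for repeat-type weights.

    Illustrative helper (hypothetical name): X is a DataFrame,
    w is a Series of integer weights sharing X's index.
    """
    mean = np.average(X, axis=0, weights=w)   # weighted sample mean per column
    xm = X - mean                             # deviations from the weighted mean
    # Scale each row by its weight, then sum the outer products via a matrix product.
    return xm.mul(w, axis=0).T.dot(xm) / (w.sum() - 1)

# Made-up data: 5 observations of 3 variables, with integer repeat weights.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(5, 3)), columns=["a", "b", "c"])
w = pd.Series([1, 3, 2, 4, 1], index=X.index)

# Equivalent unweighted dataset: each row repeated w[i] times.
X_expanded = X.loc[X.index.repeat(w.to_numpy())]

cov_weighted = weighted_cov(X, w).to_numpy()
cov_expanded = X_expanded.cov().to_numpy()    # plain unbiased covariance (ddof=1)
cov_numpy = np.cov(X.to_numpy(), rowvar=False, fweights=w.to_numpy())

print(np.allclose(cov_weighted, cov_expanded))
print(np.allclose(cov_weighted, cov_numpy))
```

All three results should agree up to floating-point error, since with repeat-type weights the weighted dataset and the expanded dataset describe the same sample.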
For more details about the theory behind the unbiased variance/covariance, see this post.