Cost function depending on the training-set size - batch gradient descent

data-mining gradient-descent
2022-03-02 06:03:29

I am applying the simple least-mean-squares update rule in Python, but somehow the theta values I get blow up to absurdly large numbers.

from numpy import *  # the script only uses numpy names (array, zeros, dot, transpose, size)
data = array( 
[[1,4.9176,1.0,3.4720,0.998,1.0,7,4,42,3,1,0,25.9],
[2,5.0208,1.0,3.5310,1.50,2.0,7,4,62,1,1,0,29.5],
[3,4.5429,1.0,2.2750,1.175,1.0,6,3,40,2,1,0,27.9],
[4,4.5573,1.0,4.050,1.232,1.0,6,3,54,4,1,0,25.9],
[5,5.0597,1.0,4.4550,1.121,1.0,6,3,42,3,1,0,29.9],
[6,3.8910,1.0,4.4550,0.988,1.0,6,3,56,2,1,0,29.9],
[7,5.8980,1.0,5.850,1.240,1.0,7,3,51,2,1,1,30.9],
[8,5.6039,1.0,9.520,1.501,0.0,6,3,32,1,1,0,28.9],
[9,16.4202,2.5,9.80,3.420,2.0,10,5,42,2,1,1,84.9],
[10,14.4598,2.5,12.80,3.0,2.0,9,5,14,4,1,1,82.9],
[11,5.8282,1.0,6.4350,1.225,2.0,6,3,32,1,1,0,35.9],
[12,5.303,1.0,4.9883,1.552,1.0,6,3,30,1,2,0,31.5],
[13,6.2712,1.0,5.520,0.975,1.0,5,2,30,1,2,0,31.0],
[14,5.9592,1.0,6.6660,1.121,2.0,6,3,32,2,1,0,30.9],
[15,5.050,1.0,5.0,1.020,0.0,5,2,46,4,1,1,30.0],
[16,5.6039,1.0,9.520,1.501,0.0,6,3,32,1,1,0,28.9],    
[17,8.2464,1.5,5.150,1.664,2.0,8,4,50,4,1,0,36.9],    
[18,6.6969,1.5,6.9020,1.488,1.5,7,3,22,1,1,1,41.9],
[19,7.7841,1.5,7.1020,1.376,1.0,6,3,17,2,1,0,40.5],
[20,9.0384,1.0,7.80,1.50,1.5,7,3,23,3,3,0,43.9],
[21,5.9894,1.0,5.520,1.256,2.0,6,3,40,4,1,1,37.5],
[22,7.5422,1.5,4.0,1.690,1.0,6,3,22,1,1,0,37.9],
[23,8.7951,1.5,9.890,1.820,2.0,8,4,50,1,1,1,44.5]])


x = zeros((len(data[:, 4]), 2))
x[:, 0], x[:, 1] = 1, data[:, 4]  # design matrix: bias column plus the single feature in column 4
y = data[:, -1]
theta = array([100.0, 100.0])
alpha = 0.4
iternum = 100
for i in range(iternum):
    theta -= alpha * dot(transpose(x), (dot(x, theta) - y))
print(theta)

The answer I get is [7.18957001e+150 1.19047264e+151], which is not realistic for this data.

However, if I change the update loop to

for i in range(iternum):
    theta -= alpha * dot(transpose(x), (dot(x, theta) - y)) / size(data[:, 4])  # basically divide by the total number of training examples
print(theta)

I get the correct answer. However, as far as I understand, the cost function does not necessarily have to depend on the size of the training set.

Can someone point out the source of the problem?

Sorry if the explanation of the question is a bit convoluted.

1 Answer

Your second change (computing the average error) is the right approach. Imagine a training set with a billion examples: even if the error on each individual example is (very) small, summing over all of them still produces a huge number. So in the least-squares setting the cost function is the mean squared error, not just the sum of squared errors. Equivalently, dividing the gradient by the number of examples m rescales the effective step size: with the summed gradient, your alpha = 0.4 behaves like a step of m * 0.4, which is far too large, so the updates overshoot and diverge.
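
To make the scaling effect concrete, here is a minimal sketch (synthetic data; the feature x1, the helper batch_gd, and the constants are illustrative stand-ins, not taken from the question) that runs the same update with and without the 1/m factor:

import numpy as np

# Synthetic stand-in for the question's single feature data[:, 4]
m = 23                                  # same number of examples as in the question
x1 = np.linspace(1.0, 2.0, m)
y = 10.0 + 18.0 * x1                    # exact line, so the optimum is [10, 18]
X = np.column_stack([np.ones(m), x1])   # bias column plus one feature

def batch_gd(X, y, alpha=0.4, iters=5000, average=True):
    """Batch gradient descent for least squares.
    average=True uses the mean-squared-error gradient (1/m) * X.T @ (X @ theta - y);
    average=False uses the summed gradient, which is m times larger,
    so the effective step size becomes m * alpha."""
    theta = np.array([100.0, 100.0])
    for _ in range(iters):
        grad = X.T @ (X @ theta - y)
        if average:
            grad /= len(y)
        theta -= alpha * grad
    return theta

print(batch_gd(X, y))                        # converges to roughly [10, 18]
print(np.linalg.lstsq(X, y, rcond=None)[0])  # closed-form check: [10, 18]
# batch_gd(X, y, average=False) overflows within a few iterations,
# just like the first version in the question.

With the averaged gradient, the usable range of alpha no longer depends on m, which is why the same alpha = 0.4 converges after the change but blows up before it.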