Data mining - Why does the cost increase in my linear regression method?

I am implementing a linear regression model from scratch.

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
"""
Created on Thu Nov 16 14:40:53 2017

@author: user
"""

import os
import random
os.chdir('/home/user/Desktop/andrewng/machine-learning-ex1/ex1')

import pandas as pd
data = pd.read_csv('/home/user/Desktop/andrewng/machine-learning-ex1/ex1/ex1data1.txt',header=None)

theta_0 = random.random() 
theta_1 = random.random() 
alpha = 0.001

print(len(data))


hist = -90
cost = 0
print('theta_0    + theta_1    ')
while(cost-hist>0.001):
    hist = cost
    cost = 0
    a = 0
    b = 0
    for i in range(len(data)):
        k = data.iloc[i]
        a = a +  theta_0 + theta_1*k[0] - k[1]
        b = b + (theta_0 + theta_1*k[0] - k[1])*k[0]
    theta_0 = theta_0 - alpha*a/len(data)
    theta_1 = theta_1 - alpha*b/len(data)
    #print(str(theta_0)+'    '+str(theta_1))
    for j in range(len(data)):
        k = data.iloc[i]
        cost  = cost + (theta_0 + theta_1*k[0] - k[1])**2
    cost = cost/(2*len(data))
    print(cost)
    #if(cost>hist):
    #    print(str(theta_0)+'    '+str(theta_1))
    #    break
print(str(theta_0)+'    '+str(theta_1))

In theory, the cost should decrease with every iteration, but in my case it keeps increasing.
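
For reference, this is what the loop above appears to be computing, written out as formulas. The notation is an assumption added here, not part of the script: m stands for `len(data)`, x_i and y_i for columns 0 and 1 of row i, and alpha for the learning rate.

```latex
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x_i - y_i \right)^2

\theta_0 \leftarrow \theta_0 - \frac{\alpha}{m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x_i - y_i \right)

\theta_1 \leftarrow \theta_1 - \frac{\alpha}{m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x_i - y_i \right) x_i
```

The expected decrease only applies when J is summed over all m rows and alpha is small enough for the data; with a step size that is too large, J can grow even when the updates are coded correctly.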

(data)
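
Two details in the posted loop look like they could explain the symptom. The cost loop iterates over `j` but reads `data.iloc[i]`, so it seems to sum the squared error of the last row only. The `while(cost-hist>0.001)` condition also keeps iterating only while the cost rises, so only increasing values would ever be printed. Below is a minimal sketch of the same batch update with both points changed. It is not the original script: the helper names `cost_fn` and `gradient_descent`, the `tol` and `max_iter` parameters, the zero starting values, and the synthetic data (used in place of `ex1data1.txt` so the block runs on its own) are all assumptions made for illustration.

```python
import random


def cost_fn(theta_0, theta_1, xs, ys):
    """Half the mean squared error over ALL rows."""
    m = len(xs)
    return sum((theta_0 + theta_1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2.0 * m)


def gradient_descent(xs, ys, alpha=0.001, tol=1e-6, max_iter=10000):
    """Batch gradient descent; stops once the cost decrease falls below tol."""
    m = len(xs)
    theta_0, theta_1 = 0.0, 0.0  # assumption: fixed start instead of random.random()
    history = [cost_fn(theta_0, theta_1, xs, ys)]
    for _ in range(max_iter):
        # Residuals computed with the current parameters for every row.
        errors = [theta_0 + theta_1 * x - y for x, y in zip(xs, ys)]
        grad_0 = sum(errors) / m
        grad_1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update of both parameters.
        theta_0 -= alpha * grad_0
        theta_1 -= alpha * grad_1
        history.append(cost_fn(theta_0, theta_1, xs, ys))
        # Continue while the cost is still DEcreasing by at least tol.
        if history[-2] - history[-1] < tol:
            break
    return theta_0, theta_1, history


# Synthetic stand-in data (hypothetical, not ex1data1.txt).
rng = random.Random(0)
xs = [rng.uniform(5, 20) for _ in range(100)]
ys = [-4.0 + 1.2 * x + rng.gauss(0, 1) for x in xs]

theta_0, theta_1, history = gradient_descent(xs, ys)
print("first cost:", history[0])
print("last cost:", history[-1])
print(str(theta_0) + '    ' + str(theta_1))
```

On the real file, the two columns could be passed in as `data[0].tolist()` and `data[1].tolist()`. If the cost still grows after these changes, trying a smaller `alpha` would be the next thing to check.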