Evaluating my MATLAB code for least-squares regression (LSR)

computational-science algorithms matlab least-squares
2021-11-28 06:41:04

Below is my own implementation of a least-squares regression algorithm in MATLAB. Could you please take a look and tell me whether it makes sense, and whether it actually does what it is supposed to do?

Edit: Please note the commented-out commands.

function [MSE_train_mean,MSE_train_std,MSE_test_mean,MSE_test_std,w_star] = perform_lsr(X,Y,Z)

% 604 days for the training set (29/05/2009 - 18/10/2011)
% 145 days for the evaluation set (19/10/2011 - 16/05/2012)
Xtran = X(1:end-145,:);
Ytran = Y(1:end-145,:);
Ztran = Z(1:end-145,:);
Xeval = X(end-144:end,:);
Yeval = Y(end-144:end,:);
Zeval = Z(end-144:end,:);

% Set number of runs.
runs = 20;

% Set number of folds.
folds = 5;

% Initialize auxiliary matrices.
MSE_train = zeros(runs,folds);
MSE_test = zeros(runs,folds);
w_star = zeros(runs,folds,size(Xtran,2));

% Perform runs.
for r = 1:runs

% Split original dataset into training and test set.
split_assignments = cross_val_split(folds,size(Xtran,1));

% Perform folds.
for f = 1:folds

% Assign explanatory variables (x).
x_train = Xtran((split_assignments(:,f)==0),:);
x_test = Xtran((split_assignments(:,f)==1),:);

% Assign response variable (y).
y_train = Ytran((split_assignments(:,f)==0),:);
y_test = Ytran((split_assignments(:,f)==1),:);

% Retrieve number of rows in each set (the column counts are unused).
[l_train_n,~] = size(x_train);
[l_test_n,~] = size(x_test);

% Estimate parameters (w0,w1,...) of the 1st order model from training set.
w_star(r,f,:) = (x_train' * x_train) \ (x_train' * y_train);
w_star_temp = w_star(r,f,:);
w_star_temp = w_star_temp(:);

% Apply the learned weights on both training and test sets and compute the
% corresponding MSEs.
MSE_train(r,f) = (1 / l_train_n) * (w_star_temp' * (x_train') * x_train * w_star_temp - 2 * y_train' * x_train * w_star_temp + y_train' * y_train);
MSE_test(r,f) = (1 / l_test_n) * (w_star_temp' * (x_test') * x_test * w_star_temp - 2 * y_test' * x_test * w_star_temp + y_test' * y_test);

end
end

% % Plot both training and test sets' MSEs as a function of runs and folds.
% figure, mesh(MSE_train), title('Training Set`s MSEs vs Runs and Folds');
% figure, mesh(MSE_test), title('Testing Set`s MSEs vs Runs and Folds');
% 
% % Plot both training and test sets' mean of MSEs accompanied by their
% % corresponding std of MSEs.
% figure, errorbar(1:runs,mean(MSE_train,2),std(MSE_train,0,2));
% figure, errorbar(1:runs,mean(MSE_test,2),std(MSE_test,0,2));

% Average over folds and then over runs.
MSE_train_mean = mean(mean(MSE_train,2),1);
MSE_train_std = std(mean(MSE_train,2),0,1);
MSE_test_mean = mean(mean(MSE_test,2),1);
MSE_test_std = std(mean(MSE_test,2),0,1);
w_star_temp = mean(mean(w_star,2),1);
w_star = w_star_temp(:);

% % Calculate in-sample residuals.
% y_hat_tran = Xtran * w_star;
% e_tran = Ytran - y_hat_tran;
% % Check that they sum up to zero.
% fprintf('In-sample residuals sum up to %g.\n',sum(e_tran));
% % Finally, plot them.
% figure, scatter(1:size(e_tran,1),e_tran);
% 
% % Calculate out-of-sample residuals.
% y_hat_eval = Xeval * w_star;
% e_eval = Yeval - y_hat_eval;
% % Check that they sum up to zero.
% fprintf('Out-of-sample residuals sum up to %g.\n',sum(e_eval));
% % Finally, plot them.
% figure, scatter(1:size(e_eval,1),e_eval);

% Print summary statistics.
fprintf('Over %i runs, %i folds, and regarding the training set, the mean and standard deviation is %.10f and %.10f respectively.\n',runs,folds,MSE_train_mean,MSE_train_std);
fprintf('Over %i runs, %i folds, and regarding the testing set, the mean and standard deviation is %.10f and %.10f respectively.\n',runs,folds,MSE_test_mean,MSE_test_std);

end
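As a sanity check on the expanded MSE formula used above — (w'X'Xw − 2y'Xw + y'y)/n — here is a short Python/NumPy sketch (the random data are made up purely for illustration) confirming that it equals the plain mean of squared residuals:

```python
import numpy as np

# Hypothetical small dataset standing in for x_train/y_train.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(50, 3))
y_train = x_train @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

# Normal equations, as in the MATLAB line
# w_star = (x_train' * x_train) \ (x_train' * y_train)
w = np.linalg.solve(x_train.T @ x_train, x_train.T @ y_train)

n = x_train.shape[0]
# Expanded quadratic form used in the post:
mse_expanded = (w @ x_train.T @ x_train @ w
                - 2 * y_train @ x_train @ w
                + y_train @ y_train) / n
# Direct form:
mse_direct = np.mean((x_train @ w - y_train) ** 2)
```

The two quantities agree to floating-point precision, so the one-line MSE expressions in the MATLAB code are algebraically sound.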
1 Answer

In Chapter 44, "Approximation of Discrete Data", of Advanced Engineering Mathematics, Robert J. Lopez gives the following least-squares regression algorithm:

clear
d = 6;
x = [1, 2, 3, 4, 5, 6, 7]
y = [3, 4, 1, 3, 2, 4, 9]
p = [0:1:d]
for i=1:length(y)
    for j=1:length(p)
        A1(i,j)=x(i)^p(j);
    end
end
A1
A2 = A1'
B0=y
for k=1:length(y)-1
    B0=cat(1,B0,y);
end
B0
B1=B0'
A3 = A2*B1
C1 = A2*A1
C2 = inv(C1)
C3 = C2*A3
C4 = C3'
pol = C4(1,1:length(p))
C5 = A1.*C4 // elementwise product: scale each power of x by its coefficient
yreg = sum(C5,2)

I wrote this in Scilab. It should also work in MATLAB, since Scilab is a MATLAB clone. "pol" holds the coefficients of the degree-"d" polynomial that best fits the data.
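For comparison, the same Vandermonde/normal-equations construction can be sketched in Python with NumPy. This uses degree d = 2 so the result can be checked against the exact fractions 46/7, -85/28, 13/28 given for that case:

```python
import numpy as np

# The x, y data from the answer; d is the polynomial degree.
x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([3, 4, 1, 3, 2, 4, 9], dtype=float)
d = 2

# Vandermonde matrix A1[i, j] = x[i]**j -- what the double loop builds.
A1 = np.vander(x, d + 1, increasing=True)

# Normal equations (A1' A1) pol = A1' y; solving the system directly
# is preferable to forming inv(C1) explicitly.
pol = np.linalg.solve(A1.T @ A1, A1.T @ y)

# Fitted values, the analogue of yreg.
yreg = A1 @ pol
```

For higher degrees the normal equations become badly conditioned, which is why library routines typically use a QR-based least-squares solver instead.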

Using Mathematica:

(*"d" is the degree of the polynomial to be fitted to x,y pairs*)
d = 6;
x = {1, 2, 3, 4, 5, 6, 7};
y = {3, 4, 1, 3, 2, 4, 9};
p = Range[d + 1] - 1;
A1 = Table[Table[x[[i]]^p[[j]], {j, 1, Length[p]}], {i, 1, Length[y]}];
A2 = Transpose[A1];
B1 = Transpose[Table[y, {i, 1, Length[y]}]];
A3 = A2.B1;
C1 = A2.A1;
C2 = Inverse[C1];
C3 = C2.A3;
C4 = Transpose[C3];
Print["polynomial"]
pol = Total[C4[[1, All]]*(z^p)]
g1 = ListPlot[
   Prepend[Table[{x[[i]], y[[i]]}, {i, 1, Length[x]}], {0, 0}], 
   PlotMarkers -> Automatic];
g2 = Plot[pol, {z, 0, Length[x] + 1}];
Show[g1, g2, PlotRange -> {Min[y] - 1, Max[y] + 1}, ImageSize -> Full]
C5 = A1*C4;
yreg = Total[Transpose[C5]];

Plotting the polynomial for different values of the degree parameter "d", we get:

d=0

Degree-0 polynomial:

= 26/7 = 3.71429

The degree-0 polynomial is simply the mean of the data's y values.

d=1

Degree-1 polynomial:

= 1 + (19 z)/28

= 1. + 0.678571 z

d=2

Degree-2 polynomial:

= 46/7 - (85 z)/28 + (13 z^2)/28

= 6.57143 - 3.03571 z + 0.464286 z^2

d=3

Degree-3 polynomial:

= 11/7 + (335 z)/126 - (101 z^2)/84 + (5 z^3)/36

= 1.57143 + 2.65873 z - 1.20238 z^2 + 0.138889 z^3

d=4

Degree-4 polynomial:

= 2 + (2743 z)/1386 - (7 z^2)/8 + (31 z^3)/396 + z^4/264

= 2. + 1.97908 z - 0.875 z^2 + 0.0782828 z^3 + 0.00378788 z^4

d=5

Degree-5 polynomial:

= -(107/7) + (7947 z)/220 - (571 z^2)/24 + (3631 z^3)/528 - (241 z^4)/264 + (11 z^5)/240

= -15.2857 + 36.1227 z - 23.7917 z^2 + 6.87689 z^3 - 0.912879 z^4 + 0.0458333 z^5

d=6

Degree-6 polynomial:

= -110 + (5093 z)/20 - (4179 z^2)/20 + (3965 z^3)/48 - (815 z^4)/48 + (419 z^5)/240 - (17 z^6)/240

= -110. + 254.65 z - 208.95 z^2 + 82.6042 z^3 - 16.9792 z^4 + 1.74583 z^5 - 0.0708333 z^6

[Each case was accompanied by a plot of the fitted polynomial against the data points.]
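As an independent check of the degree-1 result above, NumPy's built-in `polyfit` (which solves the same least-squares problem with a more numerically stable method) reproduces the coefficients 1 and 19/28:

```python
import numpy as np

# The x, y data from the answer.
x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([3, 4, 1, 3, 2, 4, 9], dtype=float)

# np.polyfit returns coefficients from the highest power down.
slope, intercept = np.polyfit(x, y, 1)
# slope is approximately 19/28, intercept approximately 1.
```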

Never mind the dummy point {0,0} at the origin. It only makes the plot start from zero; I did not know how to achieve that in a better way. The point {x,y} = {0,0} is not part of the data used in the least-squares regression.