Would either L1 or L2 regularisation lower the MSE on the training and test data?

Artificial Intelligence linear-regression mean-squared-error l2-regularization l1-regularization
2021-11-03 09:57:32

Consider linear regression. The mean squared error (MSE) is 120.5 for the training dataset. We've reached the minimum for the training data.

Is it possible that by applying Lasso (L1 regularization) we would get a lower MSE for the training data? Would it get lower for the test data? Would this also hold for ridge regression (L2 regularization)?

1 Answer

The answer is largely the same whether we consider L1 or L2 regularisation, so I will just speak generally about regularisation.

Mean square error for training data

Given some training data $\{(x_i, y_i)\}_{i=1}^{n}$, a linear regression line $Y = aX + b$ fit using the least squares method looks for coefficients that minimise the sum of squares, i.e. they are the minimisers given by

$$\operatorname*{arg\,min}_{a,b} \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)^2.$$

This gives the same coefficients as minimising the mean square error

$$\mathrm{MSE}\bigl((x_1, y_1), \dots, (x_n, y_n)\bigr) = \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)^2.$$

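So, by definition, the coefficients (a,b) minimise the MSE on the training data. Adding a regularisation penalty changes the objective being minimised, so the regularised coefficients will generally differ from (a,b). Since no other coefficients can beat the least squares ones on the training data, regularisation can only leave the training MSE unchanged or increase it; it can never lower it.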
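To make this concrete, here is a minimal sketch for the one-feature case Y=aX+b, using only the Python standard library. The synthetic data, the helper names `fit_line` and `mse`, the penalty strength `lam=100.0`, and the choice to penalise only the slope (not the intercept) are all assumptions made for illustration, not part of the question.

```python
import random

def fit_line(xs, ys, penalty="none", lam=0.0):
    """Fit y = a*x + b by least squares, optionally penalising the slope a."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if penalty == "ridge":
        # Minimises sum of squared errors + lam * a**2.
        a = sxy / (sxx + lam)
    elif penalty == "lasso":
        # Minimises sum of squared errors + lam * |a| (soft-thresholding).
        shrunk = max(abs(sxy) - lam / 2.0, 0.0)
        a = (shrunk if sxy >= 0 else -shrunk) / sxx
    else:
        # Ordinary least squares.
        a = sxy / sxx
    b = y_bar - a * x_bar  # the intercept is left unpenalised in all three cases
    return a, b

def mse(xs, ys, a, b):
    """Mean squared error of the line y = a*x + b on the given points."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic training data (assumed for illustration only).
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(50)]
ys = [3.0 * x + 2.0 + random.gauss(0, 5) for x in xs]

mse_ols = mse(xs, ys, *fit_line(xs, ys))
mse_ridge = mse(xs, ys, *fit_line(xs, ys, "ridge", lam=100.0))
mse_lasso = mse(xs, ys, *fit_line(xs, ys, "lasso", lam=100.0))
print("training MSE  OLS:", mse_ols, " ridge:", mse_ridge, " lasso:", mse_lasso)
```

Whatever values you pick for `lam`, the two regularised fits should never report a training MSE below the ordinary least squares one, because both shrink the slope away from the least squares minimiser.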

Generalisation performance

The main point of regularisation is to prevent overfitting on the data and improve the generalisation performance (i.e. on the test set).

With an appropriate regularisation strength, you may obtain a smaller MSE on the test set than the unregularised fit does. Whether you do depends on your dataset and the strength you choose: strong regularisation may lead to underfitting, whereas weak regularisation might not make much difference to the coefficients that you fit.
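One way to explore this trade-off is to sweep the regularisation strength and compare training and test MSE side by side. The sketch below does this for ridge regression with one feature, using only the Python standard library. The data-generating process, the sample sizes, the grid `lams`, and the helper names `ridge_line`, `mse` and `sample` are assumptions chosen for illustration; whether the test MSE actually dips below the unregularised value depends on the particular data drawn.

```python
import random

def ridge_line(xs, ys, lam):
    """Slope and intercept minimising sum((y - a*x - b)**2) + lam * a**2."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / (sxx + lam)          # lam = 0 gives ordinary least squares
    return a, y_bar - a * x_bar    # the intercept is not penalised

def mse(xs, ys, a, b):
    """Mean squared error of the line y = a*x + b on the given points."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def sample(n, rng):
    """Draw n noisy points from an assumed true line y = 0.5 * x."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [0.5 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys

rng = random.Random(0)
x_train, y_train = sample(8, rng)      # small, noisy training set
x_test, y_test = sample(1000, rng)     # larger held-out test set

lams = [0.0, 0.1, 1.0, 10.0, 100.0]
train_mse, test_mse = [], []
for lam in lams:
    a, b = ridge_line(x_train, y_train, lam)
    train_mse.append(mse(x_train, y_train, a, b))
    test_mse.append(mse(x_test, y_test, a, b))
    print(f"lam={lam:6.1f}  train MSE={train_mse[-1]:.3f}  test MSE={test_mse[-1]:.3f}")
```

The training MSE can only grow (or stay flat) as `lam` increases, while the test MSE has no such guarantee in either direction. In practice the strength is usually picked on a separate validation set or by cross-validation rather than by looking at the test set directly.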