请输入您要查询的百科知识:

 

词条 Linear least squares
释义

  1. Main formulations

  2. Alternative formulations

  3. Objective function

  4. Discussion

  5. Properties

      Limitations  

  6. Applications

     Uses in data fitting 

  7. Example

     Using a quadratic model 

  8. See also

  9. References

  10. Further reading

  11. External links

{{Regression bar}}

Linear least squares (LLS) is the least squares approximation of linear functions to data.

It is a set of formulations for solving statistical problems involved in linear regression, including variants for

ordinary (unweighted),

weighted, and

generalized (correlated) residuals.

Numerical methods for linear least squares include inverting the matrix of the normal equations and orthogonal decomposition methods.

Main formulations

The three main linear least squares formulations are:

{{unordered list
|1= Ordinary least squares (OLS) is the most common estimator. OLS estimates are commonly used to analyze both experimental and observational data.

The OLS method minimizes the sum of squared residuals, and leads to a closed-form expression for the estimated value of the unknown parameter vector β:

where is a vector whose ith element is the ith observation of the dependent variable, and is a matrix whose ij element is the ith observation of the jth independent variable. The estimator is unbiased and consistent if the errors have finite variance and are uncorrelated with the regressors:[1]

where is the transpose of row i of the matrix It is also efficient under the assumption that the errors have finite variance and are homoscedastic, meaning that E[εi2{{!}}xi] does not depend on i. The condition that the errors are uncorrelated with the regressors will generally be satisfied in an experiment, but in the case of observational data, it is difficult to exclude the possibility of an omitted covariate z that is related to both the observed covariates and the response variable. The existence of such a covariate will generally lead to a correlation between the regressors and the response variable, and hence to an inconsistent estimator of β. The condition of homoscedasticity can fail with either experimental or observational data. If the goal is either inference or predictive modeling, the performance of OLS estimates can be poor if multicollinearity is present, unless the sample size is large.


|2= Weighted least squares (WLS) are used when heteroscedasticity is present in the error terms of the model.
|3= Generalized least squares (GLS) is an extension of the OLS method, that allows efficient estimation of β when either heteroscedasticity, or correlations, or both are present among the error terms of the model, as long as the form of heteroscedasticity and correlation is known independently of the data. To handle heteroscedasticity when the error terms are uncorrelated with each other, GLS minimizes a weighted analogue to the sum of squared residuals from OLS regression, where the weight for the ith case is inversely proportional to var(εi). This special case of GLS is called "weighted least squares". The GLS solution to estimation problem is

where Ω is the covariance matrix of the errors. GLS can be viewed as applying a linear transformation to the data so that the assumptions of OLS are met for the transformed data. For GLS to be applied, the covariance structure of the errors must be known up to a multiplicative constant.


}}

Alternative formulations

Other formulations include:

{{unordered list
|4= Iteratively reweighted least squares (IRLS) is used when heteroscedasticity, or correlations, or both are present among the error terms of the model, but where little is known about the covariance structure of the errors independently of the data.[2] In the first iteration, OLS, or GLS with a provisional covariance structure is carried out, and the residuals are obtained from the fit. Based on the residuals, an improved estimate of the covariance structure of the errors can usually be obtained. A subsequent GLS iteration is then performed using this estimate of the error structure to define the weights. The process can be iterated to convergence, but in many cases, only one iteration is sufficient to achieve an efficient estimate of β.[3][4]
|5= Instrumental variables regression (IV) can be performed when the regressors are correlated with the errors. In this case, we need the existence of some auxiliary instrumental variables zi such that E[ziεi] = 0. If Z is the matrix of instruments, then the estimator can be given in closed form as

Optimal instruments regression is an extension of classical IV regression to the situation where E[εi {{!}} zi] = 0.


|6= Total least squares (TLS)[5] is an approach to least squares estimation of the linear regression model that treats the covariates and response variable in a more geometrically symmetric manner than OLS. It is one approach to handling the "errors in variables" problem, and is also sometimes used even when the covariates are assumed to be error-free.
}}

In addition, percentage least squares focuses on reducing percentage errors, which is useful in the field of forecasting or time series analysis. It is also useful in situations where the dependent variable has a wide range without constant variance, as here the larger residuals at the upper end of the range would dominate if OLS were used. When the percentage or relative error is normally distributed, least squares percentage regression provides maximum likelihood estimates. Percentage regression is linked to a multiplicative error model, whereas OLS is linked to models containing an additive error term.[6]

In constrained least squares, one is interested in solving a linear least squares problem with an additional constraint on the solution.

Objective function

In OLS (i.e., assuming unweighted observations), the optimal value of the objective function is found by substituting in the optimal expression for the coefficient vector, can be written as:

where , the latter equality holding since is symmetric and idempotent. It can be shown from this[7] that under an appropriate assignment of weights the expected value of S is m − n. If instead unit weights are assumed, the expected value of S is , where is the variance of each observation.

If it is assumed that the residuals belong to a normal distribution, the objective function, being a sum of weighted squared residuals, will belong to a chi-squared () distribution with m − n degrees of freedom. Some illustrative percentile values of are given in the following table.[8]

These values can be used for a statistical criterion as to the goodness of fit. When unit weights are used, the numbers should be divided by the variance of an observation.

For WLS, the ordinary objective function above is replaced for a weighted average of residuals.

Discussion

In statistics and mathematics, linear least squares is an approach to fitting a mathematical or statistical model to data in cases where the idealized value provided by the model for any data point is expressed linearly in terms of the unknown parameters of the model. The resulting fitted model can be used to summarize the data, to predict unobserved values from the same system, and to understand the mechanisms that may underlie the system.

Mathematically, linear least squares is the problem of approximately solving an overdetermined system of linear equations, where the best approximation is defined as that which minimizes the sum of squared differences between the data values and their corresponding modeled values. The approach is called linear least squares since the assumed function is linear in the parameters to be estimated. Linear least squares problems are convex and have a closed-form solution that is unique, provided that the number of data points used for fitting equals or exceeds the number of unknown parameters, except in special degenerate situations. In contrast, non-linear least squares problems generally must be solved by an iterative procedure, and the problems can be non-convex with multiple optima for the objective function. If prior distributions are available, then even an underdetermined system can be solved using the Bayesian MMSE estimator.

In statistics, linear least squares problems correspond to a particularly important type of statistical model called linear regression which arises as a particular form of regression analysis. One basic form of such a model is an ordinary least squares model. The present article concentrates on the mathematical aspects of linear least squares problems, with discussion of the formulation and interpretation of statistical regression models and statistical inferences related to these being dealt with in the articles just mentioned. See outline of regression analysis for an outline of the topic.

Properties

{{see also|Ordinary least squares#Properties}}

If the experimental errors, , are uncorrelated, have a mean of zero and a constant variance, , the Gauss–Markov theorem states that the least-squares estimator, , has the minimum variance of all estimators that are linear combinations of the observations. In this sense it is the best, or optimal, estimator of the parameters. Note particularly that this property is independent of the statistical distribution function of the errors. In other words, the distribution function of the errors need not be a normal distribution. However, for some probability distributions, there is no guarantee that the least-squares solution is even possible given the observations; still, in such cases it is the best estimator that is both linear and unbiased.

For example, it is easy to show that the arithmetic mean of a set of measurements of a quantity is the least-squares estimator of the value of that quantity. If the conditions of the Gauss–Markov theorem apply, the arithmetic mean is optimal, whatever the distribution of errors of the measurements might be.

However, in the case that the experimental errors do belong to a normal distribution, the least-squares estimator is also a maximum likelihood estimator.[9]

These properties underpin the use of the method of least squares for all types of data fitting, even when the assumptions are not strictly valid.

Limitations

An assumption underlying the treatment given above is that the independent variable, x, is free of error. In practice, the errors on the measurements of the independent variable are usually much smaller than the errors on the dependent variable and can therefore be ignored. When this is not the case, total least squares or more generally errors-in-variables models, or rigorous least squares, should be used. This can be done by adjusting the weighting scheme to take into account errors on both the dependent and independent variables and then following the standard procedure.[10][11]

In some cases the (weighted) normal equations matrix XTX is ill-conditioned. When fitting polynomials the normal equations matrix is a Vandermonde matrix. Vandermonde matrices become increasingly ill-conditioned as the order of the matrix increases.{{citation needed|date=December 2010}} In these cases, the least squares estimate amplifies the measurement noise and may be grossly inaccurate.{{citation needed|date=December 2010}} Various regularization techniques can be applied in such cases, the most common of which is called ridge regression. If further information about the parameters is known, for example, a range of possible values of , then various techniques can be used to increase the stability of the solution. For example, see constrained least squares.

Another drawback of the least squares estimator is the fact that the norm of the residuals, is minimized, whereas in some cases one is truly interested in obtaining small error in the parameter , e.g., a small value of .{{citation needed|date=December 2010}} However, since the true parameter is necessarily unknown, this quantity cannot be directly minimized. If a prior probability on is known, then a Bayes estimator can be used to minimize the mean squared error, . The least squares method is often applied when no prior is known. Surprisingly, when several parameters are being estimated jointly, better estimators can be constructed, an effect known as Stein's phenomenon. For example, if the measurement error is Gaussian, several estimators are known which dominate, or outperform, the least squares technique; the best known of these is the James–Stein estimator. This is an example of more general shrinkage estimators that have been applied to regression problems.

Applications

{{see also|Linear regression#Applications}}
  • Polynomial fitting: models are polynomials in an independent variable, x:
    • Straight line: .[12]
    • Quadratic: .
    • Cubic, quartic and higher polynomials. For regression with high-order polynomials, the use of orthogonal polynomials is recommended.[13]
  • Numerical smoothing and differentiation — this is an application of polynomial fitting.
  • Multinomials in more than one independent variable, including surface fitting
  • Curve fitting with B-splines [10]
  • Chemometrics, Calibration curve, Standard addition, Gran plot, analysis of mixtures

Uses in data fitting

The primary application of linear least squares is in data fitting. Given a set of m data points consisting of experimentally measured values taken at m values of an independent variable ( may be scalar or vector quantities), and given a model function with it is desired to find the parameters such that the model function "best" fits the data. In linear least squares, linearity is meant to be with respect to parameters so

Here, the functions may be nonlinear with respect to the variable x.

Ideally, the model function fits the data exactly, so

for all This is usually not possible in practice, as there are more data points than there are parameters to be determined. The approach chosen then is to find the minimal possible value of the sum of squares of the residuals

so to minimize the function

After substituting for and then for , this minimization problem becomes the quadratic minimization problem above with

and the best fit can be found by solving the normal equations.

Example

{{see also|Ordinary least squares#Example|Simple linear regression#Example}}{{further|Polynomial regression}}

As a result of an experiment, four data points were obtained, and (shown in red in the diagram on the right). We hope to find a line that best fits these four points. In other words, we would like to find the numbers and that approximately solve the overdetermined linear system

of four equations in two unknowns in some "best" sense.

The residual, at each point, between the curve fit and the data is the difference between the right- and left-hand sides of the equations above. The least squares approach to solving this problem is to try to make the sum of the squares of these residuals as small as possible; that is, to find the minimum of the function

The minimum is determined by calculating the partial derivatives of with respect to and and setting them to zero

This results in a system of two equations in two unknowns, called the normal equations, which when solved give

and the equation of the line of best fit. The residuals, that is, the differences between the values from the observations and the predicated variables by using the line of best fit, are then found to be and (see the diagram on the right). The minimum value of the sum of squares of the residuals is

More generally, one can have regressors , and a linear model

Using a quadratic model

Importantly, in "linear least squares", we are not restricted to using a line as the model as in the above example. For instance, we could have chosen the restricted quadratic model . This model is still linear in the parameter, so we can still perform the same analysis, constructing a system of equations from the data points:

The partial derivatives with respect to the parameters (this time there is only one) are again computed and set to 0:

and solved

leading to the resulting best fit model

See also

  • Line-line intersection#Nearest point to non-intersecting lines, an application
  • Line fitting
  • Nonlinear least squares
  • Regularized least squares
  • Simple linear regression
  • Partial least squares regression

References

1. ^{{cite journal | last1=Lai | first1=T.L. | last2=Robbins | first2=H. | last3=Wei | first3=C.Z. | journal=PNAS | year=1978 | volume=75 | title=Strong consistency of least squares estimates in multiple regression | issue=7 | pages=3034–3036 | doi= 10.1073/pnas.75.7.3034 | pmid=16592540 | jstor=68164 | bibcode=1978PNAS...75.3034L | pmc=392707 }}
2. ^{{cite journal | title=The Unifying Role of Iterative Generalized Least Squares in Statistical Algorithms | last=del Pino | first=Guido | journal=Statistical Science | volume=4 | year=1989 | pages=394–403 | doi=10.1214/ss/1177012408 | issue=4 | jstor=2245853}}
3. ^{{cite journal | title=Adapting for Heteroscedasticity in Linear Models | last=Carroll | first=Raymond J. | journal=The Annals of Statistics | volume=10 | year=1982 | pages=1224–1233 | doi=10.1214/aos/1176345987 | issue=4 | jstor=2240725}}
4. ^{{cite journal | title=Robust, Smoothly Heterogeneous Variance Regression | last=Cohen | first=Michael |author2=Dalal, Siddhartha R. |author3=Tukey, John W. | journal=Journal of the Royal Statistical Society, Series C | volume=42 | year=1993 | pages=339–353 | issue=2 | jstor=2986237}}
5. ^{{cite journal | title=Total Least Squares: State-of-the-Art Regression in Numerical Analysis | last=Nievergelt | first=Yves | journal=SIAM Review | volume=36 | year=1994 |pages=258–264 | doi=10.1137/1036055 | issue=2 | jstor=2132463}}
6. ^{{cite journal | ssrn = 1406472 | title=Least Squares Percentage Regression | author = Tofallis, C | journal = Journal of Modern Applied Statistical Methods | volume=7 | year = 2009 | pages=526–534 | doi = 10.2139/ssrn.1406472 }}
7. ^{{cite book |title=Statistics in Physical Science |last=Hamilton |first=W. C. |authorlink= |year=1964 |publisher=Ronald Press |location=New York |isbn= |pages= |url= }}
8. ^{{cite book |title=Schaum's outline of theory and problems of probability and statistics |last=Spiegel |first=Murray R. |authorlink= |year=1975 |publisher=McGraw-Hill |location=New York |isbn=978-0-585-26739-5 |pages= |url= }}
9. ^{{cite book |title=The Mathematics of Physics and Chemistry |last=Margenau |first=Henry |authorlink= |author2=Murphy, George Moseley |year=1956 |publisher=Van Nostrand |location=Princeton |isbn= |pages= |url= }}
10. ^{{cite book |title=Data fitting in the Chemical Sciences |last=Gans |first=Peter |authorlink= |year=1992 |publisher=Wiley |location=New York |isbn=978-0-471-93412-7 |pages= |url= }}
11. ^{{cite book |title=Statistical adjustment of Data |last=Deming |first=W. E. |authorlink= |year=1943 |publisher=Wiley |location=New York |isbn= |pages= |url= }}
12. ^{{cite book |title=Analysis of Straight-Line Data |last=Acton |first=F. S. |authorlink= |year=1959 |publisher=Wiley |location=New York |isbn= |pages= |url= }}
13. ^{{cite book |title=Numerical Methods of Curve Fitting |last=Guest |first=P. G. |authorlink= |year=1961 |publisher=Cambridge University Press |location=Cambridge |isbn= |pages= |url= }}{{page needed|date=December 2010}}

Further reading

  • {{Cite book | author=Bevington, Philip R. |author2=Robinson, Keith D. | title=Data Reduction and Error Analysis for the Physical Sciences | year=2003 | publisher=McGraw-Hill | location= | isbn=978-0-07-247227-1 | pages=}}

External links

  • Least Squares Fitting – From MathWorld
  • Least Squares Fitting-Polynomial – From MathWorld
{{Least Squares and Regression Analysis}}

3 : Broad-concept articles|Least squares|Computational statistics

随便看

 

开放百科全书收录14589846条英语、德语、日语等多语种百科知识,基本涵盖了大多数领域的百科知识,是一部内容自由、开放的电子版国际百科全书。

 

Copyright © 2023 OENC.NET All Rights Reserved
京ICP备2021023879号 更新时间:2024/11/11 20:41:09