8.16 Method of Least Squares
INTRODUCTION
When performing experiments we often tabulate data in the form of ordered pairs (x1, y1), (x2, y2), . . . , (xn, yn), with each xi distinct. Given the data, it is then often desirable to be able to extrapolate or predict y from x by finding a mathematical model, that is, a function that approximates or “fits” the data. In other words, we want a function f(x) such that
f(x1) ≈ y1, f(x2) ≈ y2, . . ., f(xn) ≈ yn.
But naturally we do not want just any function but a function that fits the data as closely as possible.
In the discussion that follows we shall confine our attention to the problem of finding a linear polynomial f(x) = ax + b or straight line that “best fits” the data (x1, y1), (x2, y2), . . ., (xn, yn). The procedure for finding this linear function is known as the method of least squares.
We begin with an example.
EXAMPLE 1 Line of Best Fit
Consider the data (1, 1), (2, 3), (3, 4), (4, 6), (5, 5) shown in FIGURE 8.16.1(a). Judging visually, and noting that the line y = x + 1 shown in Figure 8.16.1(b) passes through two of the data points, we might take this line as the one that best fits the data. ≡
Obviously we need something better than a visual guess to determine the linear function y = f(x), as in the last example. We need a criterion that defines the concept of “best fit” or, as it is sometimes called, “the goodness of fit.”
If we try to match the data points with the function f(x) = ax + b, then we wish to find a and b that satisfy the system of equations
$$
\begin{aligned}
ax_1 + b &= y_1\\
ax_2 + b &= y_2\\
&\;\;\vdots\\
ax_n + b &= y_n
\end{aligned}
\qquad (1)
$$

or

$$
A\mathbf{X} = \mathbf{Y}, \quad\text{where}\quad
A = \begin{pmatrix} x_1 & 1\\ x_2 & 1\\ \vdots & \vdots\\ x_n & 1 \end{pmatrix},\quad
\mathbf{X} = \begin{pmatrix} a\\ b \end{pmatrix},\quad
\mathbf{Y} = \begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{pmatrix}.
\qquad (2)
$$
Unfortunately (1) is an overdetermined system and, unless the data points all lie on the same line, has no solution. Thus we shall be content to find a vector $\mathbf{X} = \begin{pmatrix} a\\ b \end{pmatrix}$ so that $A\mathbf{X}$ is as close as possible to $\mathbf{Y}$.
Least Squares Line
If the data points are (x1, y1), (x2, y2), . . ., (xn, yn), then one way to determine how well the linear function f(x) = ax + b fits the data is to measure the vertical distances between the points and the graph of f :
$$e_i = |y_i - f(x_i)|, \qquad i = 1, 2, \ldots, n.$$
We can think of each $e_i$ as the error in approximating the data value $y_i$ by the functional value $f(x_i)$. See FIGURE 8.16.2. Intuitively, we know that the function f will fit the data well if the sum of all the $e_i$ values is small. Actually, a more convenient approach to the problem is to find a linear function f so that the sum of the squares of all the $e_i$ values is a minimum. We shall define the solution of the system (1) to be those coefficients a and b that minimize the expression $E = e_1^2 + e_2^2 + \cdots + e_n^2$; that is,

$$E = \sum_{i=1}^{n} [y_i - f(x_i)]^2 = \sum_{i=1}^{n} [y_i - (ax_i + b)]^2. \qquad (3)$$
The expression E is called the sum of the square errors. The line y = ax + b that minimizes the sum of the square errors (3) is defined to be the line of best fit and is called the least squares line for the data (x1, y1), (x2, y2), . . ., (xn, yn).
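The sum of the square errors (3) is straightforward to evaluate directly. As a quick illustration (a minimal Python sketch of our own, not part of the text), here is E computed for the guessed line y = x + 1 and the data of Example 1:

```python
# Sum of the square errors E = sum over i of (y_i - (a*x_i + b))^2, as in (3).
def sum_square_errors(data, a, b):
    return sum((y - (a * x + b)) ** 2 for x, y in data)

# Data from Example 1 and the guessed line y = x + 1, i.e. a = 1, b = 1.
data = [(1, 1), (2, 3), (3, 4), (4, 6), (5, 5)]
E = sum_square_errors(data, 1.0, 1.0)
print(E)  # 3.0
```

The errors at the five points are 1, 0, 0, 1, 1, so their squares sum to 3.0, matching the value quoted later for this line.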
The problem remains: How does one find a and b so that (3) is a minimum? The answer can be found from calculus. If we think of (3) as a function of two variables a and b, then to find the minimum value of E we set the first partial derivatives equal to zero:

$$\frac{\partial E}{\partial a} = 0 \quad\text{and}\quad \frac{\partial E}{\partial b} = 0.$$
Partial differentiation is reviewed in Section 9.4.
The last two conditions yield, in turn,

$$
\begin{aligned}
\frac{\partial E}{\partial a} &= -2\sum_{i=1}^{n}(y_i - ax_i - b)\,x_i = 0\\
\frac{\partial E}{\partial b} &= -2\sum_{i=1}^{n}(y_i - ax_i - b) = 0.
\end{aligned}
\qquad (4)
$$

Expanding the sums and using $\sum_{i=1}^{n} b = nb$, we find the system (4) is the same as

$$
\begin{aligned}
a\sum_{i=1}^{n} x_i^2 + b\sum_{i=1}^{n} x_i &= \sum_{i=1}^{n} x_i y_i\\
a\sum_{i=1}^{n} x_i + bn &= \sum_{i=1}^{n} y_i.
\end{aligned}
\qquad (5)
$$
Although we shall not give the details, the values of a and b that satisfy the system (5) yield the minimum value of E.
In terms of matrices it can be shown that (5) is equivalent to
$$A^{T}A\mathbf{X} = A^{T}\mathbf{Y}, \qquad (6)$$

where A, Y, and X are defined in (2). Since A is an n × 2 matrix and $A^{T}$ is a 2 × n matrix, the matrix $A^{T}A$ is a 2 × 2 matrix. Moreover, unless the data points all lie on the same vertical line, the matrix $A^{T}A$ is nonsingular. Thus, (6) has the unique solution

$$\mathbf{X} = (A^{T}A)^{-1}A^{T}\mathbf{Y}. \qquad (7)$$
We say that X is the least squares solution of the overdetermined system (1).
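Formula (7) translates directly into a short program. The sketch below (our own illustration using NumPy, which the text does not assume; the function name is ours) builds the matrix A of (2) and solves the normal equations (6) rather than forming the inverse explicitly, which is numerically preferable:

```python
import numpy as np

def least_squares_line(xs, ys):
    """Fit y = a*x + b by solving the normal equations (A^T A) X = A^T Y."""
    A = np.column_stack([xs, np.ones(len(xs))])  # the n x 2 matrix A of (2)
    Y = np.asarray(ys, dtype=float)
    a, b = np.linalg.solve(A.T @ A, A.T @ Y)     # X = (A^T A)^(-1) A^T Y, as in (7)
    return a, b

# Sanity check on data lying exactly on the line y = 2x + 1.
a, b = least_squares_line([0, 1, 2], [1, 3, 5])
print(round(a, 4), round(b, 4))  # 2.0 1.0
```

When the points lie exactly on a line, the least squares line reproduces it, as the sanity check shows.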
EXAMPLE 2 Least Squares Line
Find the least squares line for the data in Example 1. Calculate the sum of the square errors E for this line and the line y = x + 1.
SOLUTION
For the function f(x) = ax + b the data (1, 1), (2, 3), (3, 4), (4, 6), (5, 5) lead to the overdetermined system
$$
\begin{aligned}
a + b &= 1\\
2a + b &= 3\\
3a + b &= 4\\
4a + b &= 6\\
5a + b &= 5.
\end{aligned}
\qquad (8)
$$

Now by identifying

$$
\mathbf{Y} = \begin{pmatrix} 1\\ 3\\ 4\\ 6\\ 5 \end{pmatrix}
\quad\text{and}\quad
A = \begin{pmatrix} 1 & 1\\ 2 & 1\\ 3 & 1\\ 4 & 1\\ 5 & 1 \end{pmatrix},
\quad\text{we have}\quad
A^{T}A = \begin{pmatrix} 55 & 15\\ 15 & 5 \end{pmatrix},
$$

and so (7) gives

$$
\mathbf{X} = (A^{T}A)^{-1}A^{T}\mathbf{Y}
= \frac{1}{50}\begin{pmatrix} 5 & -15\\ -15 & 55 \end{pmatrix}\begin{pmatrix} 68\\ 19 \end{pmatrix}
= \begin{pmatrix} 1.1\\ 0.5 \end{pmatrix}.
$$

Thus the least squares solution of (8) is a = 1.1 and b = 0.5, and the least squares line is y = 1.1x + 0.5. For this line the sum of the square errors is

$$E = (1 - 1.6)^2 + (3 - 2.7)^2 + (4 - 3.8)^2 + (6 - 4.9)^2 + (5 - 6)^2 = 2.7.$$
For the line y = x + 1 that we guessed and that also passed through two of the data points, we find E = 3.0.
By way of comparison, FIGURE 8.16.3 shows the data points, the line y = x + 1 (green), and the least squares line y = 1.1x + 0.5 (blue). ≡
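The numbers in Example 2 are easy to double-check. The plain-Python sketch below (our own illustration) forms the sums appearing in the normal equations (5), solves the resulting 2 × 2 system by Cramer's rule, and recomputes the sum of the square errors:

```python
# Solve the normal equations (5) for the Example 2 data by Cramer's rule.
data = [(1, 1), (2, 3), (3, 4), (4, 6), (5, 5)]
n = len(data)
Sx  = sum(x for x, _ in data)      # sum of x_i       = 15
Sxx = sum(x * x for x, _ in data)  # sum of x_i^2     = 55
Sy  = sum(y for _, y in data)      # sum of y_i       = 19
Sxy = sum(x * y for x, y in data)  # sum of x_i * y_i = 68

det = Sxx * n - Sx * Sx            # determinant of A^T A = 50
a = (Sxy * n - Sx * Sy) / det
b = (Sxx * Sy - Sx * Sxy) / det
print(a, b)                        # 1.1 0.5

E = sum((y - (a * x + b)) ** 2 for x, y in data)
print(round(E, 10))                # 2.7, the sum of the square errors
```

This agrees with the matrix computation in Example 2: a = 1.1, b = 0.5, and E = 2.7, which is indeed smaller than the E = 3.0 of the guessed line.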
Least Squares Parabola
The procedure illustrated in Example 2 is easily modified to find a least squares parabola.
EXAMPLE 3 Least Squares Parabola
Find the least squares parabola for the data (1, 1), (2, 4), (3, 7), (4, 5).
SOLUTION
For the quadratic function $f(x) = ax^2 + bx + c$, the analogue of system (8) is

$$
\begin{aligned}
a + b + c &= 1\\
4a + 2b + c &= 4\\
9a + 3b + c &= 7\\
16a + 4b + c &= 5.
\end{aligned}
$$

From this system we see the matrix A now has three columns. So with

$$
\mathbf{Y} = \begin{pmatrix} 1\\ 4\\ 7\\ 5 \end{pmatrix},\quad
A = \begin{pmatrix} 1 & 1 & 1\\ 4 & 2 & 1\\ 9 & 3 & 1\\ 16 & 4 & 1 \end{pmatrix},\quad
\mathbf{X} = \begin{pmatrix} a\\ b\\ c \end{pmatrix},
$$

equation (7) gives

$$
\mathbf{X} = (A^{T}A)^{-1}A^{T}\mathbf{Y}
= \begin{pmatrix} 354 & 100 & 30\\ 100 & 30 & 10\\ 30 & 10 & 4 \end{pmatrix}^{-1}
\begin{pmatrix} 160\\ 50\\ 17 \end{pmatrix}
= \begin{pmatrix} -1.25\\ 7.75\\ -5.75 \end{pmatrix}.
$$
Therefore a = −1.25, b = 7.75, c = −5.75 and the equation of the least squares parabola is f(x) = −1.25x2 + 7.75x − 5.75. The graphs of the data points and the quadratic function f are given in FIGURE 8.16.4. ≡
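The same modification carries over to code: the matrix A simply gains a column of squares. A NumPy sketch (our own, not from the text) for the data of Example 3:

```python
import numpy as np

# Fit y = a*x^2 + b*x + c: the matrix A gets columns x^2, x, 1.
xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = np.array([1.0, 4.0, 7.0, 5.0])

A = np.column_stack([xs**2, xs, np.ones_like(xs)])
a, b, c = np.linalg.solve(A.T @ A, A.T @ ys)  # normal equations, as in (7)
print(round(a, 4), round(b, 4), round(c, 4))  # -1.25 7.75 -5.75
```

Higher-degree least squares polynomials work the same way: each extra power of x contributes one more column to A.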
8.16 Exercises
Answers to selected odd-numbered problems begin on page ANS-22.
In Problems 1–6, find the least squares line for the given data.
1. (2, 1), (3, 2), (4, 3), (5, 2)
2. (0, −1), (1, 3), (2, 5), (3, 7)
3. (1, 1), (2, 1.5), (3, 3), (4, 4.5), (5, 5)
4. (0, 0), (2, 1.5), (3, 3), (4, 4.5), (5, 5)
5. (0, 2), (1, 3), (2, 5), (3, 5), (4, 9), (5, 8), (6, 10)
6. (1, 2), (2, 2.5), (3, 1), (4, 1.5), (5, 2), (6, 3.2), (7, 5)
7. In an experiment, the following correspondence was found between temperature T (in °C) and kinematic viscosity v (in centistokes) of an oil with a certain additive:

   Find the least squares line v = aT + b. Use this line to estimate the viscosity of the oil at T = 140 and T = 160.
8. In an experiment the following correspondence was found between temperature T (in °C) and electrical resistance R (in MΩ):

   Find the least squares line R = aT + b. Use this line to estimate the resistance at T = 700.
In Problems 9 and 10, proceed as in Example 3 and find the least squares parabola for the given data.
9. (1, 1), (2, 1), (3, 2), (4, 5)
10. (−2, 1), (−1, 1), (1, 2), (2, 3)