8.16 Method of Least Squares
INTRODUCTION
When performing experiments we often tabulate data in the form of ordered pairs (x1, y1), (x2, y2), . . . , (xn, yn), with each xi distinct. Given the data, it is then often desirable to be able to extrapolate or predict y from x by finding a mathematical model, that is, a function that approximates or “fits” the data. In other words, we want a function f(x) such that
f(x1) ≈ y1, f(x2) ≈ y2, . . ., f(xn) ≈ yn.
But naturally we do not want just any function but a function that fits the data as closely as possible.
In the discussion that follows we shall confine our attention to the problem of finding a linear polynomial f(x) = ax + b or straight line that “best fits” the data (x1, y1), (x2, y2), . . ., (xn, yn). The procedure for finding this linear function is known as the method of least squares.
We begin with an example.
EXAMPLE 1 Line of Best Fit
Consider the data (1, 1), (2, 3), (3, 4), (4, 6), (5, 5) shown in FIGURE 8.16.1(a). Judging visually, and noting that the line y = x + 1 shown in Figure 8.16.1(b) passes through two of the data points, we might take this line as the one that best fits the data. ≡
Obviously we need something better than a visual guess to determine the linear function y = f(x), as in the last example. We need a criterion that defines the concept of “best fit” or, as it is sometimes called, “the goodness of fit.”
If we try to match the data points with the function f(x) = ax + b, then we wish to find a and b that satisfy the system of equations
$$
\begin{aligned}
ax_1 + b &= y_1\\
ax_2 + b &= y_2\\
&\;\;\vdots\\
ax_n + b &= y_n
\end{aligned}
\qquad (1)
$$

or

$$
A\mathbf{X} = \mathbf{Y}, \quad\text{where}\quad
A = \begin{pmatrix} x_1 & 1\\ x_2 & 1\\ \vdots & \vdots\\ x_n & 1 \end{pmatrix},\quad
\mathbf{X} = \begin{pmatrix} a\\ b \end{pmatrix},\quad
\mathbf{Y} = \begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{pmatrix}.
\qquad (2)
$$
Unfortunately (1) is an overdetermined system and, unless the data points all lie on the same line, has no solution. Thus we shall be content to find a vector $\mathbf{X} = \begin{pmatrix} a\\ b \end{pmatrix}$ so that $A\mathbf{X}$ is as close as possible to $\mathbf{Y}$.
Least Squares Line
If the data points are (x1, y1), (x2, y2), . . ., (xn, yn), then one way to determine how well the linear function f(x) = ax + b fits the data is to measure the vertical distances between the points and the graph of f :
$$e_i = |y_i - f(x_i)|, \qquad i = 1, 2, \ldots, n.$$
We can think of each $e_i$ as the error in approximating the data value $y_i$ by the functional value $f(x_i)$. See FIGURE 8.16.2. Intuitively, we know that the function f will fit the data well if the sum of all the $e_i$ values is small. Actually, a more convenient approach to the problem is to find a linear function f so that the sum of the squares of all the $e_i$ values is a minimum. We shall define the solution of the system (1) to be those coefficients a and b that minimize the expression $E = e_1^2 + e_2^2 + \cdots + e_n^2$; that is,

$$E = \sum_{i=1}^{n} [y_i - f(x_i)]^2 = \sum_{i=1}^{n} [y_i - (ax_i + b)]^2. \qquad (3)$$
The expression E is called the sum of the square errors. The line y = ax + b that minimizes the sum of the square errors (3) is defined to be the line of best fit and is called the least squares line for the data (x1, y1), (x2, y2), . . ., (xn, yn).
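The sum of the square errors (3) is straightforward to evaluate directly. As a quick illustration (a minimal Python sketch of our own, not part of the text), here is E computed for the guessed line y = x + 1 and the data of Example 1:

```python
# Sum of the square errors E = sum over i of (y_i - (a*x_i + b))^2, as in (3).
def sum_square_errors(data, a, b):
    return sum((y - (a * x + b)) ** 2 for x, y in data)

# Data from Example 1 and the guessed line y = x + 1, i.e. a = 1, b = 1.
data = [(1, 1), (2, 3), (3, 4), (4, 6), (5, 5)]
E = sum_square_errors(data, 1.0, 1.0)
print(E)  # 3.0
```

The errors at the five points are 1, 0, 0, 1, 1, so their squares sum to 3.0, matching the value quoted later for this line.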
The problem remains: How does one find a and b so that (3) is a minimum? The answer can be found from calculus. If we think of (3) as a function of two variables a and b, then to find the minimum value of E we set the first partial derivatives equal to zero:

$$\frac{\partial E}{\partial a} = 0 \quad\text{and}\quad \frac{\partial E}{\partial b} = 0.$$
Partial differentiation is reviewed in Section 9.4.
The last two conditions yield, in turn,

$$
\begin{aligned}
\frac{\partial E}{\partial a} &= -2\sum_{i=1}^{n}(y_i - ax_i - b)\,x_i = 0\\
\frac{\partial E}{\partial b} &= -2\sum_{i=1}^{n}(y_i - ax_i - b) = 0.
\end{aligned}
\qquad (4)
$$

Expanding the sums and using $\sum_{i=1}^{n} b = nb$, we find the system (4) is the same as

$$
\begin{aligned}
a\sum_{i=1}^{n} x_i^2 + b\sum_{i=1}^{n} x_i &= \sum_{i=1}^{n} x_i y_i\\
a\sum_{i=1}^{n} x_i + bn &= \sum_{i=1}^{n} y_i.
\end{aligned}
\qquad (5)
$$
Although we shall not give the details, the values of a and b that satisfy the system (5) yield the minimum value of E.
In terms of matrices it can be shown that (5) is equivalent to
$$A^{T}A\mathbf{X} = A^{T}\mathbf{Y}, \qquad (6)$$

where A, Y, and X are defined in (2). Since A is an n × 2 matrix and $A^{T}$ is a 2 × n matrix, the matrix $A^{T}A$ is a 2 × 2 matrix. Moreover, unless the data points all lie on the same vertical line, the matrix $A^{T}A$ is nonsingular. Thus, (6) has the unique solution

$$\mathbf{X} = (A^{T}A)^{-1}A^{T}\mathbf{Y}. \qquad (7)$$
We say that X is the least squares solution of the overdetermined system (1).
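Formula (7) translates directly into a short program. The sketch below (our own illustration using NumPy, which the text does not assume; the function name is ours) builds the matrix A of (2) and solves the normal equations (6) rather than forming the inverse explicitly, which is numerically preferable:

```python
import numpy as np

def least_squares_line(xs, ys):
    """Fit y = a*x + b by solving the normal equations (A^T A) X = A^T Y."""
    A = np.column_stack([xs, np.ones(len(xs))])  # the n x 2 matrix A of (2)
    Y = np.asarray(ys, dtype=float)
    a, b = np.linalg.solve(A.T @ A, A.T @ Y)     # X = (A^T A)^(-1) A^T Y, as in (7)
    return a, b

# Sanity check on data lying exactly on the line y = 2x + 1.
a, b = least_squares_line([0, 1, 2], [1, 3, 5])
print(round(a, 4), round(b, 4))  # 2.0 1.0
```

When the points lie exactly on a line, the least squares line reproduces it, as the sanity check shows.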
EXAMPLE 2 Least Squares Line
Find the least squares line for the data in Example 1. Calculate the sum of the square errors E for this line and the line y = x + 1.
SOLUTION
For the function f(x) = ax + b the data (1, 1), (2, 3), (3, 4), (4, 6), (5, 5) lead to the overdetermined system
$$
\begin{aligned}
a + b &= 1\\
2a + b &= 3\\
3a + b &= 4\\
4a + b &= 6\\
5a + b &= 5.
\end{aligned}
\qquad (8)
$$

Now by identifying

$$
\mathbf{Y} = \begin{pmatrix} 1\\ 3\\ 4\\ 6\\ 5 \end{pmatrix}
\quad\text{and}\quad
A = \begin{pmatrix} 1 & 1\\ 2 & 1\\ 3 & 1\\ 4 & 1\\ 5 & 1 \end{pmatrix},
\quad\text{we have}\quad
A^{T}A = \begin{pmatrix} 55 & 15\\ 15 & 5 \end{pmatrix},
$$

and so (7) gives

$$
\mathbf{X} = (A^{T}A)^{-1}A^{T}\mathbf{Y}
= \frac{1}{50}\begin{pmatrix} 5 & -15\\ -15 & 55 \end{pmatrix}\begin{pmatrix} 68\\ 19 \end{pmatrix}
= \begin{pmatrix} 1.1\\ 0.5 \end{pmatrix}.
$$

Thus the least squares solution of (8) is a = 1.1 and b = 0.5, and the least squares line is y = 1.1x + 0.5. For this line the sum of the square errors is

$$E = (1 - 1.6)^2 + (3 - 2.7)^2 + (4 - 3.8)^2 + (6 - 4.9)^2 + (5 - 6)^2 = 2.7.$$
For the line y = x + 1 that we guessed and that also passed through two of the data points, we find E = 3.0.
By way of comparison, FIGURE 8.16.3 shows the data points, the line y = x + 1 (green), and the least squares line y = 1.1x + 0.5 (blue). ≡
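The numbers in Example 2 are easy to double-check. The plain-Python sketch below (our own illustration) forms the sums appearing in the normal equations (5), solves the resulting 2 × 2 system by Cramer's rule, and recomputes the sum of the square errors:

```python
# Solve the normal equations (5) for the Example 2 data by Cramer's rule.
data = [(1, 1), (2, 3), (3, 4), (4, 6), (5, 5)]
n = len(data)
Sx  = sum(x for x, _ in data)      # sum of x_i       = 15
Sxx = sum(x * x for x, _ in data)  # sum of x_i^2     = 55
Sy  = sum(y for _, y in data)      # sum of y_i       = 19
Sxy = sum(x * y for x, y in data)  # sum of x_i * y_i = 68

det = Sxx * n - Sx * Sx            # determinant of A^T A = 50
a = (Sxy * n - Sx * Sy) / det
b = (Sxx * Sy - Sx * Sxy) / det
print(a, b)                        # 1.1 0.5

E = sum((y - (a * x + b)) ** 2 for x, y in data)
print(round(E, 10))                # 2.7, the sum of the square errors
```

This agrees with the matrix computation in Example 2: a = 1.1, b = 0.5, and E = 2.7, which is indeed smaller than the E = 3.0 of the guessed line.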
Least Squares Parabola
The procedure illustrated in Example 2 is easily modified to find a least squares parabola.
EXAMPLE 3 Least Squares Parabola
Find the least squares parabola for the data (1, 1), (2, 4), (3, 7), (4, 5).
SOLUTION
For the quadratic function $f(x) = ax^2 + bx + c$, the analogue of system (8) is

$$
\begin{aligned}
a + b + c &= 1\\
4a + 2b + c &= 4\\
9a + 3b + c &= 7\\
16a + 4b + c &= 5.
\end{aligned}
$$

From this system we see the matrix A now has three columns. So with

$$
\mathbf{Y} = \begin{pmatrix} 1\\ 4\\ 7\\ 5 \end{pmatrix},\quad
A = \begin{pmatrix} 1 & 1 & 1\\ 4 & 2 & 1\\ 9 & 3 & 1\\ 16 & 4 & 1 \end{pmatrix},\quad
\mathbf{X} = \begin{pmatrix} a\\ b\\ c \end{pmatrix},
$$

equation (7) gives

$$
\mathbf{X} = (A^{T}A)^{-1}A^{T}\mathbf{Y}
= \begin{pmatrix} 354 & 100 & 30\\ 100 & 30 & 10\\ 30 & 10 & 4 \end{pmatrix}^{-1}
\begin{pmatrix} 160\\ 50\\ 17 \end{pmatrix}
= \begin{pmatrix} -1.25\\ 7.75\\ -5.75 \end{pmatrix}.
$$
Therefore a = −1.25, b = 7.75, c = −5.75 and the equation of the least squares parabola is f(x) = −1.25x2 + 7.75x − 5.75. The graphs of the data points and the quadratic function f are given in FIGURE 8.16.4. ≡
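The same modification carries over to code: the matrix A simply gains a column of squares. A NumPy sketch (our own, not from the text) for the data of Example 3:

```python
import numpy as np

# Fit y = a*x^2 + b*x + c: the matrix A gets columns x^2, x, 1.
xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = np.array([1.0, 4.0, 7.0, 5.0])

A = np.column_stack([xs**2, xs, np.ones_like(xs)])
a, b, c = np.linalg.solve(A.T @ A, A.T @ ys)  # normal equations, as in (7)
print(round(a, 4), round(b, 4), round(c, 4))  # -1.25 7.75 -5.75
```

Higher-degree least squares polynomials work the same way: each extra power of x contributes one more column to A.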
8.16 Exercises
Answers to selected odd-numbered problems begin on page ANS-22.
In Problems 1–6, find the least squares line for the given data.
1. (2, 1), (3, 2), (4, 3), (5, 2)
2. (0, −1), (1, 3), (2, 5), (3, 7)
3. (1, 1), (2, 1.5), (3, 3), (4, 4.5), (5, 5)
4. (0, 0), (2, 1.5), (3, 3), (4, 4.5), (5, 5)
5. (0, 2), (1, 3), (2, 5), (3, 5), (4, 9), (5, 8), (6, 10)
6. (1, 2), (2, 2.5), (3, 1), (4, 1.5), (5, 2), (6, 3.2), (7, 5)
7. In an experiment, the following correspondence was found between temperature T (in °C) and kinematic viscosity v (in centistokes) of an oil with a certain additive:

   Find the least squares line v = aT + b. Use this line to estimate the viscosity of the oil at T = 140 and T = 160.
8. In an experiment the following correspondence was found between temperature T (in °C) and electrical resistance R (in MΩ):

   Find the least squares line R = aT + b. Use this line to estimate the resistance at T = 700.
In Problems 9 and 10, proceed as in Example 3 and find the least squares parabola for the given data.
9. (1, 1), (2, 1), (3, 2), (4, 5)
10. (−2, 1), (−1, 1), (1, 2), (2, 3)