One of the most common applications of linear algebra is the least squares problem of finding a solution to an overdetermined linear system. Overdetermined systems are ones with more equations than unknowns; with that much data the system is usually inconsistent. In these cases, we have the matrix equation
$$ A\mathbf{x} = \mathbf{b} $$
with $A\in\mathbb{R}^{m\times n}$ and $m \gt n$. Since this system is typically inconsistent, we will not in general be able to find $\mathbf{x}\in\mathbb{R}^n$ that solves $A\mathbf{x}=\mathbf{b}$, but we can find the least squares solution $\hat{\mathbf{x}}$, the vector that comes closest to solving the system. The way we measure the "closeness" of $A\hat{\mathbf{x}}$ to $\mathbf{b}$ is by defining a residual
$$ r(\mathbf{x}) = \mathbf{b} - A\mathbf{x} $$
and computing the distance between the vectors $\mathbf{b}$ and $A\mathbf{x}$ as the norm of the residual
$$ \|r(\mathbf{x})\| = \left\| \mathbf{b} - A\mathbf{x} \right\| $$
The least squares solution $\hat{\mathbf{x}}$ of $A\mathbf{x} = \mathbf{b}$ is the vector $\mathbf{x}\in\mathbb{R}^n$ that minimizes the norm of the residual $\|r(\mathbf{x})\|$. For convenience, we note that minimizing $\|r(\mathbf{x})\|$ is equivalent to minimizing $\|r(\mathbf{x})\|^2$, so our theorems will focus on minimizing the latter.
Theorem 3.8.1
The Least Squares Solution of a Linear System Exists
Let $S$ be a subspace of $\mathbb{R}^m$. For each $\mathbf{b}\in\mathbb{R}^m$, there is a unique element $\mathbf{p}\in S$ which is closest to $\mathbf{b}$; that is,
$$ \left\| \mathbf{b} - \mathbf{y} \right\| \gt \left\| \mathbf{b} - \mathbf{p} \right\| $$
for any $\mathbf{y}\neq\mathbf{p}$ in $S$. Additionally, $\mathbf{p}\in S$ is closest to a given vector $\mathbf{b}\in\mathbb{R}^m$ if and only if $\mathbf{b}-\mathbf{p}\in S^\perp$.
$(\Leftarrow)$ Since $S$ is a subspace of $\mathbb{R}^m$, we can write $\mathbb{R}^m$ as a direct sum of $S$ and its orthogonal complement $S^\perp$; $\mathbb{R}^m = S\oplus S^\perp$. Therefore each element $\mathbf{b}\in\mathbb{R}^m$ may be expressed uniquely as a sum
$$ \mathbf{b} = \mathbf{p} + \mathbf{z} $$
where $\mathbf{p}\in S$ and $\mathbf{z}\in S^\perp$. If $\mathbf{y} \neq \mathbf{p}$ is in $S$, then
$$ \left\| \mathbf{b} - \mathbf{y} \right\|^2 = \left\| (\mathbf{b} - \mathbf{p}) + (\mathbf{p} - \mathbf{y}) \right\|^2 $$
Vector $\mathbf{p} - \mathbf{y} \in S$ and vector $\mathbf{b} - \mathbf{p}\in S^\perp$ are orthogonal, so by the Pythagorean law the above equation becomes
$$ \left\| \mathbf{b} - \mathbf{y} \right\|^2 = \left\| \mathbf{b} - \mathbf{p}\right\|^2 + \left\| \mathbf{p} - \mathbf{y} \right\|^2 $$
Since $\mathbf{y}\neq\mathbf{p}$, the term $\left\|\mathbf{p}-\mathbf{y}\right\|^2$ is strictly positive; dropping it and taking square roots produces the strict inequality
$$ \left\| \mathbf{b} - \mathbf{y} \right\| \gt \left\| \mathbf{b} - \mathbf{p}\right\| $$
This shows that if $\mathbf{p}\in S$ satisfies $\mathbf{b}-\mathbf{p}\in S^\perp$, then $\mathbf{p}$ is the unique element of the subspace $S$ closest to $\mathbf{b}$.
$(\Rightarrow)$ Conversely, suppose $\mathbf{q}\in S$ and $\mathbf{b}-\mathbf{q}\notin S^\perp$. Then $\mathbf{q}\neq \mathbf{p}$, and repeating the previous argument with $\mathbf{y} = \mathbf{q}$ yields
$$ \left\| \mathbf{b} - \mathbf{q} \right\| \gt \left\| \mathbf{b} - \mathbf{p}\right\| $$
so $\mathbf{q}$ is not the element of $S$ closest to $\mathbf{b}$. Hence the closest element must satisfy $\mathbf{b}-\mathbf{p}\in S^\perp$.
∎
In rare cases $\mathbf{b}\in S$, which allows us to find an exact solution $\mathbf{x}$ to $A\mathbf{x}=\mathbf{b}$. In this special case, writing
$$ \mathbf{b} = \mathbf{p} + \mathbf{z} $$
for $\mathbf{p}\in S$ and $\mathbf{z}\in S^\perp$, the uniqueness of direct sum representations forces $\mathbf{z}=\mathbf{0}$, so
$$ \mathbf{b} = \mathbf{p} + \mathbf{0} = \mathbf{p} $$
The least squares solution $\hat{\mathbf{x}}$ to the least squares problem $A\mathbf{x} = \mathbf{b}$ is characterized by the condition that $\mathbf{p} = A\hat{\mathbf{x}}$ is the vector in the column space $C(A)$ closest to $\mathbf{b}$. What this means geometrically is that we project $\mathbf{b}$ onto $C(A)$ to get $\mathbf{p}$, and then solve $A\hat{\mathbf{x}} = \mathbf{p}$ for $\hat{\mathbf{x}}$. By Theorem 3.8.1,
$$ \mathbf{b} - \mathbf{p} = \mathbf{b} - A\hat{\mathbf{x}} = r(\hat{\mathbf{x}}) $$
which must be in $C(A)^\perp$. Hence, $\hat{\mathbf{x}}$ is the least squares solution if and only if $r(\hat{\mathbf{x}})\in C(A)^\perp$.
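This orthogonality condition is easy to check numerically. A small sketch (using NumPy's `np.linalg.lstsq` and the same made-up system as above) confirms that $A^T r(\hat{\mathbf{x}})\approx\mathbf{0}$, i.e. the residual at the least squares solution is orthogonal to every column of $A$:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)  # least squares solution
r = b - A @ x_hat                              # residual at the solution

print(A.T @ r)   # entries are ~0: r(x_hat) lies in C(A)-perp
```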
For a vector $\mathbf{b}\in\mathbb{R}^2$ and a one-dimensional column space, we can picture this as dropping a perpendicular from $\mathbf{b}$ onto the line $C(A)$; the foot of that perpendicular is the projection $\mathbf{p} = A\hat{\mathbf{x}}$.
We will derive the normal equations by proving two short theorems: first, that $N\left(A^T A\right) = N(A)$; second, that when $A$ has full column rank, the equation $A^T A\mathbf{x} = A^T\mathbf{b}$ has a unique solution $\hat{\mathbf{x}}$.
Theorem 3.8.2
The Null Space of $A^T A$ is the same as the Null Space of $A$.
$$ N\left(A^T A\right) = N(A) $$
$(\Rightarrow)$ Suppose $\mathbf{x}\in N\left(A^T A\right)$, then
$$ A^T A\mathbf{x} = \mathbf{0} $$
This means that $\mathbf{x}$ is orthogonal to each row of $A^T A$ (and, since $A^T A$ is symmetric, to each of its columns as well). Multiplying both sides of the equation on the left by $\mathbf{x}^T$, and recalling that $\mathbf{x}^T\mathbf{0} = 0$, we write
$$ \mathbf{x}^T\mathbf{0} = \mathbf{x}^T \left(A^T A\mathbf{x}\right) = \left(\mathbf{x}^T A^T\right) A\mathbf{x} = \left(A\mathbf{x}\right)^T A\mathbf{x} = \left\langle A\mathbf{x}, A\mathbf{x} \right\rangle = 0 $$
So $A\mathbf{x} = \mathbf{0}$, since $\left\langle A\mathbf{x}, A\mathbf{x} \right\rangle = \left\|A\mathbf{x}\right\|^2 = 0$ only when $A\mathbf{x} = \mathbf{0}$. Hence, $\mathbf{x}\in N(A)$.
$(\Leftarrow)$ Suppose $\mathbf{x}\in N(A)$, then $A\mathbf{x} = \mathbf{0}$ and
$$ A^T A\mathbf{x} = A^T\left(A\mathbf{x}\right) = A^T\mathbf{0} = \mathbf{0} $$
so $\mathbf{x}\in N\left(A^T A\right)$.
∎
Theorem 3.8.3
The Normal Equations Yield a Unique Solution to the Least Squares Problem
If $A\in\mathbb{R}^{m\times n}$ and $\,\text{rank}(A)=n$, the normal equations
$$ A^T A\mathbf{x} = A^T\mathbf{b} $$
have a unique solution
$$ \hat{\mathbf{x}} = \left(A^T A\right)^{-1}A^T\mathbf{b} $$
and $\, \hat{\mathbf{x}}$ uniquely solves the least squares problem $A\mathbf{x} = \mathbf{b}$.
Since $A\in\mathbb{R}^{m\times n}$ has rank $n$, $N(A) = \left\{\mathbf{0}\right\}$. By the previous theorem, $N\left(A^T A\right) = N(A) = \left\{\mathbf{0}\right\}$. With a trivial null space, the square matrix $A^T A\in\mathbb{R}^{n\times n}$ is invertible, and its nonhomogeneous systems have unique solutions. In particular, the normal equations
$$ A^T A\mathbf{x} = A^T\mathbf{b} $$
have the unique solution given by
$$ \hat{\mathbf{x}} = \left(A^T A\right)^{-1}A^T\mathbf{b} $$
which is the unique least squares solution: since $A^T\left(\mathbf{b} - A\hat{\mathbf{x}}\right) = \mathbf{0}$, the residual $r(\hat{\mathbf{x}})$ lies in $N\left(A^T\right) = C(A)^\perp$, which by Theorem 3.8.1 is exactly the condition for $A\hat{\mathbf{x}}$ to be the vector in $C(A)$ closest to $\mathbf{b}$.
∎
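As a sanity check on Theorem 3.8.3, the sketch below (with illustrative data, not from the text) forms and solves the normal equations for a full column rank $A$, and compares the answer with NumPy's built-in least squares routine; in practice `np.linalg.lstsq` (or a QR factorization) is preferred over explicitly inverting $A^T A$.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])           # rank(A) = n = 2
b = np.array([0.1, 0.9, 2.1, 2.9])

# Normal equations  A^T A x = A^T b  (A^T A is invertible since rank(A) = n).
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# The library least squares routine returns the same x_hat.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_normal)
print(x_lstsq)
```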
The context where you have likely encountered least squares problems before is finding a line of best fit for a data set.
$$
\begin{array}{c|r|r|r|r|r|r}
x & x_1 & x_2 & \dots & x_j & \dots & x_m \\
\hline
y & y_1 & y_2 & \dots & y_j & \dots & y_m
\end{array}
$$
Given a set of values $\left\{(x_j,y_j)\,:\,1\le j\le m\right\}$, we seek a line of the form $y = a + bx$ that best represents the data set. To do this, we make a system of equations out of the set
$$
\begin{align*}
a + bx_1 &= y_1 \\
a + bx_2 &= y_2 \\
\vdots \\
a + bx_j &= y_j \\
\vdots \\
a + bx_m &= y_m \\
\end{align*}
$$
for $j=1,2,\ldots,m$. This yields an $m\times 2$ inconsistent linear system
$$
A\mathbf{c} = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_j \\ \vdots & \vdots \\ 1 & x_m \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_j \\ \vdots \\ y_m \end{bmatrix} = \mathbf{y},
$$
where $A$ is the linear operator, $\mathbf{c}$ is the vector of coefficients $\begin{bmatrix} a \\ b \end{bmatrix}$, and $\mathbf{y}$ is the vector of outputs $y_j$ for each $x_j$ in our table. Notice that the vector $\mathbf{c}$ is NOT a vector in $\mathbb{R}^2$; it is a vector in $P_2$, the vector space of polynomials of degree less than 2. The first coordinate of $\mathbf{c}$ is the $y$-intercept and the second coordinate is the slope.
One solves the least squares problem by obtaining the normal equations; that is, multiply both sides of the equation on the left by $A^T$:
$$
A^TA\mathbf{c} = \begin{bmatrix} 1 & 1 & \ldots & 1 \\ x_1 & x_2 & \ldots & x_m \end{bmatrix}\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_m \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 1 & 1 & \ldots & 1 \\ x_1 & x_2 & \ldots & x_m \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} = A^T\mathbf{y}
$$
Notice that the entries of $A^T A$ and $A^T\mathbf{y}$ are simply sums computed from the data, so the normal equations reduce to the $2\times 2$ system
$$
\begin{bmatrix} m & \displaystyle\sum_{j=1}^m x_j \\ \displaystyle\sum_{j=1}^m x_j & \displaystyle\sum_{j=1}^m x_j^2 \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix} =
\begin{bmatrix} \displaystyle\sum_{j=1}^m y_j \\ \displaystyle\sum_{j=1}^m x_j y_j \end{bmatrix}
$$
If you know that your least squares line must pass through the origin, that is, the $y$-intercept must be zero, then the model is simply $y = ax$ and one obtains the linear system
$$
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix}\begin{bmatrix} a \end{bmatrix} =
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}
$$
Multiplying both sides on the left by $A^T = \begin{bmatrix} x_1 & x_2 & \ldots & x_m \end{bmatrix}$ gives the single normal equation
$$
\left(\displaystyle\sum_{i=1}^m x_i^2\right)a = \begin{bmatrix} x_1 & x_2 & \ldots & x_m \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix}\begin{bmatrix} a \end{bmatrix} =
\begin{bmatrix} x_1 & x_2 & \ldots & x_m \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} = \displaystyle\sum_{i=1}^m x_iy_i
$$
A common laboratory exercise for physics students is using Hooke's Law to determine the spring constant of a spring experimentally. The law states that the force exerted by a spring is proportional to, and opposite in direction from, its displacement from its equilibrium (resting, neither stretched nor compressed) length, and is given by
$$ F = -kx $$
where $F$ is the force exerted by the spring, $x$ is the displacement, and $k$ is a constant of proportionality called the spring constant.
Suppose some freshmen physics students have compiled the following data
Displacement $x$ (m) | Force $F(x)$ (N) |
---|---|
0.020 m | 4.1 N |
0.030 m | 6.0 N |
0.040 m | 7.9 N |
0.050 m | 9.8 N |
To determine $k$, we work with the magnitudes of the forces (dropping the minus sign) for simplicity and construct the normal equation
$$ \begin{bmatrix} 0.02 & 0.03 & 0.04 & 0.05 \end{bmatrix}\begin{bmatrix} 0.02 \\ 0.03 \\ 0.04 \\ 0.05 \end{bmatrix}\begin{bmatrix} k \end{bmatrix} =
\begin{bmatrix} 0.02 & 0.03 & 0.04 & 0.05 \end{bmatrix}\begin{bmatrix} 4.1 \\ 6.0 \\ 7.9 \\ 9.8 \end{bmatrix} $$
This simplifies to
$$ 0.0054k = 1.068 $$
so $k\approx 197.\bar{7}\ \text{N}/\text{m}$, and those students should report $k = 2.0\times 10^2\ \text{N}/\text{m}$ to the appropriate number of significant figures.
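These numbers are easy to reproduce. A short NumPy check of the arithmetic, using the displacements and forces from the table:

```python
import numpy as np

x = np.array([0.02, 0.03, 0.04, 0.05])   # displacements (m)
F = np.array([4.1, 6.0, 7.9, 9.8])       # force magnitudes (N)

# Single normal equation: (sum of x_i^2) * k = sum of x_i * F_i
k = (x @ F) / (x @ x)
print(x @ x, x @ F, k)   # 0.0054, 1.068, k ≈ 197.8 N/m
```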
Lines of best fit may also be used to fit market data for economic analysis. Suppose data was collected showing the demand for gasoline at different costs per gallon. The collected gas price data are
Price per gallon $p$ (\$) | Demand $x(p)$ (in millions of gallons) |
---|---|
\$2.75 | 500 |
\$2.50 | 620 |
\$2.25 | 710 |
\$2.00 | 810 |
\$1.75 | 930 |
\$1.50 | 1050 |
Here it is not reasonable to assume the line passes through the origin (a zero intercept), since there is likely a minimum gas price based on the cost of producing and delivering the gasoline. First one plots the data to get a sense of its shape.
Next we decide how to model our data. In this case, as with Hooke's Law, we decide that a first order polynomial (a line) best fits the data points. If we can find such a line
$$
x = a + bp,
$$
then we would have
$$
\begin{align*}
a + 2.75b &= 500 \\
a + 2.50b &= 620 \\
a + 2.25b &= 710 \\
a + 2.00b &= 810 \\
a + 1.75b &= 930 \\
a + 1.50b &= 1050
\end{align*}
$$
So we will construct our inconsistent system of equations using the relationship $A\mathbf{c}=\mathbf{d}$, where
$$
\begin{align*}
\mathbf{c} &= \begin{bmatrix} a \\ b \end{bmatrix}\text{ is the vector of coefficients for our line,} \\
\mathbf{p} &=\text{ the vector of gas prices, and } \\
\mathbf{d} &=\text{ the vector of resulting sales in millions of gallons (the demand).}
\end{align*}
$$
Notice that the vector $\mathbf{c}$ is not a vector in $\mathbb{R}^2$; it is a vector in $P_2$, $\mathbf{c}\in P_2$. The polynomial in $P_2$ that we seek is the line $x(p) = a + bp$, with slope $b$ and intercept $a$, that best fits the data points in our table.
$$
A\mathbf{c} = \begin{bmatrix} 1 & 2.75 \\ 1 & 2.50 \\ 1 & 2.25 \\ 1 & 2.00 \\ 1 & 1.75 \\ 1 & 1.50 \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 500 \\ 620 \\ 710 \\ 810 \\ 930 \\ 1050 \end{bmatrix} = \mathbf{d}
$$
The least squares solution is
$$
\hat{\mathbf{c}} = \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 1688 \\ -432 \end{bmatrix}
$$
so the best-fit line is $x(p) = 1688 - 432p$.
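The coefficients above can be reproduced with NumPy by forming and solving the normal equations for this $6\times 2$ system (or by calling `np.linalg.lstsq` directly):

```python
import numpy as np

p = np.array([2.75, 2.50, 2.25, 2.00, 1.75, 1.50])    # price per gallon ($)
d = np.array([500., 620., 710., 810., 930., 1050.])   # demand (millions of gallons)

A = np.column_stack([np.ones_like(p), p])              # columns: 1 and p
c_hat = np.linalg.solve(A.T @ A, A.T @ d)              # normal equations

print(c_hat)   # approximately [1688., -432.], i.e. x(p) = 1688 - 432 p
```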
Another application of the least squares problem is finding a polynomial that represents a data set. Suppose that we have a data set $\left\{(x_i,y_i)\right\}$ composed of $n+1$ values. We may, by solving a particular least squares problem, find a polynomial $\mathbf{p}$ of degree $n$ or less, $\mathbf{p}\in P_{n+1}$, that passes through each of these points. This polynomial allows us to interpolate the data set, that is, to estimate values between the specific data points in $\left\{(x_i,y_i)\right\}$. The resulting polynomial $\mathbf{p}\in P_{n+1}$ is called the interpolating polynomial.
It is important to be careful when doing these kinds of interpolations. The best representation of $n+1$ data points will not necessarily be a polynomial of degree $n$; in fact, this is rarely the case. We tend to use a low-order polynomial. Look at Example 1 and Example 2: these are both actually polynomial interpolation problems, we just chose to use a line (or, technically, an affine function), which is a polynomial of degree $1$.
To construct the least squares problem for polynomial interpolation, we use the Vandermonde matrix
$$ V = \begin{bmatrix} x_1^n & x_1^{n-1} & \ldots & x_1 & 1 \\ x_2^n & x_2^{n-1} & \ldots & x_2 & 1 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\ x_m^n & x_m^{n-1} & \ldots & x_m & 1 \end{bmatrix} $$
Each $x_j$ comes from the data set, and the matrix $V$ takes the place of $A$ in the normal equations for polynomial interpolation.
We will solve two least squares problems on the same data set, finding best-fit polynomials of degree $2$ and of degree $5$. Consider the data set
$$
\begin{array}{c|r|r|r|r|r|r|r|r|r} x & -1.95 & 0.22 & 0.88 & 1.16 & 1.25 & 4.48 & 4.91 & 5.96 & 7.16 \\\hline
y & -1.96 & -1.76 & -1.62 & -1.25 & -0.51 & 2.08 & 3.24 & 3.53 & 3.71 \end{array}
$$
For the first problem, we fit the data set using the polynomial $ c_0 + c_1 x + c_2 x^2 $, finding the least squares solution of the system
$$ \begin{bmatrix} x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\
\vdots & \vdots & \vdots \\ x_m^2 & x_m & 1 \end{bmatrix}\begin{bmatrix} c_2 \\ c_1 \\ c_0 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} $$
For the second version, we use the polynomial $ c_0 + c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4 + c_5 x^5 $ whose system is very close in form
$$ \begin{bmatrix} x_1^5 & x_1^4 & \ldots & x_1 & 1 \\ x_2^5 & x_2^4 & \ldots & x_2 & 1 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\ x_m^5 & x_m^4 & \ldots & x_m & 1 \end{bmatrix}\begin{bmatrix} c_5 \\ c_4 \\ \vdots \\ c_1 \\ c_0 \end{bmatrix} =
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} $$
We solve both systems and plot the two fitted polynomials on the same axes, together with the data, for comparison.
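One way to carry out both fits in NumPy is with `np.vander`, which builds exactly the matrix of powers above (highest power first), and `np.linalg.lstsq`; `np.polyfit` wraps the same computation.

```python
import numpy as np

x = np.array([-1.95, 0.22, 0.88, 1.16, 1.25, 4.48, 4.91, 5.96, 7.16])
y = np.array([-1.96, -1.76, -1.62, -1.25, -0.51, 2.08, 3.24, 3.53, 3.71])

for degree in (2, 5):
    V = np.vander(x, degree + 1)                # columns x^degree, ..., x, 1
    c, *_ = np.linalg.lstsq(V, y, rcond=None)   # coefficients, highest power first
    print(degree, c)

# np.polyfit(x, y, deg) performs the same least squares fit.
```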
Least squares problems come in many forms, and we can use other familiar equations to construct them. For example, suppose we have a data set $\left\{(x_i,y_i)\,:\,1\le i\le m\right\}$ where the points roughly trace out a circle. We can use the equation of a circle
$$
(x-c_1)^2 + (y-c_2)^2 = r^2
$$
with our data points $\left\{(x_i,y_i)\right\}$ plugged in to construct a least squares problem, after a little bit of algebra:
$$
\begin{align*}
(x-c_1)^2 + (y-c_2)^2 &= r^2 \\
\\
x^2 -2xc_1 + c_1^2 + y^2 - 2yc_2 + c_2^2 &= r^2 \\
\\
x^2 + y^2 &= 2xc_1 + 2yc_2 + \left(r^2 - c_1^2 - c_2^2\right)
\end{align*}
$$
By setting $ c_3 = r^2 - c_1^2 - c_2^2 $, we can form a linear system based on
$$
2xc_1 + 2yc_2 + c_3 = x^2 + y^2
$$
which looks like
$$
A\mathbf{c} = \begin{bmatrix} 2x_1 & 2y_1 & 1 \\ 2x_2 & 2y_2 & 1 \\ \vdots & \vdots & \vdots \\ 2x_m & 2y_m & 1 \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} = \begin{bmatrix} x_1^2 + y_1^2 \\ x_2^2 + y_2^2 \\ \vdots \\ x_m^2 + y_m^2 \end{bmatrix} = \mathbf{b}
$$
The least squares solution gives us $\mathbf{c} = \left[ c_1\ c_2\ c_3 \right]^T$; we interpret $(c_1,c_2)$ as the center of the circle, and
$$
r = \sqrt{c_1^2 + c_2^2 + c_3}
$$
yields the radius.
If we use the data set
$$
\begin{array}{c|r|r|r|r|r|r|r|r|r|r} x & 8.03 & 6.86 & 5.99 & 5.83 & 4.73 & 4.02 & 4.81 & 5.41 & 5.71 & 7.77 \\\hline
y & 3.06 & 4.48 & 4.98 & 5.04 & 4.54 & 2.74 & 1.37 & 1.13 & 1.02 & 2.01 \end{array}
$$
and solve the least squares problem, we have the information required to determine the center and radius of the approximating circle, which can then be plotted together with the data.
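A NumPy sketch of this computation, using the data above (the resulting center and radius are not quoted in the text, so the printout is simply whatever the data produce):

```python
import numpy as np

x = np.array([8.03, 6.86, 5.99, 5.83, 4.73, 4.02, 4.81, 5.41, 5.71, 7.77])
y = np.array([3.06, 4.48, 4.98, 5.04, 4.54, 2.74, 1.37, 1.13, 1.02, 2.01])

A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])   # columns: 2x, 2y, 1
b = x**2 + y**2

c1, c2, c3 = np.linalg.lstsq(A, b, rcond=None)[0]
r = np.sqrt(c1**2 + c2**2 + c3)

print((c1, c2), r)   # estimated center and radius of the approximating circle
```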