Math 511: Linear Algebra
5.4 Least Squares Problems
5.4.1 The Big Picture of Linear Algebra¶
$$ \require{color} \definecolor{brightblue}{rgb}{.267, .298, .812} \definecolor{darkblue}{rgb}{0.0, 0.0, 1.0} \definecolor{palepink}{rgb}{1, .73, .8} \definecolor{softmagenta}{rgb}{.99,.34,.86} \definecolor{blueviolet}{rgb}{.537,.192,.937} \definecolor{jonquil}{rgb}{.949,.792,.098} \definecolor{shockingpink}{rgb}{1, 0, .741} \definecolor{royalblue}{rgb}{0, .341, .914} \definecolor{alien}{rgb}{.529,.914,.067} \definecolor{crimson}{rgb}{1, .094, .271} \def\ihat{\mathbf{\hat{\unicode{x0131}}}} \def\jhat{\mathbf{\hat{\unicode{x0237}}}} \def\khat{\mathrm{\hat{k}}} \def\tombstone{\unicode{x220E}} \def\contradiction{\unicode{x2A33}} $$
In this section we restrict ourselves to $m\times n$ matrices that are overdetermined; that is, $m > n$. However, we also want all of the columns of matrix $A$ to be pivot columns. If an $m\times n$ matrix $A$ has dependent columns, we will perform a change of variables to eliminate the dependent variables so that we have an $m\times r$ matrix, where $r = \text{rank}(A) \le n$. Later in this section we will consider matrices with dependent columns.
5.4.2 Projections onto Subspaces¶
We started chapter 5 with a geometrical discussion of the component and projection of a vector onto another vector. Now consider the projection of a vector $\mathbf{u}\in V$ onto a subspace $S$ in an inner product space $V$.
Example 1¶
The projection onto a line in $\mathbb{R}^2$ is very similar to the projection of a vector onto another vector. In this case the projection vector $\mathbf{p}$ must lie on the line, while the residual vector $\mathbf{r}$ is perpendicular to the line.
Example 2¶
The projection of a vector onto a plane in $\mathbb{R}^3$ yields a vector on the plane so that the residual vector $\mathbf{r}=\mathbf{b}-\mathbf{p}$ is orthogonal to every vector in the plane. Thus the residual vector is in the span of any normal vector to the plane.
Definition¶
The projection of a vector $\mathbf{b}$ in an inner product space $V$ onto a subspace $S$ in $V$ is a vector $\mathbf{p}\in S$ so that the residual vector $\mathbf{r}=\mathbf{b}-\mathbf{p}$ is orthogonal to every vector in $S$. That is, for every vector $\mathbf{u}\in S$, the inner product $\langle\mathbf{u},\mathbf{r}\rangle=0$.
Dr. Strang discusses projections onto subspaces, including the column space of a matrix, in Projections onto Subspaces
5.4.3 Projections onto the Column Space¶
In the big picture it is often the case that one wants to solve $A\mathbf{x}=\mathbf{b}$; however, the vector $\mathbf{b}$ is not in the column space of matrix $A$, $\mathbf{b}\notin C(A)$. This can happen for a variety of reasons:
- There may be errors in computing vector $\mathbf{b}$ from available data
- There may be errors in the data used to compute vector $\mathbf{b}$
- There may be limitations to a mathematical model that fails to include every possible value for $\mathbf{b}$
In any of these cases, we are stuck with a linear system that is inconsistent.
One possible remedy is to determine what part of vector $\mathbf{b}$ is consistent with $C(A)$ and what part is inconsistent with $C(A)$.
In order to solve a linear system for the part of vector $\mathbf{b}$ consistent with $C(A)$, one solves a slightly different linear system
$$ A\mathbf{\hat{x}} = \mathbf{p} $$
Definition¶
If $\mathbf{b}$ is a vector in normed vector space $V$, and $S$ is a subspace of $V$, then vector $\mathbf{p}$ is the closest possible vector to $\mathbf{b}$ in $S$ if the difference or residual vector
$$ \mathbf{r} = \mathbf{b}-\mathbf{p} $$
is as small as possible. The norm of the residual vector is called the residual.
In order for vector $\mathbf{p}$ to be closest, the residual or error vector
$$ \mathbf{r} = \mathbf{b} - \mathbf{p} = \mathbf{b} - A\mathbf{\hat{x}} $$
must be as small as possible. In other words, the residual vector must be orthogonal to $C(A)$. Thus vector $\mathbf{p}$ must be the projection of vector $\mathbf{b}$ onto $C(A)$.
Question¶
If $\mathbf{p}\in C(A)$ and $\mathbf{p}+\mathbf{r}=\mathbf{b}$, then in what subspace of the codomain of matrix $A$ does the residual vector $\mathbf{r}$ reside?
Answer¶
$N\left(A^T\right)$
Algebraically, since $\mathbf{p}\in C(A)$ we know that the linear system $A\hat{\mathbf{x}}=\mathbf{p}$ is consistent, and since $\mathbf{r}\in N\left(A^T\right)$ we have $A^T\mathbf{r}=\mathbf{0}$. This gives us
$$ \begin{align*} A^T\mathbf{r} = A^T\left(\mathbf{b} - \mathbf{p}\right) &= A^T\left(\mathbf{b} - A\mathbf{\hat{x}}\right) = \mathbf{0} \\ \\ A^T\mathbf{b} - A^TA\mathbf{\hat{x}} &= \mathbf{0} \\ \\ A^TA\mathbf{\hat{x}} &= A^T\mathbf{b} \end{align*} $$
Definition¶
For linear system $A\mathbf{x}=\mathbf{b}$, the following is called the normal equation
$$ A^TA\hat{\mathbf{x}} = A^T\mathbf{b} $$
We will show below that $N(A) = N\left(A^TA\right)$. If, as we discussed in 5.4.1, $N(A) = \left\{\mathbf{0}\right\}$, then $A^TA$ is a square matrix with $N\left(A^TA\right) = \left\{\mathbf{0}\right\}$. That means $A^TA$ is nonsingular, and hence invertible. This allows us to find a unique least squares solution.
$$ \mathbf{\hat{x}} = \left(A^TA\right)^{-1}A^T\mathbf{b} $$
Multiplying both sides of this equation by matrix $A$ on the left yields
$$ \mathbf{p} = A\mathbf{\hat{x}} = A\left(A^TA\right)^{-1}A^T\mathbf{b} $$
Thus the matrix $P = A\left(A^TA\right)^{-1}A^T$ represents a linear transformation that maps every vector $\mathbf{b}\in\mathbb{R}^m$ onto its projection $\mathbf{p}\in C(A)$.
Definition¶
The Projection Matrix
For an $m\times n$ matrix $A$, such that $m\ge n$ and rank$(A) = n$, the projection matrix $P$ from $\mathbb{R}^m\rightarrow C(A)$ is defined by
$$ P = A\left(A^TA\right)^{-1}A^T $$
The derivation shows that vectors that are not in $C(A)$ are mapped onto their projections onto $C(A)$. One should verify that $P$ maps elements $\mathbf{b}\in C(A)$ onto themselves. Recall that if $\mathbf{b}\in C(A)$, then the original linear system $A\mathbf{x}=\mathbf{b}$ is consistent. Thus
$$ \mathbf{p} = A\left(A^TA\right)^{-1}A^T\mathbf{b} = A\left(A^TA\right)^{-1}A^T(A\mathbf{x}) = A\left(A^TA\right)^{-1}(A^TA)\mathbf{x} = A\mathbf{x} = \mathbf{b}\qquad{\color{green}\Large{\checkmark}} $$
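The computation above translates directly into a short numerical check. The following is a minimal sketch, assuming NumPy is available; the matrix $A$ and vector $\mathbf{b}$ are made-up illustrations, not data from this section.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])           # hypothetical 3x2 matrix with independent columns
b = np.array([6.0, 0.0, 0.0])        # hypothetical vector, not in C(A)

P = A @ np.linalg.inv(A.T @ A) @ A.T   # projection matrix onto C(A)
p = P @ b                              # projection of b onto C(A)
r = b - p                              # residual vector

print(np.allclose(P @ P, P))    # P is idempotent: projecting twice changes nothing
print(np.allclose(A.T @ r, 0))  # residual is orthogonal to C(A), i.e. r is in N(A^T)
print(np.allclose(P @ p, p))    # vectors already in C(A) are mapped to themselves
```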
5.4.4 Projection Matrices¶
For $m\times n$ matrix $A$, for which $m\ge n$, and rank$(A) = n$, the projection matrix $P:\mathbb{R}^m\rightarrow C(A)$ is defined by
$$ P = A\left(A^TA\right)^{-1}A^T $$
Now if $\mathbf{b}\in N\left(A^T\right)$, then
$$ \begin{align*} P\mathbf{b} &= \left(A\left(A^TA\right)^{-1}A^T\right)\mathbf{b} \\ \\ &= \left(A\left(A^TA\right)^{-1}\right)\left(A^T\mathbf{b}\right) \\ \\ &= \left(A\left(A^TA\right)^{-1}\right)\mathbf{0} = \mathbf{0} \end{align*} $$
So projection matrix $P$ projects vectors orthogonal to the column space of $A$ to the zero vector; as it should! We also know that the codomain $\mathbb{R}^m$ is the direct sum of $C(A)$ and $N(A^T)$,
$$ C(A)\oplus N(A^T) = \mathbb{R}^m $$
Hence if $\mathbf{b}\notin C(A)$ and $\mathbf{b}\notin N\left(A^T\right)$, then we know that
$$ \mathbf{b} = \mathbf{p} + \mathbf{r}, $$
where $\mathbf{p}\in C(A)$, and $\mathbf{r}\in N\left(A^T\right)$. This yields
$$ P\mathbf{b} = P\left(\mathbf{p} + \mathbf{r}\right) = P\mathbf{p} + P\mathbf{r} = \mathbf{p} + \mathbf{0} = \mathbf{p} \\ $$
Additionally we have
$$ \begin{align*} \mathbf{r} &= \mathbf{b} - \mathbf{p} \\ \\ (I - P)\mathbf{b} &= I\mathbf{b} - P\mathbf{b} = \mathbf{b} - \mathbf{p} = \mathbf{r} \end{align*} $$
Definition¶
If matrix $A\in\mathbb{R}^{m\times n}$ and $P\in\mathbb{R}^{m\times m}$ is the projection matrix onto $C(A)$, then the matrix $I-P$ is the projection from $\mathbb{R}^m$ onto $N\left(A^T\right)$.
Generally, if we have an orthonormal basis $Q=\left\{\mathbf{q}_1,\ \mathbf{q}_2,\ \dots,\ \mathbf{q}_k\right\}$ for a subspace $S$ in $m$-dimensional inner product space $V$, then we know from an earlier theorem that
$$ Q^TQ = \begin{bmatrix} \delta_{ij} \end{bmatrix} = I_k $$
However, what about $QQ^T\in\mathbb{R}^{m\times m}$? We know that if $Q\in\mathbb{R}^{m\times m}$ is square, then $Q$ is an orthogonal matrix and $QQ^T = Q^TQ = I_m$. What if $k\neq m$? Since the columns of matrix $Q$ form an orthonormal set,
$$ P = Q\left(Q^TQ\right)^{-1}Q^T = QI_kQ^T = QQ^T $$
Definition¶
If $\mathfrak{B}=\left\{\mathbf{q}_1,\ \mathbf{q}_2,\ \dots,\ \mathbf{q}_k\right\}$ is an orthonormal basis for a subspace $S$ of an $m$-dimensional inner product space $V$, then the orthogonal projection of $\mathbb{R}^m$ onto $S$ is given by
$$ P = QQ^T $$
where $Q = \begin{bmatrix} \mathbf{q}_1 & \mathbf{q}_2 & \dots & \mathbf{q}_k \end{bmatrix}$.
Notice that $I-QQ^T$ is the projection of $\mathbb{R}^m$ onto a subspace $U$ so that $S\oplus U=\mathbb{R}^m$.
Definition¶
If $S$ is a subspace of inner product space $V$, then $U = \left\{ \mathbf{u}\,:\, \mathbf{u}\text{ is orthogonal to every vector in }S \right\}$ is called the orthogonal complement of $S$ and denoted
$$ \begin{align*} S^{\perp} &= \left\{ \mathbf{u}\,:\, \mathbf{u}\text{ is orthogonal to every vector in }S \right\} \\ \\ &= \left\{ \mathbf{u}\in V\,:\,\langle\mathbf{u},\mathbf{w}\rangle=0,\ \forall\ \mathbf{w}\in S \right\} \end{align*} $$
Furthermore, $S\oplus S^{\perp}=V$.
Lemma¶
If $S$ is a subspace of finite dimensional inner product space $V$ with orthogonal projection $P$ from $V$ onto $S$, then $I-P$ is an orthogonal projection of $V$ onto $S^{\perp}$.
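As a small numerical illustration of $P=QQ^T$ and $I-P$, here is a sketch assuming NumPy is available; the matrix used to produce the orthonormal columns is hypothetical.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])        # hypothetical matrix whose columns span the subspace S
Q, _ = np.linalg.qr(A)            # reduced QR: columns of Q are an orthonormal basis of S = C(A)

P = Q @ Q.T                       # orthogonal projection onto S
b = np.array([6.0, 0.0, 0.0])     # hypothetical vector

print(np.allclose(Q.T @ Q, np.eye(2)))                    # Q^T Q = I_k
print(np.allclose(P, A @ np.linalg.inv(A.T @ A) @ A.T))   # same projection as A(A^T A)^{-1} A^T
print(np.allclose(P @ ((np.eye(3) - P) @ b), 0))          # (I - P)b lies in S-perp, so P sends it to 0
```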
5.4.5 Least Squares Solutions of Overdetermined Systems¶
One of the most common application problems encountered in linear algebra is the least squares problem of finding a solution to an overdetermined linear system. Overdetermined systems are ones with more equations than unknowns, so there is too much data for the problem, often leading to inconsistent systems. In these cases, we have the matrix equation
$$ A\mathbf{x} = \mathbf{b} $$
with $A\in\mathbb{R}^{m\times n}$ and $m \gt n$. Since this system is typically inconsistent, we will not in general be able to find $\mathbf{x}\in\mathbb{R}^n$ that solves $A\mathbf{x}=\mathbf{b}$, but we can find the least squares solution $\hat{\mathbf{x}}$ that is the closest vector to being the solution to the system. The way we will measure the "closeness" of $A\hat{\mathbf{x}}$ to being equal to $\mathbf{b}$ is by defining a residual
$$ r(\mathbf{x}) = \mathbf{b} - A\mathbf{x} $$
and determine the distance between vectors $\mathbf{b}$ and $A\mathbf{x}$ by computing the norm of the residual
$$ \|r(\mathbf{x})\| = \left\| \mathbf{b} - A\mathbf{x} \right\| $$
The least squares solution $\hat{\mathbf{x}}$ of $A\mathbf{x} = \mathbf{b}$ will be the vector $\mathbf{x}\in\mathbb{R}^n$ that minimizes the norm of the residual $\|r(\mathbf{x})\|$. For convenience, we note that minimizing $\|r(\mathbf{x})\|$ is equivalent to minimizing $\|r(\mathbf{x})\|^2$, so our theorems will focus on minimizing the latter.
Theorem 5.4.1¶
The Distance from a Vector to a Subspace
Let $S$ be a subspace of $\mathbb{R}^m$. For each $\mathbf{b}\in\mathbb{R}^m$, there is a unique element $\mathbf{p}\in S$ which is closest to $\mathbf{b}$; that is
$$ \left\| \mathbf{b} - \mathbf{y} \right\| \gt \left\| \mathbf{b} - \mathbf{p} \right\| $$
for any $\mathbf{y}\neq\mathbf{p}$ in $S$. Additionally, $\mathbf{p}\in S$ will be closest to a given vector $\mathbf{b}\in\mathbb{R}^m$ if and only if $\mathbf{b}-\mathbf{p}\in S^\perp$.
The distance $d = \left\|\mathbf{b} - \mathbf{p}\right\|$ is called the distance from vector $\mathbf{b}$ to subspace $S$.
Proof:¶
$(\Leftarrow)$ Since $S$ is a subspace of $\mathbb{R}^m$, we can write $\mathbb{R}^m$ as a direct sum of $S$ and its orthogonal complement $S^\perp$; $\mathbb{R}^m = S\oplus S^\perp$. Therefore each element $\mathbf{b}\in\mathbb{R}^m$ may be expressed uniquely as a sum
$$ \mathbf{b} = \mathbf{p} + \mathbf{z} $$
where $\mathbf{p}\in S$ and $\mathbf{z}\in S^\perp$. If $\mathbf{y} \neq \mathbf{p}$ is in $S$, then
$$ \left\| \mathbf{b} - \mathbf{y} \right\|^2 = \left\| (\mathbf{b} - \mathbf{p}) + (\mathbf{p} - \mathbf{y}) \right\|^2 $$
Vector $\mathbf{p} - \mathbf{y} \in S$ and vector $\mathbf{b} - \mathbf{p}\in S^\perp$ are orthogonal, so by the Pythagorean law the above equation becomes
$$ \left\| \mathbf{b} - \mathbf{y} \right\|^2 = \left\| \mathbf{b} - \mathbf{p}\right\|^2 + \left\| \mathbf{p} - \mathbf{y} \right\|^2 $$
Since $\mathbf{y}\neq\mathbf{p}$, the term $\left\|\mathbf{p} - \mathbf{y}\right\|^2$ is strictly positive, so dropping it from the right hand side produces the following inequality:
$$ \left\| \mathbf{b} - \mathbf{y} \right\| \gt \left\| \mathbf{b} - \mathbf{p}\right\| $$
This gives that, for $\mathbf{p}\in S$ and $\mathbf{b}-\mathbf{p}\in S^\perp$, $\mathbf{p}$ is the element of the subspace $S$ closest to $\mathbf{b}$.
$(\Rightarrow)$ Suppose $\mathbf{q}\in S$ and $\mathbf{b}-\mathbf{q}\notin S^\perp$. Then $\mathbf{q}\neq \mathbf{p}$, and repeating the previous argument with $\mathbf{y} = \mathbf{q}$ yields
$$ \left\| \mathbf{b} - \mathbf{q} \right\| \gt \left\| \mathbf{b} - \mathbf{p}\right\| $$ so $\mathbf{q}$ is not the closest vector in $S$ to $\mathbf{b}$. $\tombstone$
Finally we have a result for linear systems $A\mathbf{x}=\mathbf{b}$ when matrix $A$ has full rank (rank$(A) = n$).
Theorem 5.4.2¶
The Null Space of $A^T A$ is the Same as the Null Space of $A$.
$$ N\left(A^T A\right) = N(A) $$
Proof:¶
$(\Rightarrow)$ Suppose $\mathbf{x}\in N\left(A^T A\right)$, then
$$ \mathbf{0} = A^T A\mathbf{x} = A^T\left(A\mathbf{x}\right) $$
Hence $A\mathbf{x}\in N(A^T)$ as well as $C(A)$. Therefore $A\mathbf{x}=\mathbf{0}$ since $C(A)\cap N(A^T)=\left\{\mathbf{0}\right\}$. Thus $\mathbf{x}\in N(A)$ and $N(A^TA)\subset N(A)$.
$(\Leftarrow)$ Suppose $\mathbf{x}\in N(A)$, then $A\mathbf{x} = \mathbf{0}$ and
$$ A^T A\mathbf{x} = A^T\left(A\mathbf{x}\right) = A^T\mathbf{0} = \mathbf{0} $$
So $\mathbf{x}\in N\left(A^T A\right)$, and $N(A)\subset N(A^TA)$.
As each null space is a subset of the other, they are equal.$\tombstone$
Theorem 5.4.3¶
If $A\mathbf{x}=\mathbf{b}$ is an overdetermined linear system with full rank, then the normal equation yields a unique solution to the Least Squares Problem. That is,
if $A\in\mathbb{R}^{m\times n}$ and $\,\text{rank}(A)=n$, then the normal equation
$$ A^T A\mathbf{x} = A^T\mathbf{b} $$
has a unique solution
$$ \hat{\mathbf{x}} = \left(A^T A\right)^{-1}A^T\mathbf{b} $$
and $\, \hat{\mathbf{x}}$ uniquely solves the least squares problem $A\mathbf{x} = \mathbf{b}$.
Proof:¶
Since $m\times n$ matrix $A$ has rank $n$, $N(A)$ is trivial. Hence $N(A^TA)=N(A)=\left\{\mathbf{0}\right\}$. Thus square matrix $A^TA$ is nonsingular and every linear system
$$ A^TA\hat{\mathbf{x}} = A^T\mathbf{b} $$
has the unique solution $\hat{\mathbf{x}} = \left(A^TA\right)^{-1}A^T\mathbf{b}$ for every vector $\mathbf{b}\in\mathbb{R}^m$. Since $A^T\left(\mathbf{b}-A\hat{\mathbf{x}}\right)=\mathbf{0}$, the residual lies in $N\left(A^T\right)=C(A)^{\perp}$, so by Theorem 5.4.1 the vector $A\hat{\mathbf{x}}$ is the closest vector in $C(A)$ to $\mathbf{b}$, and $\hat{\mathbf{x}}$ uniquely solves the least squares problem.$\tombstone$
5.4.6 Finding the Least Squares Solution¶
Normally one solves the least squares problem by solving the normal equation for the linear system. Dr. Strang demonstrates this technique in Projection Matrices and Least Squares
One computes the projection matrix only when a linear system must be solved for several data points from the codomain. For example, one might measure several physical quantities and approximate a desired result from the least squares solution.
However in some cases, one need only solve the least squares problem for one vector in the codomain. In this case, one generally solves the normal equation. If the original linear system has full rank, then the least squares solution will be unique. If $N(A)$ is nontrivial, then $N(A^TA)$ will be nontrivial also, and one obtains infinitely many solutions.
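As a sketch of this workflow, assuming NumPy is available and using a made-up matrix and right-hand side, one can solve the normal equation directly and compare against a library least squares routine:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])          # hypothetical full-rank 4x2 matrix
b = np.array([1.0, 2.0, 2.0, 4.0])  # hypothetical right-hand side, not in C(A)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)      # solve the normal equation A^T A x = A^T b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # library least squares routine

print(x_normal)                          # the unique least squares solution x-hat
print(np.allclose(x_normal, x_lstsq))    # both approaches agree when A has full column rank
```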
5.4.7 Least Squares Solution with QR¶
We can use the $QR$ factorization to compute the least squares solution of an inconsistent linear system $A\mathbf{x}=\mathbf{b}$.
If matrix $A\in\mathbb{R}^{m\times n}$ has full rank, then Theorem 5.4.3 above implies that the least squares solution will be unique. However, if matrix $A$ has a free column, then matrix $R$ will be singular, and the normal equation will have infinitely many solutions.
To find the least squares solution to the inconsistent linear system $A\mathbf{x} = \mathbf{b}$ using the $QR$ factorization, we substitute $QR$ for $A$ in the normal equation
$$ \begin{align*} A^TA\hat{\mathbf{x}} &= A^T\mathbf{b} \\ \\ (QR)^TQR\hat{\mathbf{x}} &= (QR)^T\mathbf{b} \\ \\ R^TQ^TQR\hat{\mathbf{x}} &= R^TQ^T\mathbf{b} \\ \\ R^TR\hat{\mathbf{x}} &= R^TQ^T\mathbf{b} \end{align*} $$
Theorem 5.4.4¶
If $A\in\mathbb{R}^{m\times n}$ is an $m\times n$ matrix of rank $n$, then the least squares solutions of
$$ A\mathbf{x} = \mathbf{b} $$
is given by the solutions to the normal equation
$$ A^TA\hat{\mathbf{x}} = A^T\mathbf{b} $$
If the $m\times n$ matrix $A$ has factorization $A=QR$, where the $m\times n$ matrix $Q$ has orthonormal columns and the $n\times n$ matrix $R$ is upper triangular, then the normal equation takes the form
$$ R\hat{\mathbf{x}} = Q^T\mathbf{b} $$
If matrix $A$ has full rank, then matrix $R$ is nonsingular and the least squares problem has a unique solution
$$ \hat{\mathbf{x}} = R^{-1}Q^T\mathbf{b} $$
Proof:¶
The first statement we must prove is that the normal equation is consistent. Recall that the direct sum of the row space of matrix $A$ (the column space of $A^T$) and the null space of $A$ is all of the domain $\mathbb{R}^n$.
$$ C(A^T) \oplus N(A) = \mathbb{R}^n $$
We already proved that $N(A) = N(A^TA)$, and we know that $A^TA$ is symmetric. Hence
$$ \mathbb{R}^n = C\left(\left(A^TA\right)^T\right) \oplus N(A^TA) = C(A^TA) \oplus N(A^TA) = C(A^TA) \oplus N(A) $$
For a finite dimensional inner product space $\mathbb{R}^n$, this means that $C(A^T) = C(A^TA)$. Thus $A^T\mathbf{b}\in C(A^TA)$ and the linear system
$$ A^TA\hat{\mathbf{x}} = A^T\mathbf{b} $$
is consistent. Replacing matrix $A$ with the product $QR$ from theorem 5.3b.2 yields the still consistent linear system
$$ \begin{align*} \left(QR\right)^T\left(QR\right)\hat{\mathbf{x}} &= \left(QR\right)^T\mathbf{b} \\ \\ R^TQ^TQR \hat{\mathbf{x}} &= R^TQ^T\mathbf{b} \\ \\ R^TR\hat{\mathbf{x}} &= R^TQ^T\mathbf{b} \end{align*} $$
Recall that our least squares solutions minimize the norm of the residual vector $\left\|\mathbf{r}\right\|=\left\|\mathbf{b}-A\hat{\mathbf{x}}\right\|$. This occurs when $\mathbf{r}\in N(A^T)$. Since matrices $A$ and $Q$ have the same column space, they also have the same left null space. That is $N(Q^T)=N(A^T)$. Hence,
$$ \begin{align*} \mathbf{b} - A\hat{\mathbf{x}} &\in N(Q^T) \\ \\ Q^T\left( \mathbf{b} - A\hat{\mathbf{x}} \right) &= \mathbf{0} \\ \\ Q^T\mathbf{b} - Q^TQR\hat{\mathbf{x}} &= \mathbf{0} \\ \\ R\hat{\mathbf{x}} &= Q^T\mathbf{b} \end{align*} $$
This establishes the $QR$ version of the normal equation for any linear system.
If our original matrix $A$ has full rank, then $N(A^TA)=N(A)=\left\{\mathbf{0}\right\}$. In this case the square matrices $A^TA$ and $R$ are invertible. Therefore,
$$ \begin{align*} R^{-1}R\hat{\mathbf{x}} &= R^{-1}Q^T\mathbf{b} \\ \\ \hat{\mathbf{x}} &= R^{-1}Q^T\mathbf{b} \end{align*} $$
and the normal equation has a unique solution. $\tombstone$
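A sketch of the $QR$ route, assuming NumPy is available; the data are hypothetical, and a general solve stands in for back substitution since $R$ is small:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])          # hypothetical full-rank matrix
b = np.array([1.0, 2.0, 2.0, 4.0])  # hypothetical right-hand side

Q, R = np.linalg.qr(A)               # reduced QR: Q is 4x2 with orthonormal columns, R is 2x2 upper triangular
x_hat = np.linalg.solve(R, Q.T @ b)  # solve R x = Q^T b

print(np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0]))   # matches the library least squares solution
```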
5.4.8 Lines of Best Fit¶
The context where you have likely encountered least squares problems before is finding a line of best fit for a data set.
$$ \begin{array}{c|r|r|r|r|r|r} x & x_1 & x_2 & \dots & x_j & \dots & x_m \\ \hline y & y_1 & y_2 & \dots & y_j & \dots & y_m \end{array} $$
Given a set of values $\left\{(x_j,y_j)\,:\,1\le j\le m\right\}$, we seek a line of the form $y = a + bx$ that best represents the data set. To do this, we make a system of equations out of the set
$$ \begin{align*} a + bx_1 &= y_1 \\ a + bx_2 &= y_2 \\ \vdots \\ a + bx_j &= y_j \\ \vdots \\ a + bx_m &= y_m \\ \end{align*} $$
for all $j=1,2,\ldots,m$. This yields an $m\times 2$ inconsistent linear system
$$ A\mathbf{c} = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_j \\ \vdots & \vdots \\ 1 & x_m \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_j \\ \vdots \\ y_m \end{bmatrix} = \mathbf{y}, $$
where $A$ is the linear operator, $\mathbf{c}$ is the vector of coefficients $\begin{bmatrix} a \\ b \end{bmatrix}$, and $\mathbf{y}$ is the vector of outputs $y_j$ for each $x_j$ in our table. Notice that vector $\mathbf{c}$ is NOT a vector in $\mathbb{R}^2$; it is a vector in $P_2$, the vector space of polynomials of degree less than 2. The first coordinate in vector $\mathbf{c}$ is the $y$-intercept and the second coordinate is the slope.
One solves the least squares problem by obtaining the normal equation; that is, multiply both sides of the equation by $A^T$
$$ A^TA\mathbf{c} = \begin{bmatrix} 1 & 1 & \ldots & 1 \\ x_1 & x_2 & \ldots & x_m \end{bmatrix}\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_m \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 1 & 1 & \ldots & 1 \\ x_1 & x_2 & \ldots & x_m \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} = A^T\mathbf{y} $$
Notice that
- $A^TA$ is a symmetric matrix, $\left(A^TA\right)^T = A^T\left(A^T\right)^T = A^TA$.
- $N(A) = \left\{\mathbf{0}\right\}$ so $N\left(A^TA\right) = \left\{\mathbf{0}\right\}$ if at least one of the $x_j$'s is different from the others.
- $A^TA\mathbf{c} = A^T\mathbf{y}$ has a unique least squares solution if at least one of the $x_j$'s is different from the others.
- We will see higher dimensional applications. When $N(A)$ is non-trivial, then $N\left(A^TA\right) = N(A)$ will also be non-trivial and one obtains infinitely many least squares solutions.
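Putting the setup above into code, here is a minimal sketch assuming NumPy is available; the $x_j$ and $y_j$ values are hypothetical:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])    # hypothetical x_j values
y = np.array([1.1, 1.9, 3.2, 3.9])    # hypothetical y_j values

A = np.column_stack([np.ones_like(x), x])   # m x 2 matrix: a column of ones and a column of x_j
c = np.linalg.solve(A.T @ A, A.T @ y)       # normal equation; c = [a, b] = [intercept, slope]
print(c)
```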
For cases of lines of best fit that should pass through the origin¶
If you know that your least squares line must pass through the origin, that is, the $y$-intercept must be zero, then the model reduces to $y = ax$ and one obtains the linear system
$$ \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix}\begin{bmatrix} a \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} $$
$$ \left(\displaystyle\sum_{i=1}^m x_i^2\right)a = \begin{bmatrix} x_1 & x_2 & \ldots & x_m \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix}\begin{bmatrix} a \end{bmatrix} = \begin{bmatrix} x_1 & x_2 & \ldots & x_m \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} = \displaystyle\sum_{i=1}^m x_iy_i $$
Example 3¶
A common laboratory exercise for physics students is using Hooke's Law to determine the spring constant of a spring experimentally. The law states that the force exerted by a spring is proportional to and opposite in direction of its displacement from its equilibrium (resting, not stretched or compressed) length and is given by
$$ F = -kx $$
where $F$ is the force exerted by the spring, $x$ is the displacement, and $k$ is a constant of proportionality called the spring constant.
Suppose some freshmen physics students have compiled the following data
Displacement $x$ (m) | Force $F(x)$ (N) |
---|---|
0.020 m | 4.1 N |
0.030 m | 6.0 N |
0.040 m | 7.9 N |
0.050 m | 9.8 N |
To determine $k$, we can use positive values for simplicity and construct the normal equations
$$ \begin{bmatrix} 0.02 & 0.03 & 0.04 & 0.05 \end{bmatrix}\begin{bmatrix} 0.02 \\ 0.03 \\ 0.04 \\ 0.05 \end{bmatrix}\begin{bmatrix} k \end{bmatrix} = \begin{bmatrix} 0.02 & 0.03 & 0.04 & 0.05 \end{bmatrix}\begin{bmatrix} 4.1 \\ 6.0 \\ 7.9 \\ 9.8 \end{bmatrix} $$
This simplifies to
$$ 0.0054k = 1.068 $$
so $k\approx 197.\bar{7}\ \text{N}/\text{m}$ and those students should report $k = 2.0\times 10^2\ \text{N}/\text{m}$ for the appropriate number of significant figures.
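A quick numerical check of this computation, assuming NumPy is available; it reproduces $k\approx 197.8\ \text{N}/\text{m}$ from the table above:

```python
import numpy as np

x = np.array([0.020, 0.030, 0.040, 0.050])   # displacement (m)
F = np.array([4.1, 6.0, 7.9, 9.8])           # force (N)

k = (x @ F) / (x @ x)    # (sum x_i F_i) / (sum x_i^2) = 1.068 / 0.0054
print(k)                 # approximately 197.8 N/m
```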
Example 4¶
Lines of best fit may also be used to fit market data for economic purposes. Suppose data was collected showing the demand for gasoline at different costs per gallon. The data for gas prices and demand follow.
Price per gallon $p$ ($) | Demand $x(p)$ (in millions of gallons) |
---|---|
$2.75 | 500 |
$2.50 | 620 |
$2.25 | 710 |
$2.00 | 810 |
$1.75 | 930 |
$1.50 | 1050 |
Here it is not reasonable to force the line through the origin, since there is likely a minimum gas price based on the cost of producing and delivering the gasoline. First one looks at the data.
Next we decide how to model our data. In this case we decide that, as with Hooke's Law, there is a first order polynomial (a line) that best fits the data points. If we can find such a line
$$ x = a + bp, $$
then we would have
$$ \begin{align*} a + 2.75b &= 500 \\ a + 2.50b &= 620 \\ a + 2.25b &= 710 \\ a + 2.00b &= 810 \\ a + 1.75b &= 930 \\ a + 1.50b &= 1050 \end{align*} $$
So we will construct our inconsistent system of equations using the relationship $A\mathbf{c}=\mathbf{d}$, where
$$ \begin{align*} \mathbf{c} &= \begin{bmatrix} a \\ b \end{bmatrix}\text{ is the vector of coefficients for our line,} \\ \mathbf{p} &=\text{ the vector of gas prices, and } \\ \mathbf{d} &=\text{ the vector of resulting sales in millions of gallons (the demand).} \end{align*} $$
Notice that vector $\mathbf{c}$ is not a vector in $\mathbb{R}^2$; it is a vector in $P_2$, $\mathbf{c}\in P_2$. The polynomial in $P_2$ we seek is the line $x(p) = a + bp$ with slope $b$ and intercept $a$ that best fits the data points in our table.
$$ A\mathbf{c} = \begin{bmatrix} 1 & 2.75 \\ 1 & 2.50 \\ 1 & 2.25 \\ 1 & 2.00 \\ 1 & 1.75 \\ 1 & 1.50 \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 500 \\ 620 \\ 710 \\ 810 \\ 930 \\ 1050 \end{bmatrix} = \mathbf{d} $$
The solution is
$$ \hat{\mathbf{x}} = \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 1688 \\ -432 \end{bmatrix} $$
Follow Along
First one creates the __normal equation__ by multiplying both sides of the equation by $A^T$.
$$ \begin{align*} A^TA\mathbf{c} &= A^T\mathbf{d} \\ \\ A^TA &= \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 2.75 & 2.50 & 2.25 & 2.00 & 1.75 & 1.50 \end{bmatrix}\begin{bmatrix} 1 & 2.75 \\ 1 & 2.50 \\ 1 & 2.25 \\ 1 & 2.00 \\ 1 & 1.75 \\ 1 & 1.50 \end{bmatrix} = \begin{bmatrix} 6.0000 & 12.7500 \\ 12.7500 & 28.1875 \end{bmatrix} \\ \\ A^T\mathbf{d} &= \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 2.75 & 2.50 & 2.25 & 2.00 & 1.75 & 1.50 \end{bmatrix}\begin{bmatrix} 500 \\ 620 \\ 710 \\ 810 \\ 930 \\ 1050 \end{bmatrix} = \begin{bmatrix} 4620 \\ 9345 \end{bmatrix} \end{align*} $$
Notice that
- $A^TA$ is a symmetric matrix, $\left(A^TA\right)^T = A^T\left(A^T\right)^T = A^TA$.
- $N(A) = \left\{\mathbf{0}\right\}$ so $N\left(A^TA\right) = \left\{\mathbf{0}\right\}$.
$$ A^TA\mathbf{c} = \begin{bmatrix} 6.0000 & 12.7500 \\ 12.7500 & 28.1875 \end{bmatrix}\mathbf{c} = \begin{bmatrix} 4620 \\ 9345 \end{bmatrix} = A^T\mathbf{d} $$
Let us solve this linear system using Cramer's Rule.
$$ \begin{align*} a &= \dfrac{\begin{vmatrix} 4620 & 12.7500 \\ 9345 & 28.1875 \end{vmatrix}}{\begin{vmatrix} 6.0000 & 12.7500 \\ 12.7500 & 28.1875 \end{vmatrix}} = \dfrac{11077.5}{6.5625} = 1688 \\ \\ b &= \dfrac{\begin{vmatrix} 6.0000 & 4620 \\ 12.7500 & 9345 \end{vmatrix}}{\begin{vmatrix} 6.0000 & 12.7500 \\ 12.7500 & 28.1875 \end{vmatrix}} = \dfrac{ -2835}{6.5625} = -432 \end{align*} $$
The least squares line is given by $x(p) = 1688 - 432\, p$.
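The same numbers can be reproduced with a short computation, assuming NumPy is available:

```python
import numpy as np

p = np.array([2.75, 2.50, 2.25, 2.00, 1.75, 1.50])    # price per gallon
d = np.array([500., 620., 710., 810., 930., 1050.])   # demand in millions of gallons

A = np.column_stack([np.ones_like(p), p])   # design matrix [1 | p]
c = np.linalg.solve(A.T @ A, A.T @ d)       # normal equation
print(c)                                    # [1688., -432.]
```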
5.4.9 Polynomial Interpolation¶
Another application of the least squares problem is to find a polynomial that represents a data set. Suppose that we have a data set $\left\{(x_i,y_i)\right\}$ composed of $n+1$ values. We may, by solving a particular least squares problem, find a polynomial $\mathbf{p}$ of degree $n$ or less, $\mathbf{p}\in P_{n+1}$, that passes through each of these points. This polynomial would allow us to interpolate the data set, that is, estimate values between the specific data points in $\left\{(x_i,y_i)\right\}$. The resulting polynomial $\mathbf{p}\in P_{n+1}$ is called the interpolating polynomial.
It is important to be careful when doing these kinds of interpolations. The best representation of $n+1$ data points will not necessarily be a polynomial of degree $n$. In fact, this is rarely the case. We tend to use a low order polynomial. Look at Examples 3 and 4. These are both actually polynomial interpolation problems; we just chose to use a line (technically an affine function), which is a polynomial of degree $1$.
To construct the least squares problem for polynomial interpolation, we use the Vandermonde matrix
$$ V = \begin{bmatrix} 1 & x_1 & x_1^2 & \ldots & x_1^n \\ 1 & x_2 & x_2^2 & \ldots & x_2^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_m & x_m^2 & \ldots & x_m^n \end{bmatrix} $$
Each $x_j$ comes from the data set, and the matrix $V$ takes the place of $A$ in the normal equations for polynomial interpolation.
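As a sketch of how one might build $V$ in code, assuming NumPy is available; note that `np.vander` orders the powers from highest to lowest unless told otherwise, so `increasing=True` matches the layout above. The $x$ values shown are those of Example 5 below.

```python
import numpy as np

x = np.array([-1.95, 0.22, 0.88, 1.16, 1.25, 4.48, 4.91, 5.96, 7.16])  # data from Example 5 below
V = np.vander(x, N=3, increasing=True)   # columns 1, x, x^2, matching the layout above
print(V)
```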
Example 5¶
We will do two least squares problems on the same data set, finding a degree $2$ and a degree $5$ interpolating polynomial. Consider the data set
$$ \begin{array}{c|r|r|r|r|r|r|r|r|r} x & -1.95 & 0.22 & 0.88 & 1.16 & 1.25 & 4.48 & 4.91 & 5.96 & 7.16 \\\hline y & -1.96 & -1.76 & -1.62 & -1.25 & -0.51 & 2.08 & 3.24 & 3.53 & 3.71 \end{array} $$
For the first problem, we interpolate the data set using the polynomial $ c_0 + c_1 x + c_2 x^2 $, finding the least squares solution of the system
$$ \begin{bmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_m & x_m^2 \end{bmatrix}\begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} $$
For the second version, we use the polynomial $ c_0 + c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4 + c_5 x^5 $ whose system is very close in form
$$ \begin{bmatrix} 1 & x_1 & x_1^2 & \ldots & x_1^5 \\ 1 & x_2 & x_2^2 & \ldots & x_2^5 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_m & x_m^2 & \ldots & x_m^5 \end{bmatrix}\begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \\ c_4 \\ c_5 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} $$
We solve these and plot the solutions on the same axis for comparison purposes.
The polynomial of degree 2 that best fits the data
$$ \mathbf{c} = \begin{bmatrix} -1.5077 \\ 0.5909 \\ 0.0347 \end{bmatrix} $$
$$ y = 0.0347x^2 + 0.5909x - 1.5077 $$
Follow Along
$$ \begin{align*} V\mathbf{c} = \begin{bmatrix} 1.0000 & -1.9500 & 3.8025 \\ 1.0000 & 0.2200 & 0.0484 \\ 1.0000 & 0.8800 & 0.7744 \\ 1.0000 & 1.1600 & 1.3456 \\ 1.0000 & 1.2500 & 1.5625 \\ 1.0000 & 4.4800 & 20.0704 \\ 1.0000 & 4.9100 & 24.1081 \\ 1.0000 & 5.9600 & 35.5216 \\ 1.0000 & 7.1600 & 51.2656 \end{bmatrix}\mathbf{c} &= \begin{bmatrix} -1.9600 \\ -1.7600 \\ -1.6200 \\ -1.2500 \\ -0.5100 \\ 2.0800 \\ 3.2400 \\ 3.5300 \\ 3.7100 \end{bmatrix} = \mathbf{y} \\ \\ V^TV\mathbf{c} = \begin{bmatrix} 9.00 & 24.07 & 138.50 \\ 24.07 & 138.50 & 783.85 \\ 138.50 & 783.85 & 4893.3 \end{bmatrix}\mathbf{c} &= \begin{bmatrix} 5.4600 \\ 72.7509 \\ 424.1718 \end{bmatrix} = V^T\mathbf{y} \end{align*} $$
The polynomial of degree 5 that best fits the data
$$ \mathbf{c} = \begin{bmatrix} -1.9948 \\ 0.6358 \\ 0.2021 \\ -0.0509 \\ 0.0078 \\ -0.0006 \end{bmatrix} $$
$$ y = -0.0006x^5 + 0.0078x^4 - 0.0509x^3 + 0.2021x^2 + 0.6358x - 1.9948 $$
Follow Along
$$ \begin{align*} V\mathbf{c} = \begin{bmatrix} 1 & -1.95 & 3.8025 & -7.4149 & 14.459 & -28.195 \\ 1 & 0.22 & 0.0484 & 0.0106 & 0.0023 & 0.0005 \\ 1 & 0.88 & 0.7744 & 0.6815 & 0.5997 & 0.5277 \\ 1 & 1.16 & 1.3456 & 1.5609 & 1.8016 & 2.1003 \\ 1 & 1.25 & 1.5625 & 1.9531 & 2.4414 & 3.0518 \\ 1 & 4.48 & 20.070 & 89.915 & 402.82 & 1804.6 \\ 1 & 4.91 & 24.108 & 118.37 & 581.20 & 2853.7 \\ 1 & 5.96 & 35.522 & 211.70 & 1261.8 & 7520.2 \\ 1 & 7.16 & 51.266 & 367.06 & 2628.2 & 18817 \end{bmatrix}\mathbf{c} &= \begin{bmatrix} -1.9600 \\ -1.7600 \\ -1.6200 \\ -1.2500 \\ -0.5100 \\ 2.0800 \\ 3.2400 \\ 3.5300 \\ 3.7100 \end{bmatrix} = \mathbf{y} \\ \\ V^TV\mathbf{c} = \begin{bmatrix} 9 & 24.07 & 138.50 & 783.85 & 4893.3 & 30974 \\ 24.07 & 138.50 & 783.85 & 4893.3 & 30974 & 201713 \\ 138.50 & 783.85 & 4893.3 & 30974 & 201713 & 1336746 \\ 783.85 & 4893.3 & 30974 & 201713 & 1336746 & 8999611 \\ 4893.3 & 30974 & 201713 & 1336746 & 8999611 & 61329825 \\ 30974 & 201713 & 1336746 & 8999611 & 61329825 & 422058506 \end{bmatrix}\mathbf{c} &= \begin{bmatrix} 5.4600 \\ 72.751 \\ 424.17 \\ 2690.14 \\ 16892 \\ 109409 \end{bmatrix} = V^T\mathbf{y} \end{align*} $$
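Both fits can be reproduced with a short computation, assuming NumPy is available; the printed coefficients should match those above up to rounding:

```python
import numpy as np

x = np.array([-1.95, 0.22, 0.88, 1.16, 1.25, 4.48, 4.91, 5.96, 7.16])
y = np.array([-1.96, -1.76, -1.62, -1.25, -0.51, 2.08, 3.24, 3.53, 3.71])

for degree in (2, 5):
    V = np.vander(x, N=degree + 1, increasing=True)   # Vandermonde matrix with columns 1, x, ..., x^degree
    c, *_ = np.linalg.lstsq(V, y, rcond=None)         # least squares coefficients c_0, ..., c_degree
    print(degree, np.round(c, 4))
```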
5.4.10 Other Types of Least Squares Problems¶
Least squares problems come in many forms, and we can use other equations we are familiar with to construct them. For example, suppose we have a data set $\left\{(x_i,y_i)\,:\,1\le i\le m\right\}$ where the points roughly represent a circle. We can use the equation of a circle
$$ (x-c_1)^2 + (y-c_2)^2 = r^2 $$
with our data points $\left\{(x_i,y_i)\right\}$ plugged in to construct a least squares problem, after a little bit of algebra
$$ \begin{align*} (x-c_1)^2 + (y-c_2)^2 &= r^2 \\ \\ x^2 -2xc_1 + c_1^2 + y^2 - 2yc_2 + c_2^2 &= r^2 \\ \\ x^2 + y^2 &= 2xc_1 + 2yc_2 + \left(r^2 - c_1^2 - c_2^2\right) \end{align*} $$
By setting $ c_3 = r^2 - c_1^2 - c_2^2 $, we can form a linear system based on
$$ 2xc_1 + 2yc_2 + c_3 = x^2 + y^2 $$
which looks like
$$ A\mathbf{c} = \begin{bmatrix} 2x_1 & 2y_1 & 1 \\ 2x_2 & 2y_2 & 1 \\ \vdots & \vdots & \vdots \\ 2x_m & 2y_m & 1 \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} = \begin{bmatrix} x_1^2 + y_1^2 \\ x_2^2 + y_2^2 \\ \vdots \\ x_m^2 + y_m^2 \end{bmatrix} = \mathbf{b} $$
The least squares solution will give us $\mathbf{c} = \left[ c_1\ c_2\ c_3 \right]^T$; we interpret $(c_1,c_2)$ as the center of the circle, and
$$ r = \sqrt{c_1^2 + c_2^2 + c_3} $$
yields the radius.
If we use the data set
$$ \begin{array}{c|r|r|r|r|r|r|r|r|r|r} x & 8.03 & 6.86 & 5.99 & 5.83 & 4.73 & 4.02 & 4.81 & 5.41 & 5.71 & 7.77 \\\hline y & 3.06 & 4.48 & 4.98 & 5.04 & 4.54 & 2.74 & 1.37 & 1.13 & 1.02 & 2.01 \end{array} $$
and solve the least squares problem, we have the information required to determine the center and radius of the approximating circle and to plot the circle together with the data.
The least squares solution
$$ \mathbf{c} = \begin{bmatrix} 5.9855 \\ 2.9739 \\ -40.75257 \end{bmatrix} $$
The solution curve, or least squares circle has center $\approx (5.9855, 2.9739)$, and radius $$ r \approx \sqrt{ 5.9855^2 + 2.9739^2 - 40.75257 } \approx 1.9793 $$
Follow Along
$$ \begin{align*} A\mathbf{c} = \begin{bmatrix} 16.0600 & 6.1200 & 1 \\ 13.7200 & 8.9600 & 1 \\ 11.9800 & 9.9600 & 1 \\ 11.6600 & 10.0800 & 1 \\ 9.4600 & 9.0800 & 1 \\ 8.0400 & 5.4800 & 1 \\ 9.6200 & 2.7400 & 1 \\ 10.8200 & 2.2600 & 1 \\ 11.4200 & 2.0400 & 1 \\ 15.5400 & 4.0200 & 1 \\ \end{bmatrix}\mathbf{c} &= \begin{bmatrix} 73.845 \\ 67.130 \\ 60.681 \\ 59.391 \\ 42.985 \\ 23.668 \\ 25.013 \\ 30.545 \\ 33.645 \\ 64.413 \end{bmatrix} = \mathbf{b} \\ \\ A^TA\mathbf{c} = \begin{bmatrix} 1461.3 & 724.61 & 118.32 \\ 724.6 & 463.96 & 60.74 \\ 118.3 & 60.74 & 10 \end{bmatrix}\mathbf{c} &= \begin{bmatrix} 6079.66 \\ 3241.59 \\ 481.31 \end{bmatrix} = A^T\mathbf{b} \end{align*} $$
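A sketch of the whole circle fit in code, assuming NumPy is available; it should reproduce the center and radius above up to rounding:

```python
import numpy as np

x = np.array([8.03, 6.86, 5.99, 5.83, 4.73, 4.02, 4.81, 5.41, 5.71, 7.77])
y = np.array([3.06, 4.48, 4.98, 5.04, 4.54, 2.74, 1.37, 1.13, 1.02, 2.01])

A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])   # rows [2x_i, 2y_i, 1]
b = x**2 + y**2
c, *_ = np.linalg.lstsq(A, b, rcond=None)              # least squares solution [c1, c2, c3]

center = (c[0], c[1])
radius = np.sqrt(c[0]**2 + c[1]**2 + c[2])
print(center, radius)    # roughly (5.99, 2.97) and 1.98
```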
5.4.11 Exercises¶
Exercise 1¶
Consider the following matrix $$ A = \begin{bmatrix}\ 1\ &\ \ 0\ & -3\ \\ \ 1\ & -4\ &\ \ 5\ \\ \ 1\ &\ \ 0\ &\ \ 2\ \\ \ 1\ & -4\ &\ \ 0\ \end{bmatrix} $$
Find the least squares solution to
$$ A\mathbf{x} = \begin{bmatrix}\ \ 4\ \\ -12\ \\ -1\ \\ -7\ \end{bmatrix} $$
using the $QR$ factorization.
Check Your Work
$$ \begin{align*} Q &= \begin{bmatrix}\ \ \frac{1}{2}\ &\ \ \frac{1}{2}\ & -\frac{1}{2}\ \\ \ \ \frac{1}{2}\ & -\frac{1}{2}\ &\ \ \frac{1}{2}\ \\ \ \ \frac{1}{2}\ &\ \ \frac{1}{2}\ &\ \ \frac{1}{2}\ \\ \ \ \frac{1}{2}\ & -\frac{1}{2}\ & -\frac{1}{2}\ \end{bmatrix} \\ \\ R &= \begin{bmatrix}\ \ 2\ & -4\ &\ \ 2\ \\ \ \ 0\ &\ \ 4\ & -3\ \\ \ \ 0\ &\ \ 0\ &\ \ 5\ \end{bmatrix} \\ \\ \hat{\mathbf{x}} &= \begin{bmatrix}\ \ 1\ \\ \ \ 2\ \\ -1\ \end{bmatrix} \end{align*} $$
Follow Along
$$ \begin{align*} r_{11} &= \sqrt{ 1^2 + 1^2 + 1^2 + 1^2 } = 2 \\ \\ \mathbf{q}_1 &= \frac{\mathbf{a}_1}{\left\|\mathbf{a}_1\right\|} = \frac{\mathbf{a}_1}{r_{11}} = \frac{1}{2}\begin{bmatrix}\ 1\ \\ \ 1\ \\ \ 1\ \\ \ 1\ \end{bmatrix} \\ \mathbf{r}_1 &= \begin{bmatrix}\ 2\ \\ \ 0\ \\ \ 0\ \end{bmatrix} \\ \\ \mathbf{p}_2 &= \text{Proj}_{\mathbf{q}_1}\mathbf{a}_2 = \frac{\langle\mathbf{q}_1,\mathbf{a}_2\rangle}{\langle\mathbf{q_1},\mathbf{q}_1\rangle}\mathbf{q}_1 = \frac{r_{12}}{1}\mathbf{q}_1 = \left(0\cdot\frac{1}{2} - 4\cdot\frac{1}{2} + 0\cdot\frac{1}{2} - 4\cdot\frac{1}{2} \right)\begin{bmatrix}\ \frac{1}{2}\ \\ \ \frac{1}{2}\ \\ \ \frac{1}{2}\ \\ \ \frac{1}{2}\ \end{bmatrix} = \begin{bmatrix} -2\ \\ -2\ \\ -2\ \\ -2\ \end{bmatrix} \\ \\ \mathbf{a}_2 - \mathbf{p}_2 &= \begin{bmatrix}\ \ 0\ \\ -4\ \\ \ \ 0\ \\ -4\ \end{bmatrix} - \begin{bmatrix} -2\ \\ -2\ \\ -2\ \\ -2\ \end{bmatrix} = \begin{bmatrix}\ \ 2\ \\ -2\ \\ \ \ 2\ \\ -2\ \end{bmatrix} \\ \\ \mathbf{q}_2 &= \frac{\mathbf{a}_2 - \mathbf{p}_2}{\left\|\mathbf{a}_2 - \mathbf{p}_2\right\|} = \frac{\mathbf{a}_2 - \mathbf{p}_2}{r_{22}} = \frac{1}{4}\begin{bmatrix}\ \ 2\ \\ -2\ \\ \ \ 2\ \\ -2\ \end{bmatrix} = \begin{bmatrix}\ \ \frac{1}{2}\ \\ -\frac{1}{2}\ \\ \ \ \frac{1}{2}\ \\ -\frac{1}{2}\ \end{bmatrix} \\ \mathbf{r}_2 &= \begin{bmatrix} -4\ \\ \ \ 4\ \\ \ \ 0\ \end{bmatrix} \\ \\ \mathbf{p}_3 &= \text{Proj}_{\mathbf{q}_1}\mathbf{a}_3 + \text{Proj}_{\mathbf{q}_2}\mathbf{a}_3 = \frac{\langle\mathbf{q}_1,\mathbf{a}_3\rangle}{\langle\mathbf{q_1},\mathbf{q}_1\rangle}\mathbf{q}_1 + \frac{\langle\mathbf{q}_2,\mathbf{a}_3\rangle}{\langle\mathbf{q_2},\mathbf{q}_2\rangle}\mathbf{q}_2 = r_{13}\mathbf{q_1} + r_{23}\mathbf{q}_2 \\ &= \left( -3\cdot\frac{1}{2} + 5\cdot\frac{1}{2} + 2\cdot\frac{1}{2} + 0\cdot\frac{1}{2} \right)\begin{bmatrix}\ \frac{1}{2}\ \\ \ \frac{1}{2}\ \\ \ \frac{1}{2}\ \\ \ \frac{1}{2}\ \end{bmatrix} + \left( -3\cdot\frac{1}{2} - 5\cdot\frac{1}{2} + 2\cdot\frac{1}{2} - 0\cdot\frac{1}{2} \right)\begin{bmatrix}\ \ \frac{1}{2}\ \\ -\frac{1}{2}\ \\ \ \ \frac{1}{2}\ \\ -\frac{1}{2}\ \end{bmatrix} \\ &= \begin{bmatrix}\ 1\ \\ \ 1\ \\ \ 1\ \\ \ 1\ \end{bmatrix} + \begin{bmatrix} -\frac{3}{2}\ \\ \ \ \frac{3}{2}\ \\ -\frac{3}{2}\ \\ \ \ \frac{3}{2}\ \end{bmatrix} = \begin{bmatrix} -\frac{1}{2}\ \\ \ \ \frac{5}{2}\ \\ -\frac{1}{2}\ \\ \ \ \frac{5}{2}\ \end{bmatrix} \\ \\ \mathbf{a}_3-\mathbf{p}_3 &= \begin{bmatrix} -3\ \\ \ \ 5\ \\ \ \ 2\ \\ \ \ 0\ \end{bmatrix} - \begin{bmatrix} -\frac{1}{2}\ \\ \ \ \frac{5}{2}\ \\ -\frac{1}{2}\ \\ \ \ \frac{5}{2}\ \end{bmatrix} = \begin{bmatrix} -\frac{5}{2}\ \\ \ \ \frac{5}{2}\ \\ \ \ \frac{5}{2}\ \\ -\frac{5}{2}\ \end{bmatrix} \\ \\ \mathbf{q}_3 &= \frac{\mathbf{a}_3-\mathbf{p}_3}{\left\|\mathbf{a}_3-\mathbf{p}_3\right\|} = \frac{\mathbf{a}_3-\mathbf{p}_3}{r_{33}} = \frac{1}{5}\begin{bmatrix} -\frac{5}{2}\ \\ \ \ \frac{5}{2}\ \\ \ \ \frac{5}{2}\ \\ -\frac{5}{2}\ \end{bmatrix} = \begin{bmatrix} -\frac{1}{2}\ \\ \ \ \frac{1}{2}\ \\ \ \ \frac{1}{2}\ \\ -\frac{1}{2}\ \end{bmatrix} \\ \mathbf{r}_3 &= \begin{bmatrix}\ \ 2\ \\ -3\ \\ \ \ 5\ \end{bmatrix} \\ \\ Q &= \begin{bmatrix}\ \ \frac{1}{2}\ &\ \ \frac{1}{2}\ & -\frac{1}{2}\ \\ \ \ \frac{1}{2}\ & -\frac{1}{2}\ &\ \ \frac{1}{2}\ \\ \ \ \frac{1}{2}\ &\ \ \frac{1}{2}\ &\ \ \frac{1}{2}\ \\ \ \ \frac{1}{2}\ & -\frac{1}{2}\ & -\frac{1}{2}\ \end{bmatrix} \\ \\ R &= \begin{bmatrix}\ \ 2\ & -4\ &\ \ 2\ \\ \ \ 0\ &\ \ 4\ & -3\ \\ \ \ 0\ &\ \ 0\ &\ \ 5\ \end{bmatrix} \\ \end{align*} $$
Now that we have the $QR$ factorization of matrix $A$ and we know that $R$ is nonsingular we have that $$ \begin{align*} \hat{\mathbf{x}} &= R^{-1}Q^T\mathbf{b} \\ \\ \begin{bmatrix}\ \ 2\ & -4\ &\ \ 2 & | & 1\ &\ \ 0\ &\ \ 0\ \\ \ \ 0\ &\ \ 4\ & -3 & | & 0\ &\ \ 1\ &\ \ 0\ \\ \ \ 0\ &\ \ 0\ &\ \ 5 & | & 0\ &\ \ 0\ &\ \ 1\ \end{bmatrix}\begin{array}{l} R_1+R_2 \\ \\ \frac{1}{5}R_3 \end{array} &\rightarrow \begin{bmatrix}\ \ 2\ &\ \ 0\ & -1 & | & 1\ &\ \ 1\ &\ \ 0\ \\ \ \ 0\ &\ \ 4\ & -3 & | & 0\ &\ \ 1\ &\ \ 0\ \\ \ \ 0\ &\ \ 0\ &\ \ 1 & | & 0\ &\ \ 0\ &\ \ \frac{1}{5}\ \end{bmatrix}\begin{array}{l} R_1+R_3 \\ R_2+3R_3 \\ \\ \end{array} \\ &\rightarrow \begin{bmatrix}\ \ 2\ &\ \ 0\ &\ \ 0 & | & 1\ &\ \ 1\ &\ \ \frac{1}{5}\ \\ \ \ 0\ &\ \ 4\ &\ \ 0 & | & 0\ &\ \ 1\ &\ \ \frac{3}{5}\ \\ \ \ 0\ &\ \ 0\ &\ \ 1 & | & 0\ &\ \ 0\ &\ \ \frac{1}{5}\ \end{bmatrix}\begin{array}{l} \frac{1}{2}R_1 \\ \frac{1}{4}R_2 \\ \\ \end{array} \\ &\rightarrow \begin{bmatrix}\ \ 1\ &\ \ 0\ &\ \ 0 & | & \frac{1}{2}\ &\ \ \frac{1}{2}\ &\ \ \frac{1}{10}\ \\ \ \ 0\ &\ \ 1\ &\ \ 0 & | & 0\ &\ \ \frac{1}{4}\ &\ \ \frac{3}{20}\ \\ \ \ 0\ &\ \ 0\ &\ \ 1 & | & 0\ &\ \ 0\ &\ \ \frac{1}{5}\ \end{bmatrix} \\ \\ \hat{\mathbf{x}} &= \begin{bmatrix}\ \ \frac{1}{2}\ &\ \ \frac{1}{2}\ &\ \ \frac{1}{10}\ \\ \ \ 0\ &\ \ \frac{1}{4}\ &\ \ \frac{3}{20}\ \\ \ \ 0\ &\ \ 0\ &\ \ \frac{1}{5}\ \end{bmatrix}\begin{bmatrix}\ \ \frac{1}{2}\ &\ \ \frac{1}{2}\ &\ \ \frac{1}{2}\ &\ \ \frac{1}{2}\ \\ \ \ \frac{1}{2}\ & -\frac{1}{2}\ &\ \ \frac{1}{2}\ & -\frac{1}{2}\ \\ -\frac{1}{2}\ &\ \ \frac{1}{2}\ &\ \ \frac{1}{2}\ & -\frac{1}{2}\ \end{bmatrix}\begin{bmatrix}\ \ 4\ \\ -12\ \\ -1\ \\ -7\ \end{bmatrix} \\ \\ &= \begin{bmatrix}\ \ \frac{1}{2}\ &\ \ \frac{1}{2}\ &\ \ \frac{1}{10}\ \\ \ \ 0\ &\ \ \frac{1}{4}\ &\ \ \frac{3}{20}\ \\ \ \ 0\ &\ \ 0\ &\ \ \frac{1}{5}\ \end{bmatrix}\begin{bmatrix} -8\ \\ \ \ 11\ \\ -5\ \end{bmatrix} = \begin{bmatrix}\ \ 1\ \\ \ \ 2\ \\ -1\ \end{bmatrix} \end{align*} $$
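As a numerical check of this exercise, assuming NumPy is available; note that `np.linalg.qr` may choose different signs for the columns of $Q$ than the hand computation, but the least squares solution is the same:

```python
import numpy as np

A = np.array([[1.0,  0.0, -3.0],
              [1.0, -4.0,  5.0],
              [1.0,  0.0,  2.0],
              [1.0, -4.0,  0.0]])
b = np.array([4.0, -12.0, -1.0, -7.0])

Q, R = np.linalg.qr(A)               # reduced QR factorization
x_hat = np.linalg.solve(R, Q.T @ b)  # solve R x = Q^T b
print(x_hat)                         # [1., 2., -1.]
```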
Exercise 2¶
Consider the following matrix $$ A = \begin{bmatrix}\ \ 3\ &\ \ 0\ & -2\ \\ \ \ 3\ & \ \ 8\ &\ \ 5\ \\ \ \ 3\ &\ \ 4\ &\ \ 0\ \\ \ \ 0\ & -4\ & -2\ \end{bmatrix} $$
Find the least squares solution to
$$ A\mathbf{x} = \begin{bmatrix} -1\ \\ \ \ 5\ \\ -1\ \\ \ \ 2\ \end{bmatrix} $$
using the $QR$ factorization.
Check Your Work
$$ \begin{align*} Q &= \begin{bmatrix}\ \ \frac{1}{\sqrt{3}}\ & -\frac{1}{\sqrt{3}}\ &\ \ 0\ \\ \ \ \frac{1}{\sqrt{3}}\ &\ \ \frac{1}{\sqrt{3}}\ &\ \ \frac{1}{\sqrt{3}}\ \\ \ \ \frac{1}{\sqrt{3}}\ &\ \ 0\ & -\frac{1}{\sqrt{3}}\ \\ \ \ 0\ & -\frac{1}{\sqrt{3}}\ &\ \ \frac{1}{\sqrt{3}}\ \end{bmatrix} \\ \\ R &= \begin{bmatrix}\ \ 3\sqrt{3}\ &\ \ 4\sqrt{3}\ &\ \ \sqrt{3}\ \\ \ \ 0\ &\ \ 4\sqrt{3}\ & 3\sqrt{3}\ \\ \ \ 0\ &\ \ 0\ &\ \ \sqrt{3}\ \end{bmatrix} \\ \\ \hat{\mathbf{x}} &= \begin{bmatrix}\ \ \frac{5}{3}\ \\ -\frac{5}{3}\ \\ \ \ \frac{8}{3}\ \end{bmatrix} \end{align*} $$
Follow Along
$$ \begin{align*} r_{11} &= \sqrt{ (-3)^2 + (-3)^2 + (-3)^2 + 0^2 } = 3\sqrt{3} \\ \\ \mathbf{q}_1 &= \frac{\mathbf{a}_1}{\left\|\mathbf{a}_1\right\|} = \frac{\mathbf{a}_1}{r_{11}} = \frac{1}{\sqrt{3}}\begin{bmatrix}\ 1\ \\ \ 1\ \\ \ 1\ \\ \ 0\ \end{bmatrix} \\ \mathbf{r}_1 &= \begin{bmatrix}\ 3\sqrt{3}\ \\ \ 0\ \\ \ 0\ \end{bmatrix} \\ \\ \mathbf{p}_2 &= \text{Proj}_{\mathbf{q}_1}\mathbf{a}_2 = \frac{\langle\mathbf{q}_1,\mathbf{a}_2\rangle}{\langle\mathbf{q_1},\mathbf{q}_1\rangle}\mathbf{q}_1 = \frac{r_{12}}{1}\mathbf{q}_1 = \left(0\cdot\frac{1}{\sqrt{3}} + 8\cdot\frac{1}{\sqrt{3}} + 4\cdot\frac{1}{\sqrt{3}} - 4\cdot 0 \right)\begin{bmatrix}\ \frac{1}{\sqrt{3}}\ \\ \ \frac{1}{\sqrt{3}}\ \\ \ \frac{1}{\sqrt{3}}\ \\ \ 0\ \end{bmatrix} = \begin{bmatrix}\ 4\ \\ \ 4\ \\ \ 4\ \\ \ 0\ \end{bmatrix} \\ \\ \mathbf{a}_2 - \mathbf{p}_2 &= \begin{bmatrix}\ \ 0\ \\ \ \ 8\ \\ \ \ 4\ \\ -4\ \end{bmatrix} - \begin{bmatrix}\ 4\ \\ \ 4\ \\ \ 4\ \\ \ 0\ \end{bmatrix} = \begin{bmatrix} -4\ \\ \ \ 4\ \\ \ \ 0\ \\ -4\ \end{bmatrix} \\ \\ \mathbf{q}_2 &= \frac{\mathbf{a}_2 - \mathbf{p}_2}{\left\|\mathbf{a}_2 - \mathbf{p}_2\right\|} = \frac{\mathbf{a}_2 - \mathbf{p}_2}{r_{22}} = \frac{1}{4\sqrt{3}}\begin{bmatrix} -4\ \\ \ \ 4\ \\ \ \ 0\ \\ -4\ \end{bmatrix} = \begin{bmatrix} -\frac{1}{\sqrt{3}}\ \\ \ \ \frac{1}{\sqrt{3}}\ \\ \ \ 0\ \\ -\frac{1}{\sqrt{3}}\ \end{bmatrix} \\ \mathbf{r}_2 &= \begin{bmatrix}\ \ 4\sqrt{3}\ \\ \ \ 4\sqrt{3}\ \\ \ \ 0\ \end{bmatrix} \\ \\ \mathbf{p}_3 &= \text{Proj}_{\mathbf{q}_1}\mathbf{a}_3 + \text{Proj}_{\mathbf{q}_2}\mathbf{a}_3 = \frac{\langle\mathbf{q}_1,\mathbf{a}_3\rangle}{\langle\mathbf{q_1},\mathbf{q}_1\rangle}\mathbf{q}_1 + \frac{\langle\mathbf{q}_2,\mathbf{a}_3\rangle}{\langle\mathbf{q_2},\mathbf{q}_2\rangle}\mathbf{q}_2 = r_{13}\mathbf{q_1} + r_{23}\mathbf{q}_2 \\ &= \left( -2\cdot\frac{1}{\sqrt{3}} + 5\cdot\frac{1}{\sqrt{3}} + 0\cdot\frac{1}{\sqrt{3}} - 2\cdot 0 \right)\begin{bmatrix}\ \frac{1}{\sqrt{3}}\ \\ \ \frac{1}{\sqrt{3}}\ \\ \ \frac{1}{\sqrt{3}}\ \\ \ 0 \end{bmatrix} + \left( -2\left(-\frac{1}{\sqrt{3}}\right) + 5\cdot\frac{1}{\sqrt{3}} + 0\cdot\frac{1}{\sqrt{3}} - 2\left(-\frac{1}{\sqrt{3}}\right) \right)\begin{bmatrix} -\frac{1}{\sqrt{3}}\ \\ \ \ \frac{1}{\sqrt{3}}\ \\ \ \ 0\ \\ -\frac{1}{\sqrt{3}}\ \end{bmatrix} \\ &= \begin{bmatrix}\ 1\ \\ \ 1\ \\ \ 1\ \\ \ 0\ \end{bmatrix} + \begin{bmatrix} -3\ \\ \ \ 3\ \\ \ \ 0\ \\ -3\ \end{bmatrix} = \begin{bmatrix} -2\ \\ \ \ 4\ \\ \ \ 1\ \\ -3\ \end{bmatrix} \\ \\ \mathbf{a}_3-\mathbf{p}_3 &= \begin{bmatrix} -2\ \\ \ \ 5\ \\ \ \ 0\ \\ -2\ \end{bmatrix} - \begin{bmatrix} -2 \\ \ \ 4\ \\ \ \ 1\ \\ -3\ \end{bmatrix} = \begin{bmatrix}\ \ 0\ \\ \ \ 1\ \\ -1\ \\ \ \ 1\ \end{bmatrix} \\ \\ \mathbf{q}_3 &= \frac{\mathbf{a}_3-\mathbf{p}_3}{\left\|\mathbf{a}_3-\mathbf{p}_3\right\|} = \frac{\mathbf{a}_3-\mathbf{p}_3}{r_{33}} = \frac{1}{\sqrt{3}}\begin{bmatrix}\ \ 0\ \\ \ \ 1\ \\ -1\ \\ \ \ 1\ \end{bmatrix} = \begin{bmatrix}\ \ 0\ \\ \ \ \frac{1}{\sqrt{3}}\ \\ -\frac{1}{\sqrt{3}}\ \\ \ \ \frac{1}{\sqrt{3}}\ \end{bmatrix} \\ \mathbf{r}_3 &= \begin{bmatrix}\ \ \sqrt{3}\ \\ \ \ 3\sqrt{3}\ \\ \ \ \sqrt{3}\ \end{bmatrix} \\ \\ Q &= \begin{bmatrix}\ \ \frac{1}{\sqrt{3}}\ & -\frac{1}{\sqrt{3}}\ &\ \ 0\ \\ \ \ \frac{1}{\sqrt{3}}\ &\ \ \frac{1}{\sqrt{3}}\ &\ \ \frac{1}{\sqrt{3}}\ \\ \ \ \frac{1}{\sqrt{3}}\ &\ \ 0\ & -\frac{1}{\sqrt{3}}\ \\ \ \ 0\ & -\frac{1}{\sqrt{3}}\ &\ \ \frac{1}{\sqrt{3}}\ \end{bmatrix} \\ \\ R &= \begin{bmatrix}\ \ 3\sqrt{3}\ &\ \ 4\sqrt{3}\ &\ \ \sqrt{3}\ \\ \ \ 0\ &\ \ 4\sqrt{3}\ & 3\sqrt{3}\ \\ \ \ 0\ &\ \ 0\ &\ \ 
\sqrt{3}\ \end{bmatrix} \\ \\ QR &= \begin{bmatrix}\ \ \frac{1}{\sqrt{3}}\ & -\frac{1}{\sqrt{3}}\ &\ \ 0\ \\ \ \ \frac{1}{\sqrt{3}}\ &\ \ \frac{1}{\sqrt{3}}\ &\ \ \frac{1}{\sqrt{3}}\ \\ \ \ \frac{1}{\sqrt{3}}\ &\ \ 0\ & -\frac{1}{\sqrt{3}}\ \\ \ \ 0\ & -\frac{1}{\sqrt{3}}\ &\ \ \frac{1}{\sqrt{3}}\ \end{bmatrix}\begin{bmatrix}\ \ 3\sqrt{3}\ &\ \ 4\sqrt{3}\ &\ \ \sqrt{3}\ \\ \ \ 0\ &\ \ 4\sqrt{3}\ & 3\sqrt{3}\ \\ \ \ 0\ &\ \ 0\ &\ \ \sqrt{3}\ \end{bmatrix} \\ \\ &= \frac{1}{\sqrt{3}}\begin{bmatrix}\ \ 1\ & -1\ &\ \ 0\ \\ \ \ 1\ &\ \ 1\ &\ \ 1\ \\ \ \ 1\ &\ \ 0\ & -1\ \\ \ \ 0\ & -1\ &\ \ 1\ \end{bmatrix}\begin{bmatrix}\ \ 3\ &\ \ 4\ &\ \ 1\ \\ \ \ 0\ &\ \ 4\ &\ \ 3\ \\ \ \ 0\ &\ \ 0\ &\ \ 1\ \end{bmatrix}\sqrt{3} = \begin{bmatrix}\ \ 3\ &\ \ 0\ & -2\ \\ \ \ 3\ &\ \ 8\ &\ \ 5\ \\ \ \ 3\ &\ \ 4\ &\ \ 0\ \\ \ \ 0\ & -4\ & -2\ \end{bmatrix} \end{align*} $$ Now that we have the $QR$ factorization of matrix $A$ and we know that $R$ is nonsingular we have that $$ \begin{align*} \hat{\mathbf{x}} &= R^{-1}Q^T\mathbf{b} \\ \\ \begin{bmatrix}\ \ 3\ &\ \ 4\ &\ \ 1 & | & 1\ &\ \ 0\ &\ \ 0\ \\ \ \ 0\ &\ \ 4\ &\ \ 3 & | & 0\ &\ \ 1\ &\ \ 0\ \\ \ \ 0\ &\ \ 0\ &\ \ 1 & | & 0\ &\ \ 0\ &\ \ 1\ \end{bmatrix}\begin{array}{l} R_1-R_3 \\ R_2-3R_3 \\ \\ \end{array} &\rightarrow \begin{bmatrix}\ \ 3\ &\ \ 4\ &\ \ 0 & | & 1\ &\ \ 0\ & -1\ \\ \ \ 0\ &\ \ 4\ &\ \ 0 & | & 0\ &\ \ 1\ & -3\ \\ \ \ 0\ &\ \ 0\ &\ \ 1 & | & 0\ &\ \ 0\ &\ \ 1\ \end{bmatrix}\begin{array}{l} R_1-R_2 \\ \\ \\ \end{array} \\ \\ \begin{bmatrix}\ \ 3\ &\ \ 0\ &\ \ 0 & | & 1\ & -1\ &\ \ 2\ \\ \ \ 0\ &\ \ 4\ &\ \ 0 & | & 0\ &\ \ 1\ & -3\ \\ \ \ 0\ &\ \ 0\ &\ \ 1 & | & 0\ &\ \ 0\ &\ \ 1\ \end{bmatrix}\begin{array}{l} \frac{1}{3}R_1 \\ \frac{1}{4}R_2 \\ \\ \end{array} &\rightarrow \begin{bmatrix}\ \ 1\ &\ \ 0\ &\ \ 0 & | & \frac{1}{3}\ & -\frac{1}{3}\ &\ \ \frac{2}{3}\ \\ \ \ 0\ &\ \ 1\ &\ \ 0 & | & 0\ &\ \ \frac{1}{4}\ & -\frac{3}{4}\ \\ \ \ 0\ &\ \ 0\ &\ \ 1 & | & 0\ &\ \ 0\ &\ \ 1\ \end{bmatrix} \\ \\ \hat{\mathbf{x}} &= \frac{1}{\sqrt{3}}\begin{bmatrix}\ \ \frac{1}{3}\ & -\frac{1}{3}\ &\ \ \frac{2}{3} \\ \ \ 0\ &\ \ \frac{1}{4}\ & -\frac{3}{4}\ \\ \ \ 0\ &\ \ 0\ &\ \ 1\ \end{bmatrix}\,\frac{1}{\sqrt{3}}\begin{bmatrix}\ \ 1\ &\ \ 1\ &\ \ 1\ &\ \ 0\ \\ -1\ &\ \ 1\ &\ \ 0\ & -1\ \\ \ \ 0\ &\ \ 1\ & -1\ &\ \ 1\ \end{bmatrix}\,\begin{bmatrix} -1\ \\ \ \ 5\ \\ -1\ \\ \ \ 2\ \end{bmatrix} \\ \\ &= \frac{1}{3}\begin{bmatrix}\ \ \frac{1}{3}\ & -\frac{1}{3}\ &\ \ \frac{2}{3} \\ \ \ 0\ &\ \ \frac{1}{4}\ & -\frac{3}{4}\ \\ \ \ 0\ &\ \ 0\ &\ \ 1\ \end{bmatrix}\,\begin{bmatrix}\ \ 3\ \\ \ \ 4\ \\ \ \ 8\ \end{bmatrix} = \frac{1}{3}\begin{bmatrix}\ \ 5\ \\ -5\ \\ \ \ 8\ \end{bmatrix} = \begin{bmatrix}\ \ \frac{5}{3}\ \\ -\frac{5}{3}\ \\ \ \ \frac{8}{3}\ \end{bmatrix} \end{align*} $$
Your use of this self-initiated mediated course material is subject to our Creative Commons License 4.0