Math 511: Linear Algebra
2.5 Markov Chains
2.5.1 Markov Processes¶
$$ \require{color} \definecolor{brightblue}{rgb}{.267, .298, .812} \definecolor{darkblue}{rgb}{0.0, 0.0, 1.0} \definecolor{palepink}{rgb}{1, .73, .8} \definecolor{softmagenta}{rgb}{.99,.34,.86} \definecolor{blueviolet}{rgb}{.537,.192,.937} \definecolor{jonquil}{rgb}{.949,.792,.098} \definecolor{shockingpink}{rgb}{1, 0, .741} \definecolor{royalblue}{rgb}{0, .341, .914} \definecolor{alien}{rgb}{.529,.914,.067} \definecolor{crimson}{rgb}{1, .094, .271} \def\ihat{\mathbf{\hat{\unicode{x0131}}}} \def\jhat{\mathbf{\hat{\unicode{x0237}}}} \def\khat{\mathrm{\hat{k}}} \def\tombstone{\unicode{x220E}} \def\contradiction{\unicode{x2A33}} $$
Markov processes and stochastic processes are where linear algebra and statistics overlap. Markov processes can be found in many modern applications of matrices in the fields of thermodynamics, mechanics, physics, chemistry, economics, finance, signal processing, information theory, speech processing, and data science.
We will look at the basics of stochastic matrices in this course; however, you will find many applications of these matrices, including page-ranking algorithms.
2.5.2 States of a Population¶
Many applications involving populations or finance are represented by a finite number of conditions, associations, or labels. In mathematics, if we have a population of consumers, a population of voters, or a population of fish, then we can associate each member of the population with an attribute or state.
Technically the state $s$ of each element of a population $P$ is the result of a function $f:P\rightarrow S$, where each member of the population $x\in P$ is mapped to a state $s\in S$ in the state space. Generally we are not interested in the function that associates each member of the population with a state. Instead we are interested in the percentage of the total population that shares the same state.
For example,
- a population of voters may be described as Democratic, Republican, Independent, or affiliated with another political party.
- a population of consumers may purchase an Android$^{\text{TM}}$, Apple$^{\text{TM}}$, or other brand smart phone.
- a population of tuna may be described as Albacore, Allothunnus, Euthynnus, Frigate, or Skipjack.
In each case we want the relative size of each classification of the population. The relative size of the population is the fraction of the total population represented by the classification or state. This can also represent the probability that a member of the population fits into a given category. Let's consider an example.
Example 1¶
Freshman students' dating prospects can be modeled by a Markov process. Suppose that students describe their current dating status as single, in a relationship, or an ambiguous state variously termed "it's complicated", "hookup buddies", "friends with benefits", etc. Each month those percentages change according to a fixed set of probabilities: a fifth of the freshmen who are single form a relationship, a quarter of those in the "it's complicated" category become single, etc. We can organize those probabilities into a transition matrix, in which each entry gives the probability of a student transitioning from the relationship status described by their column to the relationship status described by their row:
$$ \begin{array}{lccc} & \text{Single} & \text{Relationship} & \text{Complicated} \\ \text{Single} & .30 & .30 & .25 \\ \text{Relationship} & .20 & .60 & .35 \\ \text{Complicated} & .50 & .10 & .40 \end{array} $$
Thus, after one month there is a 30% chance a single student will remain single, a 20% chance they will enter a relationship, and a 50% chance they will enter the "it's complicated" state. Notice that the sum of each column is 1, denoting that the entire population must end up in one of the finitely many states after each month or cycle.
Each probability that a freshman in one relationship category will end up in another category after a fixed amount of time (one month) is called a transition probability. Arranging the transition probabilities out of a given state into a column vector yields the transition vector for that state. Notice that the entries of each column vector must add up to 1, since we design the categories to cover the whole population.
$$ P = \begin{bmatrix} .30 & .30 & .25 \\ .20 & .60 & .35 \\ .50 & .10 & .40 \end{bmatrix} $$
Together these transition vectors form a transition matrix, Markov matrix, or matrix of transition probabilities, $P$. More generally any matrix whose entries are between 0 and 1, and whose columns add up to 1, is called a stochastic matrix. Thus our matrix $P$ is stochastic.
Suppose at the start of orientation 80% of freshmen are single, 10% are in a relationship, and 10% are "it's complicated". Our state is represented by a list of three numbers: the percentage of students in each category at the end of orientation. Thus our initial state is given by the state vector or state matrix
$$ \mathbf{x}_0 = \begin{bmatrix} .80 \\ .10 \\ .10 \end{bmatrix} $$
What percentage of students will be in a relationship in a month?
$$ \mathbf{x}_1 = P\mathbf{x}_0 = \begin{bmatrix} .30 & .30 & .25 \\ .20 & .60 & .35 \\ .50 & .10 & .40 \end{bmatrix}\begin{bmatrix} .80 \\ .10 \\ .10 \end{bmatrix} = \begin{bmatrix} .295 \\ .255 \\ .45 \end{bmatrix} \\ $$
Approximately $25.5\%$ of the freshman population will be in a relationship one month after orientation.
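The update $\mathbf{x}_1 = P\mathbf{x}_0$ is an ordinary matrix-vector product, so it is easy to double-check with a few lines of code. A minimal sketch in plain Python (no libraries assumed), which also verifies that $P$ is stochastic:

```python
# Transition matrix P as a list of rows; column j holds the probabilities
# of leaving state j (single, relationship, complicated).
P = [[0.30, 0.30, 0.25],
     [0.20, 0.60, 0.35],
     [0.50, 0.10, 0.40]]

# Sanity check: every column of a stochastic matrix sums to 1.
for j in range(3):
    assert abs(sum(P[i][j] for i in range(3)) - 1.0) < 1e-12

def step(P, x):
    """One Markov step: return the matrix-vector product P x."""
    return [sum(P[i][j] * x[j] for j in range(len(x))) for i in range(len(P))]

x0 = [0.80, 0.10, 0.10]           # initial state after orientation
x1 = step(P, x0)
print([round(v, 3) for v in x1])  # ≈ [0.295, 0.255, 0.45]
```

The entries of `x1` match the hand computation above: about $25.5\%$ of the population lands in the relationship state after one month.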
2.5.3 Markov Chains¶
A Markov Chain is a sequence of states $\left\{ X_0, X_1, X_2, \dots \right\} = \left\{ X_n \right\}$ that are related by the equation $X_{k+1} = PX_k$ for some stochastic matrix $P$. The $n^{\text{th}}$ state of the Markov Chain is given by
$$ X_n = P^nX_0 $$
Exercise 1¶
What percentage of freshmen will be in a relationship after 3 months? (Use GeoGebra or WolframAlpha.)
View Solution
$$ \begin{align*} \mathbf{x}_3 &= P\left(P\left(P\mathbf{x}_0\right)\right) = P^3\mathbf{x}_0 \\ \\ &= \begin{bmatrix} .30 & .30 & .25 \\ .20 & .60 & .35 \\ .50 & .10 & .40 \end{bmatrix}^3\begin{bmatrix} .80 \\ .10 \\ .10 \end{bmatrix} \\ &= \begin{bmatrix} 0.2815 & 0.2875 & 0.284 \\ 0.3975 & 0.4195 & 0.408 \\ 0.3210 & 0.2930 & 0.308 \end{bmatrix}\begin{bmatrix} .80 \\ .10 \\ .10 \end{bmatrix} \\ \\ &\approx \begin{bmatrix} 0.28 \\ 0.40 \\ 0.32 \end{bmatrix} \end{align*} $$
Approximately $40\%$ of freshmen will be in a relationship.
Exercise 2¶
What percentage of students will be in a relationship at the beginning of their sophomore year?
View Solution
$$ \begin{align*} \mathbf{x}_{12} &= P^{12}\mathbf{x}_0 = \begin{bmatrix} .30 & .30 & .25 \\ .20 & .60 & .35 \\ .50 & .10 & .40 \end{bmatrix}^{12} \begin{bmatrix} .80 \\ .10 \\ .10 \end{bmatrix} \\ \\ &= \begin{bmatrix} 0.2847222199195 & 0.2847222240135 & 0.284722221966 \\ 0.4097222153115 & 0.4097222275975 & 0.409722221454 \\ 0.305555564769 & 0.305555548389 & 0.30555555658 \end{bmatrix}\begin{bmatrix} .80 \\ .10 \\ .10 \end{bmatrix} \\ \\ &\approx \begin{bmatrix} 0.28 \\ 0.41 \\ 0.31 \end{bmatrix} \end{align*} $$
Approximately $41\%$ of students will be in a relationship at the beginning of their sophomore year.
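If GeoGebra or WolframAlpha is not handy, $P^{12}\mathbf{x}_0$ can be computed by applying $P$ twelve times in a loop; a minimal plain-Python sketch:

```python
P = [[0.30, 0.30, 0.25],
     [0.20, 0.60, 0.35],
     [0.50, 0.10, 0.40]]
x = [0.80, 0.10, 0.10]

# Apply the transition matrix twelve times: x_12 = P^12 x_0.
for _ in range(12):
    x = [sum(P[i][j] * x[j] for j in range(3)) for i in range(3)]

print([round(v, 4) for v in x])  # ≈ [0.2847, 0.4097, 0.3056]
```

Repeated matrix-vector products are cheaper than forming $P^{12}$ explicitly, and for a $3\times 3$ matrix either approach is instantaneous.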
2.5.4 Steady State of a Markov Chain¶
Notice that in our first example, the values of $\mathbf{x}_3$ and $\mathbf{x}_{12}$ are very close. In fact, if we compute the next state we find
$$ \mathbf{x}_{13} = P\mathbf{x}_{12} = \begin{bmatrix} 0.2847\dots \\ 0.4097\dots \\ 0.3056 \dots \end{bmatrix} $$
The values of $\mathbf{x}_n$ are converging to
$$ \overline{\mathbf{x}} \approx \begin{bmatrix} 0.284722 \\ 0.409722 \\ 0.305556 \end{bmatrix} $$
This state $\overline{\mathbf{x}}$ is called the steady state of the Markov Chain. It is called a steady state because the next state
$$ P\overline{\mathbf{x}} = \overline{\mathbf{x}} $$
is the same state.
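The steady state can be estimated numerically by applying $P$ until successive states agree; a sketch in plain Python (the tolerance `1e-12` and the iteration cap are arbitrary choices):

```python
P = [[0.30, 0.30, 0.25],
     [0.20, 0.60, 0.35],
     [0.50, 0.10, 0.40]]
x = [0.80, 0.10, 0.10]

# Iterate x <- P x until successive states agree to within a small tolerance.
for _ in range(10_000):          # cap the iterations, just in case
    nxt = [sum(P[i][j] * x[j] for j in range(3)) for i in range(3)]
    if max(abs(nxt[i] - x[i]) for i in range(3)) < 1e-12:
        break
    x = nxt

print([round(v, 6) for v in x])  # ≈ [0.284722, 0.409722, 0.305556]
```

This is the power-iteration idea: for a regular stochastic matrix the iterates converge to the steady state regardless of the (stochastic) starting vector.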
2.5.5 Regular Stochastic Matrices¶
A stochastic matrix $P$ is called regular if and only if all of the entries of some integer power of $P$ are positive. In our example, the stochastic matrix is regular because all of the entries of $P = P^1$ are positive.
Theorem 2.5.1¶
The Markov Chain of a regular stochastic matrix
$$ \left\{ X_0, PX_0, P^2X_0, P^3X_0, \dots, P^nX_0, \dots \right\} $$
approaches a steady state
$$ \overline{X} = \displaystyle\lim_{n\rightarrow\infty} P^nX_0 $$
2.5.6 Determining the Steady State¶
We can use some linear algebra and Gauss-Jordan Elimination to find a steady state of a stochastic matrix. If the stochastic matrix is regular, then we can determine the unique steady state as well. We want to solve for the vector $\overline{\mathbf{x}}$ such that $P\overline{\mathbf{x}}=\overline{\mathbf{x}}$. Thus
$$ \begin{align*} P\overline{\mathbf{x}} - \overline{\mathbf{x}} &= \mathbf{0} \\ \\ P\overline{\mathbf{x}} - I\overline{\mathbf{x}} &= \mathbf{0} \\ \\ (P - I)\overline{\mathbf{x}} &= \mathbf{0} \\ \end{align*} $$
Hence we want to solve the homogeneous equation $(P-I)\overline{\mathbf{x}} = \mathbf{0}$. We can add another equation to this square system of equations since we know that the sum of all of the probabilities in a state matrix or state vector must add up to 1.
$$ x_1 + x_2 + x_3 + \dots + x_n = 1 $$
Let us return to our first example
$$ \begin{align*} \begin{bmatrix} .30 & .30 & .25 \\ .20 & .60 & .35 \\ .50 & .10 & .40 \end{bmatrix}\overline{\mathbf{x}} &= \overline{\mathbf{x}} \\ x_1 + x_2 + x_3 &= 1 \\ \\ \begin{bmatrix} .30 & .30 & .25 \\ .20 & .60 & .35 \\ .50 & .10 & .40 \end{bmatrix}\overline{\mathbf{x}} - \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\overline{\mathbf{x}} &= \mathbf{0} \\ x_1 + x_2 + x_3 &= 1 \\ \\ \begin{bmatrix} .30-1 & .30 & .25 \\ .20 & .60-1 & .35 \\ .50 & .10 & .40-1 \end{bmatrix}\overline{\mathbf{x}} &= \mathbf{0} \\ x_1 + x_2 + x_3 &= 1 \\ \\ -.70x_1 + .30x_2 + .25x_3 &= 0 \\ .20x_1 - .40x_2 + .35x_3 &= 0 \\ .50x_1 + .10x_2 - .60x_3 &= 0 \\ x_1 +\ \ \ \ \ x_2 +\ \ \ \ \ x_3 &= 1 \end{align*} $$
We create the augmented matrix of our new non-homogeneous linear system and reduce it using Gauss-Jordan elimination.
$$ \begin{align*} \begin{bmatrix} -.70 & .30 & .25 & | & 0 \\ .20 & -.40 & .35 & | & 0 \\ .50 & .10 & -.60 & | & 0 \\ 1 & 1 & 1 & | & 1 \end{bmatrix}\begin{array}{l} 100R_1 \\ 100R_2 \\ 100R_3 \\ \\ \end{array} &\rightarrow\begin{bmatrix} -70 & 30 & 25 & | & 0 \\ 20 & -40 & 35 & | & 0 \\ 50 & 10 & -60 & | & 0 \\ 1 & 1 & 1 & | & 1 \end{bmatrix}\begin{array}{l} R_4 \\ \\ \\ R_1 \\ \end{array} \\ \\ &\rightarrow\begin{bmatrix} 1 & 1 & 1 & | & 1 \\ 20 & -40 & 35 & | & 0 \\ 50 & 10 & -60 & | & 0 \\ -70 & 30 & 25 & | & 0 \end{bmatrix}\begin{array}{l} \\ R_2-20R_1 \\ R_3-50R_1 \\ R_4+70R_1 \\ \end{array} \\ \\ &\rightarrow\begin{bmatrix} 1 & 1 & 1 & | & 1 \\ 0 & -60 & 15 & | & -20 \\ 0 & -40 & -110 & | & -50 \\ 0 & 100 & 95 & | & 70 \end{bmatrix}\begin{array}{l} \\ R_2-R_3 \\ \\ R_4+R_2 \\ \end{array} \\ \\ &\rightarrow\begin{bmatrix} 1 & 1 & 1 & | & 1 \\ 0 & -20 & 125 & | & 30 \\ 0 & -40 & -110 & | & -50 \\ 0 & 40 & 110 & | & 50 \end{bmatrix}\begin{array}{l} \\ \\ R_3-2R_2 \\ R_4+R_3 \\ \end{array} \\ \\ &\rightarrow\begin{bmatrix} 1 & 1 & 1 & | & 1 \\ 0 & -20 & 125 & | & 30 \\ 0 & 0 & -360 & | & -110 \\ 0 & 0 & 0 & | & 0 \end{bmatrix}\begin{array}{l} \\ \\ \frac{R_3}{-360} \\ \\ \end{array} \\ \\ &\rightarrow\begin{bmatrix} 1 & 1 & 1 & | & 1 \\ 0 & -20 & 125 & | & 30 \\ 0 & 0 & 1 & | & \frac{11}{36} \\ 0 & 0 & 0 & | & 0 \end{bmatrix}\begin{array}{l} R_1-R_3 \\ R_2-125R_3 \\ \\ \\ \end{array} \\ \\ &\rightarrow\begin{bmatrix} 1 & 1 & 0 & | & \frac{25}{36} \\ 0 & -20 & 0 & | & \frac{1080}{36}-\frac{1375}{36} \\ 0 & 0 & 1 & | & \frac{11}{36} \\ 0 & 0 & 0 & | & 0 \end{bmatrix} \\ \\ &=\begin{bmatrix} 1 & 1 & 0 & | & \frac{25}{36} \\ 0 & -20 & 0 & | & - \frac{295}{36} \\ 0 & 0 & 1 & | & \frac{11}{36} \\ 0 & 0 & 0 & | & 0 \end{bmatrix}\begin{array}{l} \\ \frac{R_2}{-20} \\ \\ \\ \end{array} \\ \\ &\rightarrow\begin{bmatrix} 1 & 1 & 0 & | & \frac{25}{36} \\ 0 & 1 & 0 & | & \frac{295}{720} \\ 0 & 0 & 1 & | & \frac{11}{36} \\ 0 & 0 & 0 & | & 0 \end{bmatrix}\begin{array}{l} 
R_1-R_2 \\ \\ \\ \\ \end{array} \\ \\ &\rightarrow\begin{bmatrix} 1 & 0 & 0 & | & \frac{500}{720}-\frac{295}{720} \\ 0 & 1 & 0 & | & \frac{295}{720} \\ 0 & 0 & 1 & | & \frac{11}{36} \\ 0 & 0 & 0 & | & 0 \end{bmatrix} \\ &=\begin{bmatrix} 1 & 0 & 0 & | & \frac{205}{720} \\ 0 & 1 & 0 & | & \frac{295}{720} \\ 0 & 0 & 1 & | & \frac{11}{36} \\ 0 & 0 & 0 & | & 0 \end{bmatrix} \\ &=\begin{bmatrix} 1 & 0 & 0 & | & \frac{41}{144} \\ 0 & 1 & 0 & | & \frac{59}{144} \\ 0 & 0 & 1 & | & \frac{11}{36} \\ 0 & 0 & 0 & | & 0 \end{bmatrix} \end{align*} $$
Let us check to make sure that $\overline{\mathbf{x}} = \begin{bmatrix} \frac{41}{144} \\ \frac{59}{144} \\ \frac{11}{36} \end{bmatrix}$ is the steady state vector for the regular stochastic matrix $P$.
$$ \begin{align*} \frac{41}{144} + \frac{59}{144} + \frac{11}{36} &= \frac{41 + 59 + 44}{144} = \frac{144}{144} = 1 \color{green}{\huge{\checkmark}} \\ \\ \begin{bmatrix} .30 & .30 & .25 \\ .20 & .60 & .35 \\ .50 & .10 & .40 \end{bmatrix}\begin{bmatrix} \frac{41}{144} \\ \frac{59}{144} \\ \frac{11}{36} \end{bmatrix} &= \begin{bmatrix} \frac{1230 + 1770 + 1100}{14400} \\ \frac{820 + 3540 + 1540}{14400} \\ \frac{2050 + 590 + 1760}{14400} \end{bmatrix} = \begin{bmatrix} \frac{4100}{14400} \\ \frac{5900}{14400} \\ \frac{4400}{14400} \end{bmatrix} \color{green}{\huge{\checkmark}} \end{align*} $$
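The hand check above can be reproduced exactly with rational arithmetic; a sketch using Python's standard `fractions` module, so no rounding is involved:

```python
from fractions import Fraction as F

# The transition matrix and the claimed steady state, as exact fractions.
P = [[F(30, 100), F(30, 100), F(25, 100)],
     [F(20, 100), F(60, 100), F(35, 100)],
     [F(50, 100), F(10, 100), F(40, 100)]]
xbar = [F(41, 144), F(59, 144), F(11, 36)]

# The entries sum to exactly 1 ...
assert sum(xbar) == 1
# ... and P applied to xbar returns xbar, entry for entry.
Px = [sum(P[i][j] * xbar[j] for j in range(3)) for i in range(3)]
assert Px == xbar
print("steady state verified:", Px)
```

Because `Fraction` arithmetic is exact, the two assertions confirm the equality $P\overline{\mathbf{x}} = \overline{\mathbf{x}}$ with no floating-point tolerance at all.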
2.5.7 Absorbing Markov Chains¶
An absorbing state is a state that a member of the population remains in once it achieves that state. For example, once you have chicken pox, you are immune and don't get it again. Wait, that wasn't such a good example. In any case, the transition probability vector has a 1 in the entry for the absorbing state and zeros for all other entries. This occurs because there is no transition to any other state from the absorbing state.
Example 2¶
From Example 6 in our textbook we have the following state graph.
Notice that state $S_2$ has two directed edges. The in-arrow is a directed edge from $S_3$ to $S_2$ with weight $50\%$. The other directed edge is a self-loop back to $S_2$ with a probability of $100\% = 1.00$. Although many directed graphs have vertices like $S_2$ with self-loops, this state is distinguished by the fact that the only directed edge leaving the state is the self-loop.
Vertex or state $S_1$ has two directed edges. One is a self-loop with weight $40\%$, indicating that there is a $40\%$ chance that a member of the population will remain in state $S_1$, and one out-arrow to state $S_3$ with weight $60\%$, indicating that there is a $60\%$ chance that a member of the population will transition from state $S_1$ to state $S_3$.
State $S_3$ has three directed edges: one in-arrow with weight $60\%$, one self-loop with weight $50\%$, and one out-arrow indicating a transition from state $S_3$ to state $S_2$ with weight $50\%$.
When we create the Markov matrix $P$, we use each of the transition probabilities or weights on the directed edges to create the columns of our matrix:
$$ P = \begin{bmatrix} .40 & 0 & 0 \\ 0 & 1.00 & .50 \\ .60 & 0 & .50 \end{bmatrix} $$
A path of length $k$ in a graph from vertex $u$ to vertex $w$ is a sequence of $k+1$ vertices $p=\left\{\,u=v_0,\ v_1,\ \dots,\ v_k=w\,\right\}$ and $k$ directed edges so that there is an edge from each vertex to the next vertex in the sequence. That is, there is a directed edge from $u=v_0$ to $v_1$, from $v_1$ to $v_2$, $\dots$, and from $v_{k-1}$ to $v_k=w$.
For example, there is a path of length 1 from $S_1$ to $S_3$, $p = \left\{ S_1, S_3 \right\}$ utilizing the directed edge with weight $60\%$.
There is a path of length 2 from $S_1$ to $S_2$, $p = \left\{ S_1, S_3, S_2 \right\}$, utilizing the directed edge from $S_1$ to $S_3$ with weight $60\%$ followed by the directed edge from $S_3$ to $S_2$ with weight $50\%$. There is no shorter path from $S_1$ to $S_2$ because there is no directed edge in the graph from $S_1$ to $S_2$.
If there is a path $p$ from vertex $u$ to vertex $w$, we say that vertex $w$ is reachable from vertex $u$ via path $p$.
Theorem¶
If a Markov matrix has at least one absorbing state, then a member of the population in any state that can reach an absorbing state will, with probability 1, reach an absorbing state in a finite number of transitions.
The number of transitions is not the length of the path. The weights of the directed edges indicate that from state $S_1$ there is a $40\%$ chance of remaining in state $S_1$ during any transition, so the number of transitions needed to actually enter state $S_3$ is modeled by a geometric distribution. Since the probability of the transition is given by $p_{31} = .60$, the expected number of transitions necessary to achieve state $S_3$ from state $S_1$ is $E = \frac{1}{p_{31}} = \frac{1}{.6} = \frac{10}{6} = \frac{5}{3}$, or usually one or two transitions. However, this is only the expected number of transitions. You could be unlucky and spend the next 100 transitions in state $S_1$; it is just not very likely. This is where statistics meets linear algebra in thermodynamics, mechanics, physics, chemistry, economics, finance, signal processing, information theory, speech processing, and data science.
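The absorption just described can be checked numerically. Reading the transition probabilities off the state graph described above (with column $j$ giving the transitions out of state $S_j$, an assumption since the textbook figure is not reproduced here), repeated transitions drive the whole population into the absorbing state $S_2$. A sketch:

```python
# Transition matrix read off the state graph (columns S1, S2, S3).
P = [[0.40, 0.00, 0.00],   # S1 keeps 40% of its own population
     [0.00, 1.00, 0.50],   # S2 is absorbing: its column is [0, 1, 0]
     [0.60, 0.00, 0.50]]   # S3 receives 60% of S1 and keeps 50% of itself
x = [1.0, 0.0, 0.0]        # start the entire population in S1

# Apply many transitions; the mass in S1 and S3 decays geometrically.
for _ in range(200):
    x = [sum(P[i][j] * x[j] for j in range(3)) for i in range(3)]

print([round(v, 6) for v in x])  # ≈ [0.0, 1.0, 0.0]: everyone absorbed into S2
```

The $S_1$ population shrinks by a factor of $0.40$ each step and the $S_3$ population by roughly $0.50$, so after a couple hundred transitions essentially nothing remains outside $S_2$, consistent with the theorem.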
Your use of this self-initiated mediated course material is subject to our Creative Commons License 4.0