
The Revised Simplex Method


The way we’ve been performing the Simplex Method so far is by writing out a full dictionary at each step, but this is potentially wasteful: the matrix formulas for the dictionary tell us that knowing the basic variables is enough to reconstruct the whole dictionary, and we don’t even need all of the dictionary to figure out which pivot to perform, and thus what the next basis will be. For example, we never need to know the current value of the objective function. From the \(z\)-row we only need the coefficients of the non-basic variables, to pick the entering variable; and then to pick the exiting variable we only need two columns of the dictionary: the values of the basic variables in the solution associated to the dictionary, and the coefficients of the entering variable in the formulas for the basic variables.

For example, in the following dictionary we don’t need any of the question marks to figure out that \(x_2\) should enter and \(x_6\) should exit:

\[\begin{aligned} x_3 & = 3 + {}? x_1 - 4 x_2 + {}? x_5 \\ x_4 & = 1 + {}? x_1 + 2 x_2 + {}? x_5 \\ x_6 & = 2 + {}? x_1 - 3 x_2 + {}? x_5 \\ z & = {}? - 2 x_1 + 4 x_2 + 2 x_5 \end{aligned}\]
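
Just to illustrate the point, here is a tiny Python computation that uses only the visible numbers from this dictionary and recovers the same pivot choice; the variable names are made up for this illustration.

    # z-row coefficients of the non-basic variables x1, x2, x5
    z_coeffs = {"x1": -2, "x2": 4, "x5": 2}
    entering = max(z_coeffs, key=z_coeffs.get)   # x2: largest positive coefficient

    # for the ratio test we only need the constants and the x2-column
    constants = {"x3": 3, "x4": 1, "x6": 2}
    x2_coeffs = {"x3": -4, "x4": 2, "x6": -3}
    ratios = {v: constants[v] / -x2_coeffs[v]    # only rows where x2 has a negative coefficient
              for v in constants if x2_coeffs[v] < 0}
    exiting = min(ratios, key=ratios.get)        # x6, since 2/3 < 3/4

    print(entering, exiting)                     # prints: x2 x6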

The idea of the Revised Simplex Method is to avoid having to compute the full dictionary after every pivot. Instead, we’ll only keep track of some of the information, including the current basis, and use the matrix formulas to compute the portions of the dictionary we need.

We’ll describe two versions of the Revised Simplex Method: one where we keep track only of the current basic variables and the current solution, and compute everything else (to avoid computing \(A_B^{-1}\), we’ll solve systems of equations instead); and one where we additionally keep track of \(A_B^{-1}\).

Revised Simplex Method, version 1

In this version, we keep track only of the basis \(\mathbf{x}_B\) and the current solution[1], \(\mathbf{x}_B^* = A_B^{-1} \mathbf{b}\). To deal with any occurrences of \(A_B^{-1}\) in the formulas, we’ll solve systems of equations. When we say we “keep track” of some data, it means that when we start a pivoting step we will have that data available for the current basis, and also that we must compute that data for the next basis. Throughout we will need to refer to the matrix formulas for the dictionary:

\[\begin{aligned} \mathbf{x}_B & = A_B^{-1} \mathbf{b} - A_B^{-1} A_N \mathbf{x}_N \\ z & = \mathbf{c}_B^\top A_B^{-1} \mathbf{b} + \left(\mathbf{c}_N^\top - \mathbf{c}_B^\top A_B^{-1} A_N \right) \mathbf{x}_N \end{aligned}\]

We’re given \(\mathbf{x}_B\) and the numbers \(\mathbf{x}_B^*\). Now we must determine the entering variable. For that we need to calculate the coefficients of the non-basic variables in the \(z\)-row, i.e., \(\mathbf{c}_N^\top - \mathbf{c}_B^\top A_B^{-1} A_N\). All of the pieces, \(\mathbf{c}_B\), \(\mathbf{c}_N\), \(A_B\) and \(A_N\), can be found from the original LP (or, more precisely, from the augmented \(\mathbf{c}^\mathsf{aug}\) and \(A^\mathsf{aug}\) you get after adding the slack variables). But we’d like to avoid calculating \(A_B^{-1}\), so we’ll get \(\mathbf{y}^\top := \mathbf{c}_B^\top A_B^{-1}\) by solving the system of equations \( A_B^\top \mathbf{y} = \mathbf{c}_B\).

Once we have \(\mathbf{y}\) we can compute the vector of coefficients of \(\mathbf{x}_N\) in the \(z\)-row as \(\mathbf{c}_N^\top - \mathbf{y}^\top A_N\). We can then follow the standard pivoting rules to choose the entering variable: we look for the largest positive coefficient, and in case of ties pick the one corresponding to the non-basic variable \(x_j\) with smallest \(j\).

To choose the exiting variable, we need \(\mathbf{x}_B^*\) and the vector of coefficients of \(x_j\) in the top portion of the dictionary (the \(\mathbf{x}_B\) part). From the matrix formulation of the dictionary, this vector is \(-A_B^{-1} A_j\) (recall that \(A_j\) is the column of \(A^\mathsf{aug}\) corresponding to the variable \(x_j\)). Again, we avoid inverting \(A_B\) by solving a system of equations, namely, we get \(\mathbf{d} := A_B^{-1} A_j\) by solving \(A_B \mathbf{d} = A_j\).

Once we have \(\mathbf{d}\), we can choose the exiting variable according to the standard pivoting rules: \(x_j\) starts at \(0\) and is allowed to increase as long as all the \(\mathbf{x}_B\) stay \(\ge 0\). So the value of \(x_j\) in the solution associated to the next basis (in which \(x_j\) will be basic!) is given by \(\max\{ t \ge 0: \mathbf{x}_B^* - t \mathbf{d} \ge 0\}\). Concretely, only the entries with \(d_k > 0\) limit the increase, so this maximum is \(\min\{ (\mathbf{x}_B^*)_k / d_k : d_k > 0 \}\). Of course, there may be no maximum, that is, it might happen that \(\mathbf{x}_B^* - t \mathbf{d} \ge 0\) for all \(t \ge 0\) (which happens exactly when \(\mathbf{d} \le \mathbf{0}\)); if so, the LP is unbounded: taking \(x_j = t\), the other non-basic variables equal to \(0\), and \(\mathbf{x}_B = \mathbf{x}_B^* - t \mathbf{d}\) gives an unbounded family of feasible solutions along which \(z\) increases without bound (since the coefficient of \(x_j\) in the \(z\)-row is positive).

If there is a maximum value of \(t\), then at that maximum one or more of the entries of \(\mathbf{x}_B^* - t \mathbf{d}\) becomes \(0\). Each entry of that vector corresponds to a basic variable, and we choose the exiting variable \(x_i\) to be the one whose entry becomes \(0\), or, if there are several, the one with smallest \(i\).

It only remains to make sure we have the next-basis versions of all the data we gave ourselves. The new \(\mathbf{x}_B\) is, of course, just obtained from the previous one by removing the exiting \(x_i\) and putting in the entering \(x_j\). The new associated solution is also pretty easy: as we said above, the value \(x_j^*\) of \(x_j\) in the next dictionary is that maximum \(t\), and the new values of \(\mathbf{x}_B\) are given by \(\mathbf{x}_B^* - t \mathbf{d}\) (again, for that maximum \(t\)). Note that the vector \(\mathbf{x}_B^* - t \mathbf{d}\) includes a zero in the entry corresponding to the exiting variable \(x_i\); all the other entries correspond to variables that are still basic in the next dictionary.

Recipe

Summarizing the above discussion, these are the steps you take given the list of basic variables \(\mathbf{x}_B\) and their values in the associated solution, \(\mathbf{x}_B^*\):

  1. Find \(\mathbf{y}\) by solving \(A_B^\top \mathbf{y} = \mathbf{c}_B\).
  2. Compute \(\mathbf{c}_N^\top - \mathbf{y}^\top A_N\), which is the vector of coefficients in the \(z\)-row. Use it to choose the entering variable \(x_j\) by looking for the largest positive entry. If all entries of the coefficient vector are non-positive, then stop: you have reached optimality!
  3. Find \(\mathbf{d}\) by solving \(A_B \mathbf{d} = A_j\).
  4. Find the maximum value of \(t\) such that \(\mathbf{x}_B^* - t \mathbf{d} \ge 0\). From now on, let \(t\) be that maximum.
  5. The exiting variable \(x_i\) will be the one (with smallest \(i\), in case of ties) for which the corresponding entry of \(\mathbf{x}_B^* - t \mathbf{d}\) is zero.
  6. Find the values of the basic variables in the next dictionary:
    • For \(x_j\) the value will be that maximum \(t\).
    • The other basic variables are the current ones, \(\mathbf{x}_B\), but without \(x_i\). Their values are given by \(\mathbf{x}_B^* - t \mathbf{d}\); this vector includes the value zero for \(x_i\), which won’t be basic anymore.
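
To make the recipe concrete, here is a minimal sketch of one iteration in Python with NumPy. The function name and the input conventions (A_aug and c_aug are the augmented constraint matrix and objective vector, basis is the list of column indices of the basic variables, x_B_star is a NumPy array of their current values) are assumptions made for this illustration, not notation from the notes.

    import numpy as np

    def revised_simplex_step_v1(A_aug, c_aug, basis, x_B_star):
        """Perform one pivot of the Revised Simplex Method, version 1.

        A_aug, c_aug : augmented constraint matrix and objective vector (slacks included)
        basis        : list of column indices of the current basic variables
        x_B_star     : NumPy array with the current values of the basic variables
        """
        m, n = A_aug.shape
        nonbasic = [k for k in range(n) if k not in basis]
        A_B, A_N = A_aug[:, basis], A_aug[:, nonbasic]
        c_B, c_N = c_aug[basis], c_aug[nonbasic]

        # Step 1: solve A_B^T y = c_B instead of computing A_B^{-1}.
        y = np.linalg.solve(A_B.T, c_B)

        # Step 2: coefficients of the non-basic variables in the z-row.
        coeffs = c_N - A_N.T @ y
        if np.all(coeffs <= 0):
            return "optimal", basis, x_B_star
        # nonbasic is listed in increasing order, so the first maximum has the smallest index
        j = nonbasic[int(np.argmax(coeffs))]

        # Step 3: solve A_B d = A_j.
        d = np.linalg.solve(A_B, A_aug[:, j])

        # Step 4: ratio test; t is the minimum of x*_B[k] / d[k] over rows with d[k] > 0.
        if np.all(d <= 0):
            return "unbounded", j, d
        ratios = [x_B_star[k] / d[k] if d[k] > 0 else np.inf for k in range(m)]
        t = min(ratios)

        # Step 5: exiting variable; among ties, the basic variable with the smallest index.
        ell = min((k for k in range(m) if ratios[k] == t), key=lambda k: basis[k])

        # Step 6: update the basis and the values of the basic variables.
        new_basis = list(basis)
        new_basis[ell] = j                # x_i leaves, x_j takes its position
        new_values = x_B_star - t * d     # has a zero in the position where x_i was
        new_values[ell] = t               # the entering variable gets the value t
        return "pivot", new_basis, new_values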

Revised Simplex Method, version 2

In this version, we additionally keep track of the matrix \(A_B^{-1}\). If we have \(A_B^{-1}\), the steps where we compute \(\mathbf{y}\) and \(\mathbf{d}\) can be done by simple matrix multiplication instead of solving a linear system. Of course, it’s not all roses: we need to figure out how to compute the next \(A_B^{-1}\), that is, the matrix \(A_{B'}^{-1}\) where \(B'\) is the next basis.

To write down the \(A_B\) and \(A_{B'}\) matrices you need to pick a specific order for the basic variables. Let’s agree that the basis \(\mathbf{x}_{B'}\) will have the same order as \(\mathbf{x}_B\) with the exiting variable \(x_i\) replaced by the entering variable \(x_j\). With that convention, if \(x_i\) occurs in position \(\ell\) of \(\mathbf{x}_B\), then \(A_{B'}\) and \(A_B\) differ only in the \(\ell\)-th column: \(A_B\) has \(A_i\) as \(\ell\)-th column, while \(A_{B'}\) has \(A_j\).

This means that \(A_{B'} = A_B E\) where \(E\) is the matrix whose \(\ell\)-th column is \(\mathbf{d} = A_B^{-1} A_j\) and otherwise agrees with the identity matrix. Why is that? Well, for any matrices \(M\) and \(N\), if the columns of \(N\) are \(\mathbf{v}_1, \ldots, \mathbf{v}_m\), then the columns of the product \(MN\) are the vectors \(M\mathbf{v}_1, \ldots, M\mathbf{v}_m\). In particular, if the columns of the identity matrix are \(\mathbf{e}_1, \ldots, \mathbf{e}_m\), then \(M\mathbf{e}_k\) is simply the \(k\)-th column of the matrix \(M\). For the matrix \(E\), whose columns are \(\mathbf{e}_1, \ldots, \mathbf{e}_{\ell-1}, \mathbf{d}, \mathbf{e}_{\ell+1}, \ldots, \mathbf{e}_m\), we see that the columns of \(A_B E\) are the same as the columns of \(A_B\) except that the \(\ell\)-th one is \(A_B \mathbf{d} = A_j\); i.e., the columns of \(A_B E\) are exactly those of \(A_{B'}\)!

From this we can easily state the update rule for \(A_B^{-1}\), namely, \(A_{B'}^{-1} = E^{-1} A_B^{-1}\). And this is extremely handy, because \(E^{-1}\) is very easy to calculate: namely, \(E^{-1}\) also agrees with the identity matrix except in the \(\ell\)-th column, which is given by the following recipe: if \(\mathbf{d}^{\top} = \begin{pmatrix} d_1 & d_2 & \ldots & d_m \end{pmatrix} \), then the \(\ell\)-th column of \(E^{-1}\) is given by \[\begin{pmatrix} -d_1/d_{\ell} & -d_2/d_{\ell} & \ldots & -d_{\ell-1}/d_{\ell} & 1/d_{\ell} & -d_{\ell+1}/d_{\ell} & \ldots & -d_m/d_{\ell} \end{pmatrix}^\top.\] Notice that the \(\ell\)-th entry of that vector is the one on the diagonal of the matrix \(E^{-1}\) and does not follow the pattern of the others.

For example, if \(E = \begin{pmatrix} 1&p&0&0\\ 0&q&0&0\\ 0&r&1&0\\ 0&s&0&1\\ \end{pmatrix}\), then \(E^{-1} = \begin{pmatrix} 1&-p/q&0&0\\ 0&1/q&0&0\\ 0&-r/q&1&0\\ 0&-s/q&0&1\\ \end{pmatrix}.\)
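
As a quick sanity check of this recipe, here is a small Python/NumPy snippet (with arbitrary placeholder values for \(p\), \(q\), \(r\), \(s\)) that builds \(E\) and the claimed \(E^{-1}\) and verifies that their product is the identity; the helper name eta_inverse is just for this illustration. The same construction reappears in the sketch after the version 2 recipe below.

    import numpy as np

    def eta_inverse(d, ell):
        """Inverse of the matrix E that agrees with the identity except that
        its ell-th column is the vector d (0-based indices here)."""
        d = np.asarray(d, dtype=float)
        E_inv = np.eye(len(d))
        E_inv[:, ell] = -d / d[ell]
        E_inv[ell, ell] = 1.0 / d[ell]
        return E_inv

    # the 4x4 example above, with placeholder values for p, q, r, s
    p, q, r, s = 2.0, 5.0, -1.0, 3.0
    E = np.eye(4)
    E[:, 1] = [p, q, r, s]           # the second column of E is (p, q, r, s)
    assert np.allclose(E @ eta_inverse([p, q, r, s], 1), np.eye(4))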

This is easy enough to check by multiplying out. But for completeness’s sake let’s prove it:

If \(E\) is a square matrix that only differs from the identity matrix in the \(\ell\)-th column, where \(E\) has instead the column \( \begin{pmatrix} d_1 & d_2 & \ldots & d_m \end{pmatrix}^\top\), then \(E^{-1}\) also only differs from the identity matrix in the \(\ell\)-th column, which is \(\begin{pmatrix} -d_1/d_{\ell} & -d_2/d_{\ell} & \ldots & -d_{\ell-1}/d_{\ell} & 1/d_{\ell} & -d_{\ell+1}/d_{\ell} & \ldots & -d_m/d_{\ell} \end{pmatrix}^\top\).

Let \(\mathbf{e}_1, \ldots, \mathbf{e}_m\) be the columns of the identity matrix. We know that the columns of \(E\) are given by \(E \mathbf{e}_k = \mathbf{e}_k\) for \(k \neq \ell\) and \(E \mathbf{e}_{\ell} = \sum_{k=1}^m d_k \mathbf{e}_k\). Multiplying both sides of these equations by \(E^{-1}\) gives us, first, that \(\mathbf{e}_k = E^{-1} \mathbf{e}_k\) for \(k \neq \ell\), and second, that \(\mathbf{e}_{\ell} = \sum_{k=1}^m d_k E^{-1} \mathbf{e}_k\). So we’ve already obtained that all columns of \(E^{-1}\) other than the \(\ell\)-th are the same as those of the identity matrix. Plugging that into the last equation we get, \(\mathbf{e}_{\ell} = \sum_{k \neq \ell} d_k \mathbf{e}_k + d_{\ell} E^{-1}\mathbf{e}_{\ell}\). Solving, we get that \( E^{-1} \mathbf{e}_{\ell} = \sum_{k \neq \ell} (-d_k/d_{\ell}) \mathbf{e}_k + (1/d_{\ell}) \mathbf{e}_{\ell}\), as desired.

Recipe

To summarize, these are the steps you take given the list of basic variables \(\mathbf{x}_B\), their values \(\mathbf{x}_B^*\) in the associated solution, and the matrix \(A_B^{-1}\):

  1. Compute \(\mathbf{c}_N^\top - \mathbf{c}_B^\top A_B^{-1} A_N\), which is the vector of coefficients in the \(z\)-row. Use it to choose the entering variable \(x_j\) by looking for the largest positive entry. If all entries of the coefficient vector are non-positive, then stop: you have reached optimality!
  2. Compute \(\mathbf{d} = A_B^{-1} A_j\).
  3. Find the maximum value of \(t\) such that \(\mathbf{x}_B^* - t \mathbf{d} \ge 0\). From now on, let \(t\) be that maximum.
  4. The exiting variable \(x_i\) will be the one (with smallest \(i\), in case of ties) for which the corresponding entry of \(\mathbf{x}_B^* - t \mathbf{d}\) is zero.
  5. Find the values of the basic variables in the next dictionary:
    • For \(x_j\) the value will be that maximum \(t\).
    • The other basic variables are the current ones, \(\mathbf{x}_B\), but without \(x_i\). Their values are given by \(\mathbf{x}_B^* - t \mathbf{d}\); this vector includes the value zero for \(x_i\), which won’t be basic anymore.
  6. Find the next \(A_B^{-1}\): it is \(E^{-1} A_B^{-1}\), where \(E\) is the matrix that differs from the identity matrix only in the \(\ell\)-th column (the position that the exiting variable occupied in \(\mathbf{x}_B\) and that the entering variable now occupies in \(\mathbf{x}_{B'}\)), where \(E\) has the vector \(\mathbf{d}\).
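
Here is a matching sketch of one iteration of version 2, under the same assumed conventions as the version 1 sketch above, plus an extra argument A_B_inv holding the current \(A_B^{-1}\); the only genuinely new step is the update \(A_{B'}^{-1} = E^{-1} A_B^{-1}\).

    import numpy as np

    def revised_simplex_step_v2(A_aug, c_aug, basis, x_B_star, A_B_inv):
        """Perform one pivot of the Revised Simplex Method, version 2.
        Same conventions as the version-1 sketch, plus A_B_inv = the current A_B^{-1}."""
        m, n = A_aug.shape
        nonbasic = [k for k in range(n) if k not in basis]
        A_N = A_aug[:, nonbasic]
        c_B, c_N = c_aug[basis], c_aug[nonbasic]

        # Step 1: z-row coefficients, now by plain matrix multiplication.
        coeffs = c_N - (c_B @ A_B_inv) @ A_N
        if np.all(coeffs <= 0):
            return "optimal", basis, x_B_star, A_B_inv
        j = nonbasic[int(np.argmax(coeffs))]

        # Step 2: d = A_B^{-1} A_j, again just a multiplication.
        d = A_B_inv @ A_aug[:, j]

        # Steps 3-4: ratio test and exiting variable, exactly as in version 1.
        if np.all(d <= 0):
            return "unbounded", j, d, A_B_inv
        ratios = [x_B_star[k] / d[k] if d[k] > 0 else np.inf for k in range(m)]
        t = min(ratios)
        ell = min((k for k in range(m) if ratios[k] == t), key=lambda k: basis[k])

        # Step 5: new basis and new values of the basic variables.
        new_basis = list(basis)
        new_basis[ell] = j
        new_values = x_B_star - t * d
        new_values[ell] = t

        # Step 6: A_{B'}^{-1} = E^{-1} A_B^{-1}, with E^{-1} built from d as described above.
        E_inv = np.eye(m)
        E_inv[:, ell] = -d / d[ell]
        E_inv[ell, ell] = 1.0 / d[ell]
        new_A_B_inv = E_inv @ A_B_inv
        return "pivot", new_basis, new_values, new_A_B_inv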

Invertibility of \(A_B\)

The formula \(A_{B'} = A_B E\) finally lets us repair a long-standing omission: we haven’t actually proved that \(A_B\) is invertible! To be clear, we’re not claiming that any square matrix you get by picking columns of \(A^{\mathsf{aug}}\) is automatically invertible. What is true is that for any basis arising in the Simplex Method, the corresponding \(A_B\) will be invertible, as required for the matrix formulas for the dictionary to make sense!

At the very first step of the Simplex Method the basic variables are the slack variables and \(A_B\) is an identity matrix, which is, of course, invertible. Now, imagine that for some dictionary \(A_B\) is known to be invertible. Let’s show the next matrix, namely \(A_{B'} = A_B E\), is invertible too. It’s enough to show \(E\) is invertible, since then \(A_{B'}\) is a product of two invertible matrices and thus invertible itself.

So why is \(E\) invertible? Well, we found the inverse already! But for that inverse to make sense, we do need that \(d_{\ell} \neq 0\)[2]. Alternatively, expanding along the \(\ell\)-th row, we see that the determinant of \(E\) is exactly \(d_{\ell}\). At any rate, all we have left to show is that \(d_{\ell} \neq 0\). This comes from the way pivots are chosen: \(d_{\ell}\) is minus the coefficient of the entering variable \(x_j\) in the formula for the exiting variable \(x_i\), so it has to be strictly positive (that coefficient must be negative, since otherwise increasing \(x_j\) would never force \(x_i\) down to zero and \(x_i\) wouldn’t have been chosen as the exiting variable).


[1]

The \(\mathbf{x}_B^*\) are the values of the basic variables in the solution associated to the dictionary for the basis \(B\). This determines the entire associated solution, because the non-basic variables have the value \(0\).

[2]

A point which was glossed over in the proof of the proposition!

Omar Antolín Camarena