Suppose we have a large circuit with many nodes (say $n=1000$). Assuming that the number of voltage sources is small, the coefficient matrix has approximately $n^2=10^6$ elements. Storing these elements as double precision real numbers requires 8 bytes for every matrix entry. Therefore just storing the matrix requires 8MB of memory. Of course, today this is not much of an issue. On the other hand, circuits with several tens of thousands or even 100000 nodes are often simulated (e.g. in integrated circuit design when a full-chip verification is performed). Such a large circuit with $n=100000$ would require 80GB of memory for storing the coefficient matrix. Even by today's standards this is a lot of memory.
Memory requirements grow proportionally with $n^2$. This means that doubling the number of rows quadruples the number of elements. If we consider that according to Moore's law the number of transistors in an integrated circuit doubles every 18 months (although we are not sure how long this trend is going to continue), we see that we must wait two such doubling periods, i.e. 36 months (3 years), before the memory capacity can accommodate a circuit twice as large as the one that pushes the limits of matrix storage today.
But storage is not the only problem. Note that the number of operations required for solving a linear system of equations (LU decomposition + forward substitution + back-substitution) grows proportionally with $n^3$. As $n$ grows we can expect to hit the barrier imposed by the computational time growing beyond anything reasonable even sooner than we run out of memory.
The question arises: can we circumvent these limitations in any way? The answer is yes, but only if our matrices have a particular property: most of the matrix elements must be zero. Fortunately this is the case with the coefficient matrices of circuits. In almost all circuits most nodes have only a few connections (either electrical or via controlled sources). Therefore every KCL equation depends on a small subset of all unknowns in the system of equations. Consequently most matrix entries are zero. Such matrices are referred to as sparse matrices. The term dense matrix is used for matrices in which all elements are considered to be nonzero (and thus stored in the computer's memory regardless of their value).
When a matrix is sparse we don't have to store all of its elements. It suffices to store only the nonzero entries. This brings along a certain overhead, as we must store not only the value of a nonzero matrix element but also its position (indices i and j). If we assume this overhead consumes 8 bytes (4 bytes per integer index), every stored nonzero costs 16 bytes instead of 8, so the break-even point for storage requirements is reached when at least half of the matrix elements are zero. Most sparse matrices that arise from real-world circuits are far sparser than that.
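As a small sketch of this arithmetic (assuming 4-byte integer indices, 8-byte double values, and a hypothetical average of four nonzeros per row, a figure typical for circuit matrices):

```python
def dense_storage_bytes(n):
    # every one of the n*n entries is stored as an 8-byte double
    return 8 * n * n

def sparse_storage_bytes(nnz):
    # each nonzero is stored as (row index, column index, value):
    # 4 + 4 + 8 = 16 bytes, twice the cost of a dense entry,
    # hence the break-even point at 50% zero elements
    return 16 * nnz

n = 100_000
nnz = 4 * n                        # assumption: about 4 nonzeros per row
print(dense_storage_bytes(n) / 1e9, "GB (dense)")      # 80.0 GB
print(sparse_storage_bytes(nnz) / 1e6, "MB (sparse)")  #  6.4 MB
```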
Sparse matrices have an additional advantage. LU decomposition of a sparse matrix can be much cheaper if it is performed in a way that exploits the sparsity. The number of operations for such an LU decomposition is proportional to the number of nonzero matrix entries.
When performing LU decomposition, the result is often stored in place of the initial input matrix. The subdiagonal elements of the resulting matrix store the subdiagonal elements of matrix L. The elements on the diagonal of the resulting matrix and the elements above it correspond to elements of matrix U. Since the diagonal elements of L are always 1 we don't have to store them. Now let us visualize the steps of LU decomposition performed on a sparse matrix. Let 'a', 'l', and 'u' denote nonzero entries of the coefficient matrix, the L matrix, and the U matrix, respectively. Zero entries are denoted by '0'. We start with a sample sparse matrix.
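For illustration, take a 4×4 matrix whose nonzeros lie in the first row, the first column, and on the diagonal (the specific pattern is only an illustrative choice; any matrix with this structure behaves the same way in the steps below):

\[
\begin{pmatrix}
a & a & a & a \\
a & a & 0 & 0 \\
a & 0 & a & 0 \\
a & 0 & 0 & a
\end{pmatrix}
\]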
After the first step of Gaussian elimination we have
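\[
\begin{pmatrix}
u & u & u & u \\
l & x & x & x \\
l & x & x & x \\
l & x & x & x
\end{pmatrix}
\]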
We denote by 'x' an intermediate result. Such intermediate results occur in the columns in which the pivot row (the row used for eliminating the subdiagonal elements) has a nonzero entry. Some intermediate results create new nonzero entries in the matrix. Note that even if an intermediate result happens to be zero we treat it as a nonzero element. The second step of elimination takes care of the subdiagonal elements below the second diagonal element and yields
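\[
\begin{pmatrix}
u & u & u & u \\
l & u & u & u \\
l & l & x & x \\
l & l & x & x
\end{pmatrix}
\]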
Finally, the third step (which is the last one) results in
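\[
\begin{pmatrix}
u & u & u & u \\
l & u & u & u \\
l & l & u & u \\
l & l & l & u
\end{pmatrix}
\]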
Note that there is no need to perform the fourth step because there are no subdiagonal elements below the fourth diagonal element. Despite the initial matrix being sparse, the final result is a dense matrix, which means that we gained nothing in terms of memory consumption. It is reasonable to expect that LU decomposition will change some zero entries in the matrix to nonzero entries. Every such additional nonzero entry is referred to as fill-in. To keep the fill-in as small as possible the pivots for Gaussian elimination must be chosen accordingly. But before we explore pivoting in sparse LU decomposition let us first introduce a more sophisticated pivoting approach.
The pivoting described in the previous lecture is also referred to as partial pivoting: for the i-th step of LU decomposition the pivot is chosen among the diagonal and subdiagonal entries of the i-th column.
A more sophisticated approach chooses the pivot for the i-th step of LU decomposition from the submatrix spanned by the i-th through the last diagonal element. This approach is referred to as complete pivoting. If the pivot is chosen outside the i-th column one has to permute the columns of the matrix. Permuting the columns corresponds to permuting the unknowns. Just like row permutations, the column permutations must also be stored, because this information is needed when solving the system of equations. After the backward substitution is completed the resulting vector must be permuted inversely to the column permutation to produce the correct vector of unknowns.
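As a sketch of this bookkeeping (hypothetical helper names; assuming the factorization was performed on the row- and column-permuted matrix A[perm_r][:, perm_c]):

```python
import numpy as np

def solve_with_permutations(lu_solve, perm_r, perm_c, b):
    # lu_solve performs forward and backward substitution with the LU factors
    # of the permuted matrix A[perm_r][:, perm_c] (hypothetical callable).
    y = lu_solve(b[perm_r])    # right-hand side follows the row permutation
    x = np.empty_like(y)
    x[perm_c] = y              # undo the column (unknown) reordering
    return x
```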
Because the matrix is sparse we don't expect many pivot candidates in the column below the diagonal entry. For this reason partial pivoting is not the best choice for sparse matrices and complete pivoting is usually applied. Let us revisit the last example, but this time let us reverse the order of rows and columns (i.e. we permute the rows and columns). Normally this is done during LU decomposition, but we will do it in advance to demonstrate the effect of reordering the matrix rows and columns on fill-in. We start LU decomposition with the following matrix.
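(this is the illustrative matrix from above with rows and columns in reverse order)

\[
\begin{pmatrix}
a & 0 & 0 & a \\
0 & a & 0 & a \\
0 & 0 & a & a \\
a & a & a & a
\end{pmatrix}
\]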
The first step of LU decomposition eliminates the subdiagonal entries in the first column and yields
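\[
\begin{pmatrix}
u & 0 & 0 & u \\
0 & a & 0 & a \\
0 & 0 & a & a \\
l & a & a & x
\end{pmatrix}
\]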
The second step results in
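\[
\begin{pmatrix}
u & 0 & 0 & u \\
0 & u & 0 & u \\
0 & 0 & a & a \\
l & l & a & x
\end{pmatrix}
\]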
The last step yields the final result
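\[
\begin{pmatrix}
u & 0 & 0 & u \\
0 & u & 0 & u \\
0 & 0 & u & u \\
l & l & l & u
\end{pmatrix}
\]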
This time the final result is sparse. Even better, we have no fill-in! The choice of pivot therefore not only affects the numerical error, but also the fill-in created during LU decomposition. Just like in LU decomposition of dense matrices, pivoting is generally performed for every LU decomposition step. To choose the optimal pivot in terms of fill-in, a simulated LU decomposition run is performed in which only the fill-in is considered and no actual computation takes place.
The pivot for one step of LU decomposition is selected by simulating, for every pivot candidate, the fill-in that the candidate would produce in that step. The candidate with the smallest fill-in is then chosen and the corresponding rows and columns are exchanged. If two or more candidates produce the same fill-in, the candidate with the largest absolute value in the initial matrix is chosen. Note that because of this fill-ins cannot be chosen as pivots. The fill-ins produced by the LU decomposition step are added to the sparse matrix (i.e. entries for the new nonzero elements are added, with no specific value set at this point).
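A minimal sketch of this selection rule is given below (a hypothetical helper, not the actual SPICE code; for clarity the structural pattern is kept as a dense boolean array, whereas a real solver would use sparse data structures):

```python
import numpy as np

def choose_pivot(A0, pattern, k):
    # A0 holds the values of the *initial* matrix, pattern[i, j] is True for
    # every structural nonzero (including fill-ins added by earlier steps),
    # and rows/columns before k are already factored.
    n = A0.shape[0]
    best_key, best_pos = None, None
    for i in range(k, n):
        for j in range(k, n):
            if not pattern[i, j] or A0[i, j] == 0.0:
                continue       # fill-ins have no initial value -> never candidates
            # count the zero positions (r, c) this pivot would turn nonzero
            fill = sum(1 for r in range(k, n) if r != i and pattern[r, j]
                         for c in range(k, n) if c != j and pattern[i, c]
                                              and not pattern[r, c])
            key = (fill, -abs(A0[i, j]))   # fewest fill-ins, tie -> largest |value|
            if best_key is None or key < best_key:
                best_key, best_pos = key, (i, j)
    return best_pos            # exchange rows/columns so this entry lands at (k, k)

# On the illustrative matrix from above (arbitrary values placed on the nonzero
# pattern), a fill-in-free diagonal entry is selected for the first step:
A = np.array([[4., 1, 1, 1],
              [1, 4, 0, 0],
              [1, 0, 4, 0],
              [1, 0, 0, 4]])
print(choose_pivot(A, A != 0, 0))   # -> (1, 1), a pivot that causes no fill-in
```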
The proposed approach has two weak spots: fill-ins are not considered as pivot candidates, and the magnitudes of the values in the initial matrix are used for choosing pivots. LU decomposition steps can make some fill-in large (which should make it a good pivot, yet it cannot be chosen), and they can also reduce the magnitude of the chosen pivot so much that it no longer represents a feasible choice for a pivot.
Note that we didn't consider the magnitudes of pivot candidates (except when a tie in terms of fill-in occurs between two or more candidates). Consequently such pivoting generally produces bad results in terms of numerical error. To limit the numerical error SPICE has two parameters that reduce the set of pivot candidates: the relative pivot tolerance (pivrel) and the absolute pivot tolerance (pivtol). The absolute value of a pivot candidate in the i-th step of LU decomposition must be greater than pivtol and greater than pivrel times the largest absolute value of the subdiagonal elements in the i-th column. The values of pivtol and pivrel are by default set to $10^{-12}$ and $10^{-3}$, respectively.
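Expressed as a sketch (an illustrative helper, not the actual SPICE source), the acceptance test for a candidate from the i-th column reads:

```python
def acceptable_pivot(candidate, column_entries, pivtol=1e-12, pivrel=1e-3):
    # column_entries holds the (sub)diagonal entries of the i-th column among
    # which the pivot is being chosen; defaults follow the values quoted above.
    largest = max(abs(v) for v in column_entries)
    return abs(candidate) > pivtol and abs(candidate) > pivrel * largest
```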
Multiple linear systems of equations with a common coefficient matrix can also be solved by first inverting the matrix and then multiplying every right-hand side vector by the inverted matrix. This approach has a major disadvantage: if the matrix is sparse, its inverse usually is not. The LU decomposition, however, is sparse if the pivots are chosen in the right way.
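A quick way to observe this is to compare the number of nonzeros in the LU factors and in the explicit inverse. The sketch below uses SciPy's sparse routines on a random, well-conditioned test matrix that merely stands in for a circuit matrix:

```python
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 1000
A = sp.rand(n, n, density=0.004, format='csc', random_state=0)
A = (A + A.T + n * sp.identity(n)).tocsc()   # sparse and diagonally dominant

lu = spla.splu(A)             # sparse LU factorization (with its own pivoting)
Ainv = spla.inv(A)            # explicit inverse, computed column by column

print("nonzeros in A:     ", A.nnz)
print("nonzeros in L + U: ", lu.L.nnz + lu.U.nnz)
print("nonzeros in inv(A):", Ainv.nnz)       # typically close to n*n, i.e. dense
```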
The approach we presented (LU decomposition with forward and backward substitution) is a member of the large family of approaches referred to as direct methods. The second large family comprises iterative methods. These methods repeatedly apply a (usually simple) algorithm to the system of equations. With each iteration a new approximate solution is obtained which is more accurate than the previous one. After several iterations the solution becomes accurate enough and we can stop iterating.
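As an illustration of the idea, here is a sketch of one of the simplest iterative methods, the Jacobi iteration (shown only to convey the flavour of this family; practical circuit simulators use more sophisticated schemes):

```python
import numpy as np

def jacobi(A, b, iterations=50):
    # Split A into its diagonal D and the remainder R, then repeatedly
    # refine the estimate via x <- D^{-1} (b - R x).  Converges e.g. when
    # A is strictly diagonally dominant.
    D = np.diag(A)
    R = A - np.diag(D)
    x = np.zeros_like(b, dtype=float)
    for _ in range(iterations):
        x = (b - R @ x) / D    # each sweep yields a more accurate solution
    return x
```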