Linear Algebra Notes

Notes on some linear algebra topics. Sometimes I find things are just stated as being obvious, for example, the dot product of two orthogonal vectors is zero. Well... why is this? The result was a pretty simple but nonetheless it took a little while to think it through properly. Hence, these notes... they started of as my musings on the "why", even if the "why" might be obvious to the rest of the world!

A lot of these notes are now just based on the amazing Khan Academy linear algebra tutorials. In fact, id say the vast majority are... they are literally just notes on Sal's lectures!

Another resource that is really worth watching is 3 Blue 1 Brown's series "The essence of linear algebra".

Page Contents

Vector Stuff

Vector Spaces


A vector is an element of a vector space. A vector space, also known as a linear space, V, is a set which has addition and scalar multiplication defined for it and satisfies the following for any vectors u, v, and w and scalars c and d:

1 Closed under addition: u+vV
2 Communative: u+v=v+u
3 Associative: (u+v)+w=u+(v+w)
4 Additive identity: 0U,i.e.,u+0=u
5 Inverse: uV,u|u+(u)=0
6 Closed under scalar multiply: cuV
7 Distributive: c(u+v)=cu+cv
8 Distributive: (c+d)u=cu+du
9 Distributive: c(du)=(cd)u
10 Multiply identity: 1u=u

Functions Spaces

First let's properly define a function. If X and Y are sets and we have xX and yY, we can form a set F that consists of ordered pairs (xi,yj). If it is the case that (x1,y1)F(x1,y2)Fy1==y2, then F is a function and we write y=F(x). This is called being "single valued", i.e., each input to the function is unambiguous. Note, F(x) is not a function, it is the value returned by the function: F is the function that defines the mapping from elements in X (the domain) to the elements in Y (the range).

One example of a vector space is the set of real-valued functions of a real variable, defined on the domain [axb]. I.e., we're talking about {f1,f2,,fn} where each fi is a function. In other words, fi:RR with the aforementioned restriction on the domain.

We've said that a vector is any element in a vector space, but usually we thing of this in terms of, say, [x,y]R2. We can also however, given the above, think of functions as vectors: the function-as-a-vector being the "vector" of ordered pairs that make up the mapping between domain and range. So in this case the set of function is still like a set of vectors and with scalar multiplication and vector addition defined appropriately the rules for a vector space still hold!

What Isn't A Vector Space

It sounded a lot like everything was a vector space to me! So I did a quick bit of googling and found the paper "Things that are/aren't vector spaces". It does a really good example of showing how in maths we might build up a model of something and expect it to behave "normally" (i.e., obeys the vector space axioms), but in fact doesn't, in which case it becomes a lot harder to reason about them. This is why vectors spaces are nice... we can reason about them using logic and operations we are very familiar with and feel intuitive.

Linear Combinations

A linear combination of vectors is one that only uses the addition and scalar multiplication of those variables. So, given two vectors, a and b the following are linear combinations: 0a+0b5a2.5b123a+πb This can be generalise to say that for any set of vectors {v1,v2,,vn} for v1,v2,,vnRm that a linear combination of those vectors is c1v1+c2v2++cnvn for ciR.


graph showing span of two vectors in 2D space The set of all linear combinations of a set of vectors {v1,v2,,vn} is called the span of those vectors: span(v1,v2,,vn)={c1v1+c2v2++cnvn|ciR} The span can also be refered to as the generating set of vectors for a subspace.

In the graph to the right there are two vectors a and b. The dotted blue lines show all the possible scalings of the vectors. The vectors span those lines. The green lines show two particular scalings of a added to all the possible scalings of b. One can use this to imagine that if we added all the possible scalings of a to all the possible scalings of b we would reach every coordinate in R2. Thus, we can say that the set of vectors {a,b} spans R2, or span(a,b)=R2.

From this we may be tempted to say that any set of two, 2d, vectors spans R2 but we would be wrong. Take for example a and b=αa. These two variables are co-linear. Because they span the same line, no combination of them can ever move off that line. Therefore they do not span R2.

Lets write the above out a little more thoroughly... a=[a1a2],b=[αa1αa2] The span of these vectors is, by our above definition: span(a,b)={c1[a1a2]+c2[αa1αa2]ciR} Which we can re-write as: span(a,b)={c1[a1a2]αc2[a1a2]ciR}={(c1αc2)[a1a2]ciR} ... which we can clearly see is just the set of all scaled vectors of a. Plainly, any co-linear set of 2d vectors will not span R2.

Linear Independence

We saw above that the set of vectors {a,αa} are co-linear and thus did not span R2. The reason for this is that they are linealy dependent, which means that one or more vectors in the set are just linear combinations of other vectors in the set.

For example, the set of vectors {[1,0],[0,1],[2,3]} spans R2, but [2,3] is redundant because we can remove it from the set of vectors and still have a set that spans R2. In other words, it adds no new information or directionality to our set.

A set of vectors will be said to be linearly dependent when: c1v1++cnvn=0{cici0} Thus a set of vectors is linearly independent when the following is met: c1v1++cnvn=0{cici=0} Sure, this is a definition, but why does this mean that the set is linearly dependent?

Picture of linearly indpendent vectors

Think of vectors in a R2. For any two vectors that are not co-linear, there is no combination of those two vectors that can get you back to the origin unless they are both scaled by zero. The image to the left is trying to demonstrate that. If we scale a, no matter how small we make that scaling, we cannot find a scaling of b that when added to a will get us back to the origin. Thus we cannot find a linear combination cava+cbvb=0 where ca and/or cb does not equal zero.

This shows that neither vector is a linear combination of the other, because if it where, we could use a non-zero scaling of one, added to the other, to get us back to the origin.

We can also show that two vectors are linearly independent when their dot product is zero. This is covered in a latter section.

Another way of thinking about LI is that there is only one solution to the equation Ax=0.

Linear Subspace

If V is a set of vectors in Rn, then V is a subspace of Rn if and only if:

  1. V contains the zero vector: 0V
  2. V is closed under scalar multiplication: xV,αRαxV
  3. V is closed under addition: aV,bVa+bV

What these rules mean is that, using linear operations, you can't do anything to "break" out of the subspace using vectors from that subspace. Hence "closed".

Surprising, at least for me, was that the zero vector is actually a subspace of Rn because the set {0} obeys all of the above rules.

A very useful thing is that a span is always a valid subspace. Take, for example, V=span(v1,v2,v3). We can show that it is a valid subspace by checking that it objects the above 3 rules.

First, does it contain the zero vector? Recall that a span is the set of all linear combinations of the span vectors. So we know that the following linear combination is valid, and hence the span contains the zero vector. 0v1+0v2+0v3=0 Second, is it closed under scalar multiplication? Because the span is the set of all linear combinations, for any set of scalars {a1,a2,a3}... a1v1+a2v2+a3v3=x ... must also be in the subspace, by definition of a subspace!

Third, is it closed under vector addition? Well, yes, the same argument as made above would apply.

Interestingly, using the same method as above, we can see that even something trivial, like the span([a1,a2]), which is just a line, is a subspace of R2.

It can also be shown that any subspace basis will have the same number of elements. We say that the dimension of a subspace is the number of elements in a basis for the subspace. So, for example, if we had a basis for our subspace consisting of 13 vectors, the dimension of the subspace would be lucky 13.

Basis Of A Subspace

Recall, V=span(v1,,vn) V is a subspace of Rn (where each vector viRn). We also know that a span is the set of all possible linear combinations of these vectors.

If we choose a subset of V, S, where all the vectors in S are linearly independent, an we can write any vector in V as a linear combination of vectors in S, then we say that S is a basis for V.

S acts as a "minimal" set of vectors required to span V. I.e., if you added any other vector to S it is superfluous, because it would be a linear combination of the vectors already in S. You don't add any new information.

Put more formally: Let V be a subspace of Rn. A subset v1,v2,,vk is a basis for V iff the following are true:

  1. The vectors v1,v2,,vk span V, and
  2. They are linearly independent.

A subspace can have an infinite number of basis, but there are standard basis. For example in R2 the standard basis is {[1,0],[0,1]}. The rule of invariance of dimension states that two bases for a subspace contain the same number of vectors. The number of elements in the basis for a subspace is called the subspace's dimension, denoted as dim(W), where W is the subspace.

Thus, we can say that every subspace W of Rn has a basis and dim(W)n.

A basis is a good thing because we can break down any element in the space that the basis defines into a linear combination of basic building blocks, i.e. the basis vectors.

To find a basis use the following method:

  1. Form a matrix where the jth column vector is the jth vector wj.
  2. Reduce the matrix to row-echelon form.
  3. Tak the column vectors that contain a pivot and these are the basis vectors.

TODO discuss the canonical vectors in R2 and R3. [1,0], [0,1]... but we could also have [1,0], [1,1].

TODO unique coefficients. implied by the linear independence

TODO most important basis are orthogonal and orthonormal.

Orthonormal are good because we can find the coefficient in the new basis easily: αk=wkx˙

The Dot Product


This is a good reference.

The dot product is one definition for a vector multiplication that results in a scalar. Given two equal-length vectors, the dot product looks like this: [a1a2an][b1b2bn]=a1b1+a2b2++anbn Written more succinctly we have: xy=1nxiyi The dot product is important because it will keep popping up in many places. Some particular uses are to calculate the length of a vector and to show that two vectors are linearly independent.


The operation has the following properties:

Commutative: vw=wv
Distributive: (v+w)x=vx+wx
Associative (cv)w=c(vw)
Cauchy-Schwartz Inequality xy|xy|xy
|xy|=xyx=cy, where c is non-zero.
Vector Triangle Inequality x+yx+y

Angle Between Vectors

Dot product defines the angle between vectors

The dot product also defines the angle between vectors. Take two unit vectors a^ and b^. We can calculate: angle between a^ and b^=θ=θaθb We know that: cos(θa)=axandsin(θa)=ay cos(θb)=bxandsin(θb)=by And we know that: cos(θ)=cos(θaθb)=cos(θa)cos(θb)+sin(θa)sin(θb) Therefore, by substitution: cos(θ)=axbx+ayby=ab Thus, cos(θ)=ab describes the angle between unit vectors. If the vectors are not unit vectors then the angle is defined, in general, by: cos(θ)=1abab

Vector Length


Vector Transposes And The Dot Product

In linear algebra vectors are implicitly column vectors. I.e., when we talk about v we mean: v=[v1v2vn] Thus, the transpose of v is a row vector: vT=[v1v2vn] We can then do vector multiplication of the two like so: vTv=[v1v2vn][v1v2vn]=v1v1+v2v2++vnvn=vv So, in general vTw=vw=wTv.

Dot Product With Matrix Vector Product

If A is an mxn matrix and xRn then AxRm.

Then, if yRm then (Av)y is also well defined and: (Av)y=(Ax)Ty See section Vector Transposes And The Dot Product=xATy See matrix transpose rule  (AX)T=xTAT=xT(ATy)=x(ATy) Thus we can say (Ax)y=x(ATy).

Unit Length Vectors

A unit vector is a vector that has a length of 1. If u^ is a vector (notice the special symbol used) then we defined its length as: length(u^)=u^=u12+u22++un2 Thus if a vector is a unit vector then u^=1.

Interesting we can also write u2=uu˙.

To construct a unit vector where u12+u22++un2=1, we need the sum u12+u22++un2 to also be 1! How do we achieve this if u1? We do the following: u^=1uu , where u0

Orthogonal Vectors and Linear Independence

Image showing 3 perpendicular vectors
Figure 1

Orthogonal vectors are vectors that are perpendicular to each other. In a standard 2D or 3D graph, this means that they are at right angles to each other and we can visualise them as seen in Figure 1: x is at 90 degrees to both y and z. y is at 90 degrees to x and z, and z is at 90 degrees to y and z.

In any set of any vectors, [a1,...,an], the vectors will be linearly independent if every vector in the set is orthogonal to every other vector in the set. Note, however that whilst an orthogonal set is L.I., a L.I. set is not necessarily an orthogonal set. Take the following example: a=[11], b=[11] The vectors a and b are L.I. because nethier can be represented as a linear combination of the other. They are not however orthogonal.

So, how do we tell if one vector is orthogonal to another? The answer is the dot product which is defined as follows. xy=1nxiyi

We know when two vectors are orthogonal when their dot product is zero: xy=0 x and y are orthogonal. You can scale either or both vectors and their dot product will still be zero!

But why is this the case? Lets imagine any two arbitrary vectors, each on the circumference of a unit circle (so we know that they have the same length and are therefore a proper rotation of a vector around the center of the circle). This is shown in figure 2. From the figure, we know the following:

Picture showing a vector being rotated
Figure 2

x1=rcosθ1y1=rsinθ1x2=rcos(θ1+θn)y2=rsin(θ1+θn) The vector x2 is the vector x1 rotated by θn degrees. We can use the following trig identities: sin(a±b)=sin(a)cos(b)±cos(a)sin(b)cos(a±b)=cos(a)cos(b)sin(a)sin(b) Substitute these into the formulas above, we get the following. x2=rcosθ1cosθnrsinθ1sinθny2=rsinθ1cosθn+rcosθ1sinθn Which means that... x2=x1cosθny1sinθny2=y1cosθn+x1sinθn For a 90 degree rotation θn=90 we know that cosθn=0 and sinθn=1. Substiuting these values into the above equations we can clearly see that... x2=y1y2=x1 Therefore, any vector in 2D space that is [a,b] will become [b,a] and therefore the dot product becomes ab+ab=0. And voilà, we know know why the dot product of two orthogonal 2D vectors is zero. I'm happy to take it on faith that this extrapolates into n dimensions :)

Another way of looking at this is to consider the law of cosines: C2=A2+B22ABcos(c):

Law of cosines

Here A and V represent the magnitude of the vectors x, x, and y, y. C represents the magnitude of the vector xy. Thus, we have this:

Law of cosines

I.e., xy2=x2+y22xycos(θc).

When the angle between the vectors is 90 degrees, cos(90)=0 and so that part of the equation disappears to give us xy2=x2+y2. Now... xy2=x2+y2x2+y2xy2=0 Recall that v=v12+v22+...+vn2 and that therefore v2=v12+v22+...+vn2 .

Based, on this we can say that... xy2=(x1y1)2+(x2y2)2+...+(xnyn)2=x122x1y1+y12+x222x2y2+y22+...+xn22xnyn+yn2 Now, all of the xi2 and yi2 terms above will be cancelled out by the x2 and y2 terms, leaving us with... x2+y2xy2=2x1y1+2x2y2+...+2xnyn=0 Which is just twice the dot product when the vectors are at 90 degrees - or just the dot product because we can divide through by 2.



Khan Academy has great articles. This is another good reference from MIT.

Scalar Multiplication

  1. Associative: (cd)A=c(dA)
  2. Distributive: c(A+B)=cA+cB and (c+d)A=cA+dA
  3. Identity: 1A=A
  4. Zero: 0A=O
  5. Closure: cA is a matrix of the same dimensions as A.

Note that scalar multiplication is not commutative: cAAc!

Matrix Addition

  1. Commutative: A+B=B+A
  2. Associative: A+(B+C)=(A+B)+C
  3. Zero: For any matrix A, there is a unique matrix O such that A+O=A
  4. Inverse: A, !A  A+(A)=O
  5. Closure: A+B, is a matrix of the same dimensions as A and B.

Matrix Multiplication

  1. Associative: (AB)C=A(BC)
  2. Distributive: A(B+C)=AB+AC
  3. Identity: IA=A
  4. Zero: OA=O
  5. Inverse: Only if A is a square matrix, then A1 is its inverse and AA1=A1A=I. Note that not all matrices are invertible.
  6. Dimension: If A is an m×n matrix and B is an n×k matrix then AB is an m×k matrix.

Note that matrix multiplication is not commutative: ABBA!

Be careful: the equation AB=0 does not necessarily mean that A=0 or B=0 and for a general matrix A, AB=ACB=C, unless A is invertible.

Matrix Transpose

  1. Associative:(rA)T=rAT,rR
  2. Distributive (into addition): (A+B)T=AT+BT
  3. Distributive (into mult): (AB)T=BTAT Note the reversal of terms
    Generalises to (ABC)T=CTBTAT
  4. Identity: IT=I
  5. Inverse: (AT)T=A
  6. Name?: (AT)1=(A1)T
  7. Name?: (Ax)T=xTAT Note the reversal of terms
  8. Name?: rank(A)=rank(AT)
  9. If columns of A are LI then columns of ATA are LI. ATA is square and also invertible.

Matrix Inverse

  1. Associative (self with inverse): AA1=A1A=I
  2. Distributive (into scalar mult): (rA)1=r1A1, r0
  3. Distributive (into mult): (AB)1=B1A1 Note the reversal of terms
  4. Identity: I1=I
  5. Name?: (A1)1=A
  6. Name?: (AT)1=(A1)T

Matricies, Simultaneous Linear Equations And Reduced Row Echelon Form


The set of linear equations: a11x1+a12x2+...+a1nxn=b1 a21x1+a22x2+...+a2nxn=b2 ... am1x1+am2x2+...+amnxn=bm Can be represented in matrix form: [a11a12a1na21a22a2nam1am2amn][x1x2xn]=[b1b2bm] Any vector x, where Ax=b, is a solution to the set of simultaneous linear equations. How can we solve this from the matrix. We create an augmented or patitioned matrix: [1a11a12a1nb1a21a22a2nb2am1am2amnbm] Next this matrix must be converted to row echelon form. This is where each row has the same or less zeros to the left than the row below. For example, the following matrix is in row echelon form: [1234012300010002] But, this next one is not: [1234000100020123] We go from a matrix to the equivalent in row-echelon form using elementary row operations:

  1. Row interchange: RiRj
  2. Row scaling: RikRi
  3. Row addition: RiRi+kRj

In row echelon form:

  1. All rows containing only zeros appear below rows with nonzero entries, and
  2. The first nonzero entry in any row appears in a column to the right of the first nonzero entry in any preceding row.

The first non zero entry in a row is called the pivot for that row.

And woop, woop, even more important than echelon form is reduced row-echelon form (rref). This is when:

  1. If a row has all zeros then it is below all rows with a non-zero entry.
  2. The left most entry in each row is 1 - it is called the pivot.
  3. The left most non-zero entry in each row is the only non-zero in that column.
  4. Each row has more zeros to the left of the pivot than the row above.

Or, to put it another way, when it is a matrix in row-echelon form where:

  1. All pivots are equal to 1, and
  2. Each pivot is the only non-zero in its column.

Some important terms are pivot position and pivot column. The position is the location of a leading 1, which meets the above criteria, and the column is a column that contains a pivot position.

So why do we give a damn above RREF if we have row-echelon form? Well, one reason is it lets us figure out if a set of linear equations has a solution and which variables are free and which are dependent. The reason for this is that the RREF of every matrix is unique (whereas merely row reducing a matrix can have many resulting forms).

The free variable is one that you have control over, i.e. you can choose and manipulate its value. The dependent one depends on the values of the other variables in the experiment or is constant.

Pivot columns represent variables that are dependent. Why? Recall that the column vector containing a pivot is all-zeros except for the pivot. So if element j in a row of a rref matrix is a pivot, then in our system of equations xj only appears in this one equations in our RREF matrix. Thus, its value must be determined as this is the only place it occurs. It must either be a constant or a combination of other variables.

Because column vectors containing a pivot are all-zeros except for the pivot, the set of pivot columns must also be LI (see later). You can always represent the free columns as linear combinations of the pivot columns.

If we have an augmented matrix [A|b]:

  1. No Solution. If the last (augmented) column of [A|b] (ie. the column which contains b) contains a pivot entry.
  2. Restructed Solution Set. If the last row of the matrix is all zeros except the pivot column which is a function of b1bm.
  3. Exactly One Solution. If the last (augmented) column of [A|b] (ie. the column which contains b) does not contain a pivot entry and all of the columns of the coefficient part (ie. the columns of A) do contain a pivot entry.
  4. Infinitely Many Solutions. If the last (augmented) column of [A|b] (ie. the column which contains b) does not contain a pivot entry and at least one of the columns of the coefficient part (ie. the columns of A) also does not contain a pivot entry.

Let's look at a quick example of a matrix in echelon-form: [513303580024]

We can get the solution to the set of simultaneous linear equations using back substitution. Directly from the above row-echelon form (useful because the last row has one pivot and no free variables) matrix we know that: (1)5x1x23X3=3 (2)3x2+5x3=8 (3)2x3=4 From (1) we know: (4)x3=2 Substitute (4) back into (2): (5)3x2+10=8x2=6 Substitute (5) and (4) back into (1): (6)5x16+6=3x1=35

We can use RREF to accomplish the same thing and find the solutions to the above. [513303580024] [115353503580012](r1=r15,r3=r32) [1153535030180012](r2=r25r3) [115353501060012](r2=r23) [1003501060012](r1=r1+3r35,r1=r1r25) Which means we come to the same conclusion as we did when doing back substitution, but were able to do it using elementary row operations.

We can not talk about the general solution vector, x, which in this case is: [x1x2x3]=[3562] We can also talk about the solution set, which is the set of all vectors that result in a valid solution to our system of equations. In this case our solution set is trivial: it just has the above vector in it because the RREF had pivots in every column so there is exactly one solution.

Lets, therefore, look at something more interesting. Take the RREF matrix below: [10490018300000000000] Now we have two free variables! Now x3=a and x4=b where a and b can be any values we like. Now the general solution vector looks like this: [x1x2x3x4]=[4a9b8a3bab] How did we get here? We we know from the RREF matrix that: x1+4x3a+9x4b=0 x2+8x3a+3x4b=0 We replace x3 and x4 with our free variables and then solve for the pivots, thus getting the general solution vector.

The above can then be represented using the general solution vector: [4a9b8a3bab]=a[4810]+b[9b3b01] The solution set thus becomes: span([4810],[9301])

We can find the reduced row echelon form of a matrix in python using the sympy package and the rref() function:

from sympy import Matrix, init_printing


system = Matrix(( (3, 6, -2, 11), (1, 2, 1, 32), (1, -1, 1, 1) ))

Which outputs:

⎡1  0  0  -17/3⎤
⎢              ⎥
⎢0  1  0  31/3 ⎥
⎢              ⎥
⎣0  0  1   17  ⎦

To then apply this to our augmented matrix we could do something like the following. Lets say we have a set of 3 simultaneous equations with 3 unknowns: 3x+6y2z=11x+2y+z=32xy+z=1 We'd make an augmented matrix like so: [36211121321111] To solve this using Python we do:

from sympy import Matrix, solve_linear_system, init_printing
from import x, y, z


system = Matrix(( (3, 6, -2, 11), (1, 2, 1, 32), (1, -1, 1, 1) ))

solve_linear_system(system, x, y, z)

Which will give the following output:

⎡3  6   -2  11⎤
⎢             ⎥
⎢1  2   1   32⎥
⎢             ⎥
⎣1  -1  1   1 ⎦
{x: -17/3, y: 31/3, z: 17}

Matrix / Vector Multiplication

Matrix/vector multiplication is pretty straight forward: [acbd][24]=[2a+4c2b+4d]

The concept of matrix multiplication, the type I learnt in school, is shown in the little animation below. For each row vector of the LHS matrix we take the dot product of that vector with each column vector from the RHS matrix to produce the result:

But having said that, I'd never really thought of it as shown below until watching some stuff on Khan Academy:


We can think of it as the linear combination of the column vectors of the matrix where the coefficients are the members of the vector, where the first column vector is multiplied by the first element, the second column vector by the second element and so on.

So, for any mxn matrix, A, and vector vRn we can say that, Av=[c1c2cn][v1v2vn]=v1c1+v2c2++vncn

Null Space Of A Matrix


For an nxm vector A, the nullspace of the nxm matrix A, N(A) is defined as follows: N(A)={xRmAx=0} In words, the null space of a matrix is the set of all solutions that result in the zero vector, i.e. the solution set to the above. We can get this set by using the normal reduction to RREF to solve the above ([A0][R0]). The solution space set to this is the null space.

The system of linear equations Ax=0 is called homogeneous. It is always consistent, i.e., has a solution, because x=0 is always a solution (and is called the trivial solution).

The null space of a matrix is a valid subspace of Rn because it satisfies the following conditions, as we saw in the section on subspaces.

  1. N(A) contains the zero vector: 0N(A)
  2. N(A) is closed under scalar multiplication: xN(A),αRαxN(A)
  3. N(A) is closed under addition: aN(A),bN(A)a+bN(A)

It can also be shown that N(A)=N(RREF(A)), where RREF is the reduced row echelon form.

Relation To L.I. Of Column Vectors

The nullspace of a matrix can be related to the linear independence of it's column vectors. Using the above definition of nullspace, we can write: Av=[a1a2an][x1x2xn]=[01020m] Recalling how we can write out vector/scalar multipication, we get, x1a1+x2a2++xnan=0 Recall from the section on linear independence that the above set of vectors is said to be linearly dependent when: x1a1+x2a2++xnan=0, {xixi=0} Given that our nullspace is the set of all vectors that satisfy the above, by definition, then we can say that the column vectors of a matrix A are LI if and only if N(A)=0.

Relation To Determinant

If the determinant of a matrix is zero then the null space will be non trivial.

Calculating The Null Space

To find the null space of a matrix A, augment it with the zero vector and convert it to RREF. Then the solution set is the null space. So, if there are only pivots, the null space only contains the zero vector because there is only one solution (which therefore must be 0).

Example #1

Let's look at a trivial example: Let A=[41127.5261.53] Then the null space is defined as the solution set to Ax=0, i.e... [411830003][x1x2x3]=[000] We find the solution set using the normal augmented matrix method: [411083000030][411001100030][411001100010][410001000010][100001000010] There are no free variables, so therefore the solution set can contain only the zero vector because we have three contraints x1=0,x2=0 and x3=0. The only values for the xi's that can satisfy these constrains is zero, hence the null space can only contain the zero vector.

Example #2

Lets try an consider another trivial example. Let A=[23061208150] It's pretty contrived because I constructed it by applying row operations to the identify matrix! Null space is defined as solutions to: [23061208150][x1x2x3]=[000] We find the solution set using the normal augmented matrix method: [23006120081500][100001000000] Now we can see that x1=0 and x2=0. The variable x3, however, is free. It can take any value. Let's call that value x3=a. The solution vector is therefore: [x1x2x3]=[00a] The general solution vectors is therefore: a[001] So we can say the null space is: N(A)=span([001])

Example #3

Let's try one more example. This one is right out of Khan Achademy rather then my own made up rubbish :) Let A=[213426] We calculate the null space by setting up an augmented matrix to solve for Ax=0: [21304260][11/23/200000] We thus know that: x1=12x2+32x3 And that x2 and x3 are free variables. This means that the general solution is: [x1x2x3]=x2[1210]+x3[3201] And therefore we can define the null space as: N(A)=span([1210],[3201])

An Intuitive Meaning

This StackOverflow thread gives some really good intuitive meaning to the null space. I've quoted some of the answers verbatim here:

It's good to think of the matrix as a linear transformation; if you let h(v)=Av, then the null-space is ... the set of all vectors that are sent to the zero vector by h. Think of this as the set of vectors thatlose their identity as h is applied to them.

Let's suppose that the matrix A represents a physical system. As an example, let's assume our system is a rocket, and A is a matrix representing the directions we can go based on our thrusters. So what do the null space and the column space represent?

Well let's suppose we have a direction that we're interested in. Is it in our column space? If so, then we can move in that direction. The column space is the set of directions that we can achieve based on our thrusters. Let's suppose that we have three thrusters equally spaced around our rocket. If they're all perfectly functional then we can move in any direction. In this case our column space is the entire range. But what happens when a thruster breaks? Now we've only got two thrusters. Our linear system will have changed (the matrix A will be different), and our column space will be reduced.

What's the null space? The null space are the set of thruster intructions that completely waste fuel. They're the set of instructions where our thrusters will thrust, but the direction will not be changed at all.

Another example: Perhaps A can represent a rate of return on investments. The range are all the rates of return that are achievable. The null space are all the investments that can be made that wouldn't change the rate of return at all.

Nullity Of A Matrix

The dimension of a subspace is the number of elements in a basis for that subspace. The dimension of the null space is called the "nullity":

In the null space you will always have as many vectors as you have free variables in the RREF of the matrix. So the nullity is the number of free variables in the RREF of the matrix with which this null space is associated: nullity(A)=dim(N(A))=#non-pivot cols in RREF(A)=nrank(A) This the null space has dimension, or nullity, nrank(A). This equates to the number of free variables.

Column Space Of A Matrix

The columns space of a matrix is the set of all linear combinations of its column vectors, which is the span of the column vectors: C(A)=span(the column vectors of A)=span(the LI column vectors of A) Note that the column vectors of A might not be LI. The set can be reduced so that only the LI vectors are used, which means the above still holds, and then we say that the LI set is a basis for C(A). The column space of A is a valid subspace.

For the system of equations Ax=b, C(A) is the set of all values that Ax can be. We can look at this this way: Ax=[abcd][xy]=[ax+bycx+dy]=x[ac]+y[bd]=b The solution, b, can be seen to be a linear combination of the column vectors of A. Thus, bC(A).

Therefore, if Ax=b1 and b1C(A), then we know that no solution exists, a conversely, if it is then we know that at least one solution exists.

The set of vectors in the column space of may or may not be linearly independent. When reduced to being LI then we have a basis for the column space of the matrix.

We saw the following in the section of null spaces: It can be shown that the column vectors of a matrix A are LI if and only if N(A)=0. So we can figure out if we have a basis by looking at A's null space. Recall that N(A)=N(RREF(A)) and that the column vectors of a matrix A are LI if and only if N(A)=0!

How do we figure out the set of column vectors that are LI? To do this, put the matrix in RREF and select the columns that have pivot entries (the pivot columns). The pivot column vectors are the basis vectors for C(A) :) The reason is that you can always write the columns with the free variables as linear combinations of the pivot columns.

Because column vectors containing a pivot are all-zeros except for the pivot, the set of pivot columns must also be LI. You can always represent the free columns as linear combinations of the pivot columns.

So to find column space of A, pick the columns from A that correspond to (have the same position as) the pivot columns in RREF(A).

A really interested property of the column space is the column space criterion which states that a linear system, Ax=b has a solution iff b is in the column space of A.

Rank Of A Matrix

We can now define what the rank of a matrix is: rank(A)=dim(Cbasis(A))=#pivot columns in RREF(A) I.e., The rank of a matrix is the number of pivots. This also means that the rank is the number of vectors in a basis for the column space of A. You can get this by counting the pivot columns in RREF(A).

The rank equation is rank(A)+nullity(A)=n, where A is an mxn matrix.

If A is reduced to row-echelon form H, then:

  1. nullity(A) is the number of free variables in solution space of Ax=0, which equals the number of pivot-free columns in H.
  2. rank(A) is the number of pivots in H.
  3. rank(A)+nullity(A)=n, equals the number of columns of A.

Basis Of The Column Space VS Basis Of The Null Space

Really good answer by John-Paul Meyer on the Khan Academy website:

... a "basis of a column space" is different than a "basis of the null space", for the same matrix....

A basis is a a set of vectors related to a particular mathematical "space" (specifically, to what is known as a vector space). A basis must:

  1. be linearly independent and
  2. span the space.

The "space" could be, for example: R2, R3, R4, ... , the column space of a matrix, the null space of a matrix, a subspace of any of these, and so on...

Orthogonal Matrices

So, onto orthogonal matrices. A matrix is orthogonal if AAT=ATA=I. If we take a general matrix and multiply it by it's transpose we get...


The pattern ab+cd looks pretty familiar right?! It looks a lot like x1x2+y1y2, our formula for the dot product of two vectors. So, when we say a=x1, b=x2, c=y1 and d=y2, we will get the following: a matrix of two row vectors v1=[x1,y1], v2=[x2,y2]. A=(v1v2)=(x1y1x2y2) AAT=(x1y1x2y2)(x1x2y1y2)=(x12+y12x1x2+y1y2x1x2+y1y2x22+y22) If our vectors v1 and v2 are orthogonal, then the component x1x2+y1y2 must be zero. This would give us the first part of the identify matrix pattern we're looking for.

The other part of the identity matrix would imply that we would have to have x12+y12=1 and x22+y22=1... which are just the formulas for the square of the length of a vector. Therefore, if our two vectors are normal (i.e, have a length of 1), we have our identity matrix.

Does the same hold true for ATA? It doesn't if we use our original matrix A!... ATA=(x1x2y1y2)(x1y1x2y2)=(x12+x22x1y1+x2y2x1y1+x2y2y12+y22) Oops, we can see that we didn't get the identity matrix!! But, perhaps we can see why. If A was a matrix of row vectors then AT is a matrix of column vectors. So for AAT we were multiplying a matrix of row vectors with a matrix of column vectors, which would, in part, give us the dot products as we saw. So if we want to do ATA it would follow that for this to work A now was to be a matrix of column vectors because we get back to our original (AT would become a mtrix of row vectors): ATA=(x1y1x2y2)(x1x2y1y2)=(x12+y12x1x2+y1y2x1x2+y1y2x22+y22)

So we can say that if we have matrix who's rows are orthogonal vectors and who's columns are also orthogonal vectors, then we have an orthogonal matrix!

Okay, thats great n' all, but why should we care? Why Are orthogonal matricies useful?. It turns our that orthogonal matricies preserve angles and lengths of vectors. This can be useful in graphics to rotate vectors but keep the shape they construct, or in numerical analysis because they do not amplify errors.


The Fundamentals

If B is defined as follows... B=[abcd] Then we write the determinate of B as... det(B)=|B|=|abcd|=adbc

A matrix is invertible iff |B|0. We can define the inverse as follows using the determinant: B1=1|B|[dbca] It is defined as long as |B|0.

If A is a 3x3 matrix defined as follows, where the elements aij represents the element at row i and column j: A=[a11a12a13a21a22a23a31a32a33] We can define the determinant as: det(A)=a11|a22a23a32a33|a12|a21a23a31a33|+a13|a21a22a31a32| Same rule applies: If the determinant is not zero then the matrix will be invertible.

We've used the first row (a11,a12,a13) but we could have used any row, or any column: choose the one with the most zeros as this will make the calculation easier/faster.

Select the row or column with the most zeros and go through it. Use each element aij multiplied by the determinant of the 2x2 matrix you get if you take out the ith row and jth column of the matrix.

This generalises to an nxn matrix as follows: nxnA =[a11a12a1na21a22a2nan1an2ann] Define Aij to be the (n-1) x (n-1) matrix you get when you ignore the ith and jth column of the nxn matrix A.

Then we can define the determinant recursively as follows: det(A)=a11|A11|a12|A12|+a13|A13|+a1n|A1n| For each |Aij|, apply the formula recursively until a 2x2 matrix is reached, at which point you can use the 2x2 determinant formula.

There is a pattern to the "+"'s and "-"'s: sign(i,j)=1(i+j)

A quick example. A is defined as follows: A=[1234102001232300] We find the determinant as follows, where as we did previously, we define Aij to be the (n-1) x (n-1) matrix you get when you ignore the ith and jth column of the nxn matrix A. det(A)=2|A41|+3|A42| See how the 4th row was used as it contains 2 zeros, meaning that we only have to actually do any work for two of the terms. Note also how we got the signes for the various components using sign(i,j)=1(i+j). Another way to visualise this is shown below: [++++++++] This now needs to be applied recusively, so we figure out the determinants of A41 and A42 in the same way.

Let B=A41: |B|=|234020123|=2|B22|=2|2413|=2(2341)=4 Note again how we picked row 2 because it only had 1 term which was non-zero and hence made our life easier.

Let C=A42 |C|=|134120023|=1|C21|+2|C22|=1|3423|+2|1403|=1(98)+2(30)=5 We picked row 2 again, but we could have just as easily used columns 1 or 3 to the same effect.

Now we can write: det(A)=2|A41|+3|A42|=2|B|+3|C|=2(4)+3(5)=7 We can see this answer verified in the section "Do It In Python"

Do It In Python

>>>import numpy as np
>>> A = np.mat("1 2 3 4;1 0 2 0; 0 1 2 3; 2 3 0 0")
>>> np.linalg.det(A)

See It In HTML

The following is an online matrix determinant calculator app, written in Javascript, to explain calculation of determinant. Define a matrix in the text box using a Matlab-like sytax. For example, "1 2 3; 3 4 5; 5 6 7"