# Linear Algebra Notes

Notes on some linear algebra topics. Sometimes I find things are just stated
as being obvious, for example, that the dot product of two orthogonal vectors
is zero. Well... why is this? The result is pretty simple, but
nonetheless it took a little while to think it through properly.
Hence, these notes... they started off as my musings on the "why", even if the
"why" might be obvious to the rest of the world!

A lot of these notes are now just based on the amazing Khan Academy linear algebra tutorials. In fact, I'd say the vast majority are... they are literally just notes on Sal's lectures!

Another resource that is really worth watching is 3 Blue 1 Brown's series "The essence of linear algebra".

## Vector Stuff

### Vector Spaces

References:

- What isn't a vector space?, Mathematics Stack Exchange.
- Things that are/aren't vector spaces, Dept. of Mathematics and Statistics, University of New Mexico.
- Vector Spaces, physics.miami.edu.
- Vector space, Wikipedia.

A vector is an element of a *vector space*. A *vector space*, also known as a
*linear space*, is a set of vectors, $V$, over which addition and scalar multiplication
are defined and satisfy the following axioms for all $\overrightarrow{u}, \overrightarrow{v}, \overrightarrow{w} \in V$ and scalars $a, b \in \mathbb{R}$:

| # | Axiom | |
|----|------------------------------|---|
| 1 | Closed under addition | $\overrightarrow{u} + \overrightarrow{v} \in V$ |
| 2 | Commutative | $\overrightarrow{u} + \overrightarrow{v} = \overrightarrow{v} + \overrightarrow{u}$ |
| 3 | Associative | $(\overrightarrow{u} + \overrightarrow{v}) + \overrightarrow{w} = \overrightarrow{u} + (\overrightarrow{v} + \overrightarrow{w})$ |
| 4 | Additive identity | $\overrightarrow{u} + \overrightarrow{0} = \overrightarrow{u}$ |
| 5 | Inverse | $\overrightarrow{u} + (-\overrightarrow{u}) = \overrightarrow{0}$ |
| 6 | Closed under scalar multiply | $a\overrightarrow{u} \in V$ |
| 7 | Distributive | $a(\overrightarrow{u} + \overrightarrow{v}) = a\overrightarrow{u} + a\overrightarrow{v}$ |
| 8 | Distributive | $(a + b)\overrightarrow{u} = a\overrightarrow{u} + b\overrightarrow{u}$ |
| 9 | Distributive | $a(b\overrightarrow{u}) = (ab)\overrightarrow{u}$ |
| 10 | Multiply identity | $1\overrightarrow{u} = \overrightarrow{u}$ |

#### Function Spaces

First let's properly define a function. A function is a set of *ordered* pairs, mapping each element of one set (the *domain*) to exactly one element, *returned* by
the function, of another set (the *range*).

One example of a vector space is the set of real-valued functions of a real variable, defined
on the domain

We've said that a vector is any element in a vector space, but usually we think of this in
terms of, say,

#### What Isn't A Vector Space

It sounded a lot like everything was a vector space to me! So I did a quick bit of googling and found the paper "Things that are/aren't vector spaces". It does a really good job of showing how in maths we might build up a model of something and expect it to behave "normally" (i.e., obey the vector space axioms), when in fact it doesn't, in which case it becomes a lot harder to reason about. This is why vector spaces are nice... we can reason about them using logic and operations that we are very familiar with and that feel intuitive.

### Linear Combinations

A linear combination of vectors is one that uses only the addition and scalar multiplication
of those vectors. So, given two vectors,
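As a quick numerical sketch of this idea (the vectors and scalars here are made up for illustration):

```python
import numpy as np

# Hypothetical example vectors and scalars, just for illustration.
v1 = np.array([1, 2])
v2 = np.array([0, 3])
a, b = 2, -1

# A linear combination uses only scalar multiplication and addition.
combo = a * v1 + b * v2
print(combo)  # [2 1]
```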

### Spans

The *set of all linear combinations* of a set of vectors is called the *span* of those vectors. Such a set of vectors is also called a
**generating set** of vectors for a subspace.

In the graph to the right there are two vectors

From this we may be tempted to say that any set of two 2D vectors spans $\mathbb{R}^2$. This is not so: two colinear vectors, for example, do *not* span $\mathbb{R}^2$.

Let's write the above out a little more thoroughly...

### Linear Independence

We saw above that a set of vectors can be linearly *dependent*, which
means that *one or more vectors in the set are just linear combinations of other vectors
in the set*.

For example, the set of vectors

A set of vectors is said to be linearly *dependent* when some non-trivial combination of them sums to zero: $c_1\overrightarrow{v}_1 + c_2\overrightarrow{v}_2 + \dots + c_n\overrightarrow{v}_n = \overrightarrow{0}$ with at least one $c_i \ne 0$. The set is linearly
**independent** when the only solution is $c_1 = c_2 = \dots = c_n = 0$.

Think of vectors in a

This shows that neither vector is a linear combination of the other, because if it were, we could use a non-zero scaling of one, added to the other, to get us back to the origin.

We can also show that two non-zero vectors are linearly independent when their dot product is zero. This is covered in a later section.

Another way of thinking about LI is that there is only one solution to
the equation $c_1\overrightarrow{v}_1 + c_2\overrightarrow{v}_2 + \dots + c_n\overrightarrow{v}_n = \overrightarrow{0}$: the trivial one, where every $c_i = 0$.
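A quick way to check independence numerically is via the matrix rank; a sketch with made-up vectors:

```python
import numpy as np

# Hypothetical vectors: the columns of A are colinear, the columns of B are not.
A = np.column_stack(([1, 2], [2, 4]))
B = np.column_stack(([1, 2], [0, 3]))

# Column vectors are linearly independent iff the matrix has full column rank.
print(np.linalg.matrix_rank(A))  # 1 -> columns are dependent
print(np.linalg.matrix_rank(B))  # 2 -> columns are independent
```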

### Linear Subspace

A set of vectors, $V$, is a *subspace* of $\mathbb{R}^n$ if the following three conditions hold:

- $V$ contains the zero vector: $\overrightarrow{0} \in V$
- $V$ is closed under scalar multiplication: $\overrightarrow{x} \in V, \alpha \in \mathbb{R} \implies \alpha\overrightarrow{x} \in V$
- $V$ is closed under addition: $\overrightarrow{a} \in V, \overrightarrow{b} \in V \implies \overrightarrow{a} + \overrightarrow{b} \in V$

What these rules mean is that, using linear operations, you can't do anything to "break" out of the subspace using vectors from that subspace. Hence "closed".

Surprising, at least for me, was that the zero vector on its own, $\{\overrightarrow{0}\}$, is actually a valid subspace of $\mathbb{R}^n$.

A very useful thing is that a *span is always a valid subspace*. Take, for example,

First, does it contain the zero vector? Recall that a span is the set of all linear combinations
of the span vectors. So we know that the all-zero linear combination is valid, and hence the
span contains the zero vector.

Second, is it closed under scalar multiplication? Yes: the span contains *all*
linear combinations, for any set of scalars, so scaling any member just gives another
linear combination, which is also in the span.

Third, is it closed under vector addition? Well, yes, the same argument as made above would apply.

Interestingly, using the same method as above, we can see that even something trivial,
like the

It can also be shown that *any subspace basis will have the same number of elements*. We say that the **dimension of a subspace** is the number of
elements in a basis for the subspace. So, for example, if we had a
basis for our subspace consisting of 13 vectors, the dimension of the subspace would be lucky 13.

### Basis Of A Subspace

Recall,

If we choose a subset of $V$ whose vectors are linearly *independent*,
and we can write any vector in $V$ as a linear combination of vectors in that subset, then we call that subset a *basis* for $V$.

Put more formally: Let $V$ be a subspace of $\mathbb{R}^n$. A set of vectors $\{\overrightarrow{v}_1, \overrightarrow{v}_2, \dots, \overrightarrow{v}_k\}$ is a *basis* for $V$ if:

- The vectors $\overrightarrow{v}_1, \overrightarrow{v}_2, \dots, \overrightarrow{v}_k$ span $V$, and
- They are linearly *in*dependent.

A subspace can have an infinite number of bases, but there are certain special ones called *standard bases*. For
example, in $\mathbb{R}^2$ the standard basis is $\{(1, 0), (0, 1)\}$. The **invariance of dimension** states that two bases for a subspace contain the same number of vectors. The number of elements in the basis for a subspace is called the subspace's **dimension**, denoted as $\dim(V)$.

Thus, we can say that every subspace

A basis is a good thing because we can break down any element in the space that the basis defines into a linear combination of basic building blocks, i.e. the basis vectors.

To **find a basis** use the following method:

- Form a matrix where the $j^{th}$ column vector is the $j^{th}$ vector $\overrightarrow{w}_j$.
- Reduce the matrix to row-echelon form.
- Take the column vectors that contain a pivot; these are the basis vectors.
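The steps above can be sketched with `sympy` (the vectors are made up; the third column is the sum of the first two, so it is redundant):

```python
from sympy import Matrix

# Hypothetical spanning set as columns; col 3 = col 1 + col 2.
M = Matrix([[1, 0, 1],
            [0, 1, 1],
            [1, 1, 2]])

# rref() returns the reduced matrix and the indices of the pivot columns.
_, pivots = M.rref()
print(pivots)  # (0, 1)

# The pivot columns of the original matrix form a basis for the span.
basis = [M.col(i) for i in pivots]
```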

TODO discuss the canonical vectors in R2 and R3.

TODO unique coefficients. implied by the linear independence

TODO most important basis are orthogonal and orthonormal.

Orthonormal bases are good because we can find the coefficients in the new basis easily:
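A small numerical sketch of this: with an orthonormal basis, each coefficient is just a dot product (the basis here is the standard basis rotated by 45 degrees, chosen for illustration):

```python
import numpy as np

# An orthonormal basis for R^2: the standard basis rotated by 45 degrees.
theta = np.pi / 4
b1 = np.array([np.cos(theta), np.sin(theta)])
b2 = np.array([-np.sin(theta), np.cos(theta)])

v = np.array([2.0, 1.0])

# Because the basis is orthonormal, each coordinate is just a dot product.
c1, c2 = v @ b1, v @ b2

# Rebuilding v from the coordinates recovers the original vector.
print(np.allclose(c1 * b1 + c2 * b2, v))  # True
```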

### The Dot Product

#### Definition

This is a good reference.

The dot product is one definition for a vector multiplication that results in a scalar.
Given two equal-length vectors, the dot product looks like this:

$$\overrightarrow{a} \cdot \overrightarrow{b} = a_1b_1 + a_2b_2 + \dots + a_nb_n = \sum_{i=1}^{n} a_ib_i$$

#### Properties

The operation has the following properties:

- Commutative: $\overrightarrow{a} \cdot \overrightarrow{b} = \overrightarrow{b} \cdot \overrightarrow{a}$
- Distributive: $(\overrightarrow{a} + \overrightarrow{b}) \cdot \overrightarrow{c} = \overrightarrow{a} \cdot \overrightarrow{c} + \overrightarrow{b} \cdot \overrightarrow{c}$
- Associative (with scalar multiplication): $(k\overrightarrow{a}) \cdot \overrightarrow{b} = k(\overrightarrow{a} \cdot \overrightarrow{b})$
- Cauchy-Schwarz inequality: $|\overrightarrow{a} \cdot \overrightarrow{b}| \le \|\overrightarrow{a}\|\|\overrightarrow{b}\|$
- Vector triangle inequality: $\|\overrightarrow{a} + \overrightarrow{b}\| \le \|\overrightarrow{a}\| + \|\overrightarrow{b}\|$

#### Angle Between Vectors

The dot product also defines *the angle between vectors*. Take two unit vectors, $\hat{a}$ and $\hat{b}$; then $\cos\theta = \hat{a} \cdot \hat{b}$. If the vectors are not unit
vectors then the angle is defined, in general, by:

$$\cos\theta = \frac{\overrightarrow{a} \cdot \overrightarrow{b}}{\|\overrightarrow{a}\|\|\overrightarrow{b}\|}$$
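A quick sketch of computing the angle this way (the vectors are made up, chosen to be 45 degrees apart):

```python
import numpy as np

# Two hypothetical vectors at 45 degrees to each other.
a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])

# cos(theta) = (a . b) / (|a| |b|)
cos_theta = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
angle_deg = np.degrees(np.arccos(cos_theta))
print(round(angle_deg))  # 45
```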

#### Vector Length

The length, or norm, of a vector is defined in terms of the dot product: $\|\overrightarrow{a}\| = \sqrt{\overrightarrow{a} \cdot \overrightarrow{a}}$.

### Vector Transposes And The Dot Product

In linear algebra vectors are implicitly column vectors. I.e., when we talk about

### Dot Product With Matrix Vector Product

If

Then, if .

### Unit Length Vectors

A unit vector is a vector that has a length of 1, usually written with a "hat", e.g. $\hat{u}$.

Interestingly, we can also write

To construct a unit vector, divide the vector by its length:

$$\hat{u} = \frac{\overrightarrow{v}}{\|\overrightarrow{v}\|}$$
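As a minimal sketch of constructing a unit vector (the vector values are made up):

```python
import numpy as np

# A hypothetical vector of length 5.
v = np.array([3.0, 4.0])

# Divide by the length to get the unit vector pointing the same way.
u = v / np.linalg.norm(v)
print(u)                  # [0.6 0.8]
print(np.linalg.norm(u))  # 1.0
```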

### Orthogonal Vectors and Linear Independence

*Figure 1*

**Orthogonal vectors** are vectors that are perpendicular to each other.
In a standard 2D or 3D graph, this means that
they are at right angles to each other and we can visualise them
as seen in Figure 1:

A set of non-zero vectors is **linearly independent** if every vector in the set
is orthogonal to every other vector in the set. Note, however, that whilst an orthogonal set is L.I., a L.I. set is not necessarily
an orthogonal set. Take the following example:

So, how do we tell if one vector is orthogonal to another? The answer is
the **dot product** which is defined as follows.

We know that two **vectors are orthogonal when their dot product is
zero**: $\overrightarrow{a} \cdot \overrightarrow{b} = 0$. You can scale either or both vectors and their dot product will still be zero!
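A quick numerical check of both claims, using made-up perpendicular vectors:

```python
import numpy as np

# Two perpendicular 2D vectors (made up for illustration).
a = np.array([2, 1])
b = np.array([-1, 2])

print(a @ b)              # 0 -> orthogonal
print((5 * a) @ (3 * b))  # still 0 after scaling either or both vectors
```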

But why is this the case? Let's imagine any two arbitrary vectors, each on the circumference of a unit circle (so we know that they have the same length and are therefore a proper rotation of a vector around the center of the circle). This is shown in figure 2. From the figure, we know the following:

Figure 2

This shows **why** the dot product of two orthogonal 2D vectors is
zero. I'm happy to take it on faith that this extrapolates into n dimensions :)

Another way of looking at this is to consider the law of cosines:

Here

I.e.,

When the angle between the vectors is 90 degrees,

Based on this, we can say that...

## Matrices

### Properties

Khan Academy has great articles. This is another good reference from MIT.

#### Scalar Multiplication

- Associative: $(cd)A = c(dA)$
- Distributive: $c(A+B) = cA + cB$ and $(c+d)A = cA + dA$
- Identity: $1A = A$
- Zero: $0A = O$
- Closure: $cA$ is a matrix of the same dimensions as $A$.

Note that scalar multiplication *is* commutative: $cA = Ac$.

#### Matrix Addition

- Commutative: $A+B = B+A$
- Associative: $A+(B+C) = (A+B)+C$
- Zero: For any matrix $A$, there is a unique matrix $O$ such that $A+O = A$
- Inverse: $\forall A\; \exists!\; {-A} \mid A+(-A) = O$
- Closure: $A+B$ is a matrix of the same dimensions as $A$ and $B$.

#### Matrix Multiplication

- Associative: $(AB)C = A(BC)$
- Distributive: $A(B+C) = AB + AC$
- Identity: $IA = A$
- Zero: $OA = O$
- Inverse: Only if $A$ is a *square* matrix can it have an inverse $A^{-1}$, with $AA^{-1} = A^{-1}A = I$. Note that *not all matrices are invertible*.
- Dimension: If $A$ is an $m \times n$ matrix and $B$ is an $n \times k$ matrix then $AB$ is an $m \times k$ matrix.

Note that matrix multiplication is *not* commutative:

Be careful: the equation
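Non-commutativity is easy to see numerically; a quick sketch with made-up matrices:

```python
import numpy as np

# Two made-up square matrices.
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

# B is a permutation matrix: AB swaps the columns of A, BA swaps its rows.
print(A @ B)  # [[2 1], [4 3]]
print(B @ A)  # [[3 4], [1 2]]
```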

#### Matrix Transpose

- Associative: $(rA)^T = rA^T,\; r \in \mathbb{R}$
- Distributive (into addition): $(A+B)^T = A^T + B^T$
- Distributive (into mult): $(AB)^T = B^TA^T$ (note the reversal of terms). Generalises to $(ABC)^T = C^TB^TA^T$
- Identity: $I^T = I$
- Inverse: $(A^T)^T = A$
- Name?: $(A^T)^{-1} = (A^{-1})^T$
- Name?: $(A\overrightarrow{x})^T = \overrightarrow{x}^TA^T$ (note the reversal of terms)
- Name?: $\mathrm{rank}(A) = \mathrm{rank}(A^T)$
- If the columns of $A$ are LI then the columns of $A^TA$ are LI; $A^TA$ is square and also invertible.

#### Matrix Inverse

- Associative (self with inverse): $AA^{-1} = A^{-1}A = I$
- Distributive (into scalar mult): $(rA)^{-1} = r^{-1}A^{-1},\; r \ne 0$
- Distributive (into mult): $(AB)^{-1} = B^{-1}A^{-1}$ (note the reversal of terms)
- Identity: $I^{-1} = I$
- Name?: $(A^{-1})^{-1} = A$
- Name?: $(A^T)^{-1} = (A^{-1})^T$

### Matrices, Simultaneous Linear Equations And Reduced Row Echelon Form

References:

- Lecture 18: The Rank of a Matrix and Consistency of Linear Systems, Ohio University.
- Types of Solution Sets.
- Row Echelon Form and Number of Solutions.

A set of linear equations can be written as an **augmented (or partitioned) matrix**, which can then be reduced to
**row echelon form**. This is where each row has
the same number of, or fewer, zeros to the left than the row below. The reduction is carried out using the
**elementary row operations**:

- Row interchange: $R_i \leftrightarrow R_j$
- Row scaling: $R_i \to kR_i$
- Row addition: $R_i \to R_i + kR_j$

In row echelon form:

- All rows containing only zeros appear below rows with nonzero entries, and
- The first nonzero entry in any row appears in a column to the right of the first nonzero entry in any preceding row.

The first non zero entry in a row is called the *pivot* for that row.

And woop, woop, even more important than echelon form is **reduced row-echelon form (rref)**. This is when:

- If a row has all zeros then it is below all rows with a non-zero entry.
- The left-most non-zero entry in each row is 1 - it is called the *pivot*.
- The left-most non-zero entry in each row is the only non-zero entry in its column.
- Each row has more zeros to the left of the pivot than the row above.

Or, to put it another way, when it is a matrix in row-echelon form where:

- All pivots are equal to 1, and
- Each pivot is the only non-zero in its column.

Some important terms are *pivot position* and *pivot column*. The position is the location of a leading 1, which meets the above criteria, and the column is a column that contains a pivot position.

So why do we give a damn about RREF if we have row-echelon form? Well, one reason is it lets us
figure out if a set of linear equations has a solution, and which variables are *free* and
which are *dependent*. The reason for this is that the RREF of every matrix is *unique* (whereas merely row reducing a matrix can produce many resulting forms).

The *free* variable is one that you have control over, i.e. you can choose and manipulate its value.
The *dependent* one depends on the values of the other variables in the experiment or is constant.

*Pivot columns represent variables that are dependent*. Why?
Recall that the column vector containing a pivot is all-zeros except
for the pivot. So if element *j* in a row of a rref matrix is a pivot, then in our
system of equations

Because column vectors containing a pivot are all-zeros except
for the pivot, the **set of pivot columns must also be LI** (see later).
You can always represent the free columns as linear combinations of the
pivot columns.

If we have an augmented matrix

- *No solution*: the last (augmented) column of $[A|b]$ (i.e. the column which contains $b$) contains a pivot entry.
- *Restricted solution set*: the last row of the matrix is all zeros except the pivot column, which is a function of $b_1 \dots b_m$.
- *Exactly one solution*: the last (augmented) column of $[A|b]$ (i.e. the column which contains $b$) does not contain a pivot entry and all of the columns of the coefficient part (i.e. the columns of $A$) do contain a pivot entry.
- *Infinitely many solutions*: the last (augmented) column of $[A|b]$ (i.e. the column which contains $b$) does not contain a pivot entry and at least one of the columns of the coefficient part (i.e. the columns of $A$) also does not contain a pivot entry.

Let's look at a quick example of a matrix in echelon-form:

We can get the solution to the set of simultaneous linear equations using back substitution. Directly
from the above row-echelon form matrix (useful because the last row has one pivot and no free variables) we know that:

We can use RREF to accomplish the same thing and find the solutions to the above.

We can now talk about the **general solution vector** and the **solution set**, which is the set of all vectors that result in a valid solution to our system of equations. In this case our solution set is trivial: it just contains the above vector, because the RREF had pivots in every column, so there is exactly one solution.

Let's, therefore, look at something more interesting. Take the RREF matrix below:

The above can then be represented using the **general solution vector**, and the **solution set** thus becomes:

We can find the reduced row echelon form of a matrix in Python using the `sympy` package and the `rref()` function:

```python
from sympy import Matrix, init_printing

init_printing(use_unicode=True)

system = Matrix(((3, 6, -2, 11),
                 (1, 2, 1, 32),
                 (1, -1, 1, 1)))
system
system.rref()[0]
```

Which outputs:

```
⎡1  0  0  -17/3⎤
⎢              ⎥
⎢0  1  0  31/3 ⎥
⎢              ⎥
⎣0  0  1  17   ⎦
```

To then apply this to our augmented matrix we could do something like the following. Let's
say we have a set of 3 simultaneous equations with 3 unknowns:

```python
from sympy import Matrix, solve_linear_system, init_printing
from sympy.abc import x, y, z

init_printing(use_unicode=True)

system = Matrix(((3, 6, -2, 11),
                 (1, 2, 1, 32),
                 (1, -1, 1, 1)))
system
solve_linear_system(system, x, y, z)
```

Which will give the following output:

```
⎡3  6   -2  11⎤
⎢             ⎥
⎢1  2   1   32⎥
⎢             ⎥
⎣1  -1  1   1 ⎦

{x: -17/3, y: 31/3, z: 17}
```

### Matrix / Vector Multiplication

Matrix/vector multiplication is pretty straightforward:

The concept of matrix multiplication, the type I learnt in school, is
shown in the little animation below. For each *row* vector of the LHS matrix
we take the dot product
of that vector with each *column* vector from the RHS matrix to produce the result:

But having said that, I'd never really thought of it as shown below until watching some stuff on Khan Academy:

We can think of it as the linear combination of the column vectors of the matrix where the coefficients are the members of the vector, where the first column vector is multiplied by the first element, the second column vector by the second element and so on.
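This column-combination view is easy to verify numerically; a sketch with a made-up matrix and vector:

```python
import numpy as np

# A made-up 3x2 matrix and a 2-element vector.
A = np.array([[1, 2],
              [3, 4],
              [5, 6]])
x = np.array([10, -1])

# The usual row-by-column product...
print(A @ x)  # [ 8 26 44]

# ...equals the linear combination of A's columns weighted by x's entries.
print(x[0] * A[:, 0] + x[1] * A[:, 1])  # [ 8 26 44]
```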

So, for any mxn matrix,

### Null Space Of A Matrix

#### Definition

For an $m \times n$ matrix $A$, the *null space of the matrix is the set of all vectors $\overrightarrow{x}$ that solve $A\overrightarrow{x} = \overrightarrow{0}$*, i.e. the solution set to that equation. We can get this set by using the normal reduction to RREF to solve the above.

The system of linear equations $A\overrightarrow{x} = \overrightarrow{0}$ is called **homogeneous**. It is always
**consistent**, i.e., has a solution, because $\overrightarrow{x} = \overrightarrow{0}$ always satisfies it (the **trivial** solution).

*The null space of a matrix is a valid subspace of $\mathbb{R}^n$* because
it satisfies the following conditions, as we saw in the
section on subspaces.

- $N(A)$ contains the zero vector: $\overrightarrow{0} \in N(A)$
- $N(A)$ is closed under scalar multiplication: $\overrightarrow{x} \in N(A), \alpha \in \mathbb{R} \implies \alpha\overrightarrow{x} \in N(A)$
- $N(A)$ is closed under addition: $\overrightarrow{a} \in N(A), \overrightarrow{b} \in N(A) \implies \overrightarrow{a} + \overrightarrow{b} \in N(A)$

It can also be shown that

#### Relation To L.I. Of Column Vectors

The null space of a matrix can be related to the linear independence of its column
vectors. Using the above definition of the null space, $A\overrightarrow{x}$ is a linear combination of the columns of $A$, so the columns are linearly
*dependent* when $A\overrightarrow{x} = \overrightarrow{0}$ has a non-trivial solution. Since, *by definition*, those solutions make up the null space,
we can say that **the column vectors of a matrix $A$ are LI if and only if $N(A) = \{\overrightarrow{0}\}$**.

#### Relation To Determinant

If the determinant of a matrix is zero then the null space will be non-trivial.

#### Calculating The Null Space

To *find* the null space of a matrix, solve $A\overrightarrow{x} = \overrightarrow{0}$: reduce $A$ to RREF and express the dependent (pivot) variables in terms of the free variables.

##### Example #1

Let's look at a trivial example:

##### Example #2

Let's consider another trivial example.

##### Example #3

Let's try one more example. This one is right out of Khan Academy rather than my own made-up rubbish :)
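In `sympy` the whole procedure is wrapped up in `nullspace()`; a sketch with a made-up matrix whose columns are dependent:

```python
from sympy import Matrix

# A made-up matrix with linearly dependent columns (row 2 = 2 * row 1).
A = Matrix([[1, 2, 3],
            [2, 4, 6]])

# nullspace() returns a basis for N(A) = { x : A x = 0 }.
null_basis = A.nullspace()
for v in null_basis:
    print(v.T)
```

There are two free variables in the RREF of this matrix, so the basis has two vectors.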

#### An Intuitive Meaning

This StackOverflow thread gives some really good intuitive meaning to the null space. I've quoted some of the answers verbatim here:

> It's good to think of the matrix as a linear transformation; if you let $h(\overrightarrow{v}) = A \cdot \overrightarrow{v}$, then the null-space is ... the set of all vectors that are sent to the zero vector by $h$. Think of this as the set of vectors that *lose their identity* as $h$ is applied to them.

> Let's suppose that the matrix A represents a physical system. As an example, let's assume our system is a rocket, and A is a matrix representing the directions we can go based on our thrusters. So what do the null space and the column space represent?
>
> Well let's suppose we have a direction that we're interested in. Is it in our column space? If so, then we can move in that direction. The column space is the set of directions that we can achieve based on our thrusters. Let's suppose that we have three thrusters equally spaced around our rocket. If they're all perfectly functional then we can move in any direction. In this case our column space is the entire range. But what happens when a thruster breaks? Now we've only got two thrusters. Our linear system will have changed (the matrix A will be different), and our column space will be reduced.
>
> What's the null space? The null space is the set of thruster instructions that completely waste fuel. They're the set of instructions where our thrusters will thrust, but the direction will not be changed at all.
>
> Another example: Perhaps A can represent a rate of return on investments. The range are all the rates of return that are achievable. The null space are all the investments that can be made that wouldn't change the rate of return at all.

#### Nullity Of A Matrix

The dimension of a subspace is the number of elements in a basis for that subspace. The dimension of
the null space is called the **"nullity"**:

In the null space you will always have *as many vectors as you have free variables in the RREF of the matrix*. So the *nullity is the number of free variables in the RREF of the matrix with which this null space is associated*.

### Column Space Of A Matrix

The column space of a matrix is the set of all linear combinations
of its column vectors, which is the span of the column vectors.
The column space of $A$, written $C(A)$, is a valid subspace.

For the system of equations

Therefore, if

The set of column vectors of a matrix may or may not be linearly independent. When reduced to an LI set, we have a *basis for the
column space of the matrix*.

We saw the following in the section on null spaces: it can be shown that the column vectors of a matrix

How do we figure out the set of column vectors that are LI? To do this,
put the matrix in RREF and select the columns that have pivot entries (the pivot columns).
*The pivot column vectors are the basis vectors for $C(A)$* :) The reason is that you can always write the columns with the free variables
as linear combinations of the pivot columns.

Because column vectors containing a pivot are all-zeros except
for the pivot, the **set of pivot columns must also be LI**.
You can always represent the free columns as linear combinations of the
pivot columns.
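`sympy` wraps this up in `columnspace()`; a sketch with a made-up matrix whose second column is twice the first:

```python
from sympy import Matrix

# A made-up matrix: the second column is twice the first.
A = Matrix([[1, 2, 0],
            [2, 4, 1],
            [3, 6, 1]])

# columnspace() returns the pivot columns: a basis for C(A).
basis = A.columnspace()
print(len(basis))  # 2 -> the column space is a plane in R^3
```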

So, to find the column space of $A$: reduce to RREF, identify the pivot columns, and take the corresponding columns *of the original matrix* as the basis.

A really interesting property of the column space is the **column space criterion**, which states that a linear system $A\overrightarrow{x} = \overrightarrow{b}$ has a solution if and only if $\overrightarrow{b}$ is in the column space of $A$.

### Rank Of A Matrix

We can now define what the rank of a matrix is: the rank of $A$ is the dimension of its column space, $\mathrm{rank}(A) = \dim(C(A))$.

The **rank equation** is $\mathrm{rank}(A) + \mathrm{nullity}(A) = n$, where $n$ is the number of columns of $A$.

If $H$ is the RREF of $A$, then:

- $\mathrm{nullity}(A)$ is the number of free variables in the solution space of $A\overrightarrow{x} = \overrightarrow{0}$, which equals the number of pivot-free columns in $H$.
- $\mathrm{rank}(A)$ is the number of pivots in $H$.
- $\mathrm{rank}(A) + \mathrm{nullity}(A) = n$, the number of columns of $A$.
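The rank equation is easy to check numerically with `sympy`; a sketch with a made-up matrix whose second row is twice the first:

```python
from sympy import Matrix

# Made-up 3x3 matrix with a redundant row (row 2 = 2 * row 1).
A = Matrix([[1, 2, 1],
            [2, 4, 2],
            [1, 1, 0]])

rank = A.rank()
nullity = len(A.nullspace())
print(rank, nullity)             # 2 1
print(rank + nullity == A.cols)  # True -> the rank equation holds
```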

TODO: https://math.stackexchange.com/questions/21100/importance-of-matrix-rank

### Basis Of The Column Space VS Basis Of The Null Space

Really good answer by John-Paul Meyer on the Khan Academy website:

... a "basis of a column space" is different than a "basis of the null space", for the same matrix....

A basis is a set of vectors related to a particular mathematical "space" (specifically, to what is known as a vector space). A basis must:

- be linearly independent and
- span the space.
The "space" could be, for example: ${\mathbb{R}}^{2}$, ${\mathbb{R}}^{3}$, ${\mathbb{R}}^{4}$, ..., the column space of a matrix, the null space of a matrix, a subspace of any of these, and so on...

### Orthogonal Matrices

So, onto orthogonal matrices. A **matrix is orthogonal if $A^TA = AA^T = I$**.
If we take a general matrix and multiply it by its transpose we get...

The pattern *row vectors*

The other part of the identity matrix would imply that we would have
to have

Does the same hold true for

So we can say that if we have a **matrix whose rows are orthonormal vectors
and whose columns are also orthonormal vectors, then we have
an orthogonal matrix**!

Okay, that's great n' all, but **why should we care? Why are orthogonal
matrices useful?** It turns out that orthogonal matrices
preserve
angles and lengths of vectors. This can be useful in graphics to rotate
vectors but keep the shape they construct, or in numerical analysis
because they do not amplify errors.
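A rotation matrix is the classic example; a quick sketch checking both that it is orthogonal and that it preserves lengths (the angle and vector are made up):

```python
import numpy as np

# A rotation matrix is a classic orthogonal matrix: Q^T Q = I.
theta = np.pi / 3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(Q.T @ Q, np.eye(2)))  # True

# Lengths (and angles) are preserved when multiplying by Q.
v = np.array([3.0, 4.0])
print(np.isclose(np.linalg.norm(Q @ v), np.linalg.norm(v)))  # True
```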

### Determinants

#### The Fundamentals

If

A **matrix is invertible iff its determinant is non-zero**: $\det(A) \ne 0$. We can define the inverse as follows using the determinant:

If

We've used the first row (

Select the row or column with the most zeros and expand along it. Use each element $a_{ij}$ of that row or column together with the submatrix formed by deleting the $i^{th}$ row and $j^{th}$ column of
the matrix.

This generalises to an $n \times n$ matrix as follows, where $A_{ij}$ denotes the submatrix formed by deleting the $i^{th}$ row and $j^{th}$ column of the $n \times n$ matrix $A$.

Then we can define the determinant *recursively* as follows:

There is a pattern to the "+"'s and "-"'s: the sign of the $ij^{th}$ term is $(-1)^{i+j}$.

A quick example, where, as above, $A_{ij}$ denotes the submatrix formed by deleting the $i^{th}$ row and $j^{th}$ column of the $n \times n$ matrix $A$.
A row containing 2 zeros was used, meaning that we only have to actually do
any work for two of the terms. Note also how we got the signs for the various components using $(-1)^{i+j}$.

Let

Let

Now we can write:

#### Do It In Python

```python
>>> import numpy as np
>>> A = np.mat("1 2 3 4; 1 0 2 0; 0 1 2 3; 2 3 0 0")
>>> np.linalg.det(A)
7.000000000000004
```

#### See It In HTML

The following is an online matrix determinant calculator app, written in Javascript, to explain the calculation of the determinant. Define a matrix in the text box
using a Matlab-like syntax, for example, "`1 2 3; 3 4 5; 5 6 7`".