The LAB
Chapter 1. Vectors, scalars, and geometry
1. Scalars, Vectors, and Coordinate Systems
Let’s get our hands dirty! This lab is about playing with the building blocks of linear algebra: scalars and vectors. Think of a scalar as just a plain number, like 3 or -1.5. A vector is a small list of numbers, which you can picture as an arrow in space.
We’ll use Python (with NumPy) to explore them. Don’t worry if this is your first time with NumPy - we’ll go slowly.
Set Up Your Lab
import numpy as np
That’s it - we’re ready! NumPy is the main tool we’ll use for linear algebra.
Step-by-Step Code Walkthrough
Scalars are just numbers.
a = 5     # a scalar
b = -2.5  # another scalar
print(a + b) # add them
print(a * b) # multiply them
2.5
-12.5
Vectors are lists of numbers.
v = np.array([2, 3])      # a vector in 2D
w = np.array([1, -1, 4])  # a vector in 3D
print(v)
print(w)
[2 3]
[ 1 -1 4]
Coordinates tell us where we are. Think of [2, 3] as “go 2 steps in the x-direction, 3 steps in the y-direction.”
We can even draw it:
import matplotlib.pyplot as plt
# plot vector v
plt.quiver(0, 0, v[0], v[1], angles='xy', scale_units='xy', scale=1, color='r')
plt.xlim(0, 4)
plt.ylim(0, 4)
plt.grid()
plt.show()
This makes a little arrow from the origin (0,0) to (2,3).
Try It Yourself
- Change the vector v to [4, 1]. Where does the arrow point now?
- Try making a 3D vector with 4 numbers, like [1, 2, 3, 4]. What happens?
- Replace np.array([2,3]) with np.array([0,0]). What does the arrow look like?
2. Vector Notation, Components, and Arrows
In this lab, we’ll practice reading, writing, and visualizing vectors in different ways. A vector can look simple at first - just a list of numbers - but how we write it and how we interpret it really matters. This is where notation and components come into play.
A vector has:
- A symbol (we might call it v, w, or even →AB in geometry).
- Components (the individual numbers, like 2 and 3 in [2, 3]).
- An arrow picture (a geometric way to see the vector as a directed line segment).
Let’s see all three in action with Python.
Set Up Your Lab
import numpy as np
import matplotlib.pyplot as plt
Step-by-Step Code Walkthrough
- Writing vectors in Python
# Two-dimensional vector
v = np.array([2, 3])

# Three-dimensional vector
w = np.array([1, -1, 4])
print("v =", v)
print("w =", w)
v = [2 3]
w = [ 1 -1 4]
Here v has components (2, 3) and w has components (1, -1, 4).
- Accessing components Each number in the vector is a component. We can pick them out using indexing.
print("First component of v:", v[0])
print("Second component of v:", v[1])
First component of v: 2
Second component of v: 3
Notice: in Python, indices start at 0, so v[0] is the first component.
- Visualizing vectors as arrows In 2D, it’s easy to draw a vector from the origin (0,0) to its endpoint (x,y).
plt.quiver(0, 0, v[0], v[1], angles='xy', scale_units='xy', scale=1, color='r')
plt.xlim(-1, 4)
plt.ylim(-2, 4)
plt.axhline(0, color='black', linewidth=0.5)
plt.axvline(0, color='black', linewidth=0.5)
plt.grid()
plt.show()
This shows vector v as a red arrow from (0,0) to (2,3).
- Drawing multiple vectors We can plot several arrows at once to compare them.
u = np.array([3, 1])
z = np.array([-1, 2])

# Draw v, u, z in different colors
plt.quiver(0, 0, v[0], v[1], angles='xy', scale_units='xy', scale=1, color='r', label='v')
plt.quiver(0, 0, u[0], u[1], angles='xy', scale_units='xy', scale=1, color='b', label='u')
plt.quiver(0, 0, z[0], z[1], angles='xy', scale_units='xy', scale=1, color='g', label='z')

plt.xlim(-2, 4)
plt.ylim(-2, 4)
plt.axhline(0, color='black', linewidth=0.5)
plt.axvline(0, color='black', linewidth=0.5)
plt.grid()
plt.show()
Now you’ll see three arrows starting at the same point, each pointing in a different direction.
Try It Yourself
- Change v to [5, 0]. What does the arrow look like now?
- Try a vector like [0, -3]. Which axis does it line up with?
- Make a new vector q = np.array([2, 0, 0]). What happens if you try to plot it with plt.quiver in 2D?
3. Vector Addition and Scalar Multiplication
In this lab, we’ll explore the two most fundamental operations you can perform with vectors: adding them together and scaling them by a number (a scalar). These operations form the basis of everything else in linear algebra, from geometry to machine learning. Understanding how they work, both in code and visually, is key to building intuition.
Set Up Your Lab
import numpy as np
import matplotlib.pyplot as plt
Step-by-Step Code Walkthrough
- Vector addition When you add two vectors, you simply add their components one by one.
v = np.array([2, 3])
u = np.array([1, -1])

sum_vector = v + u
print("v + u =", sum_vector)
v + u = [3 2]
Here, (2,3) + (1,-1) = (3,2).
- Visualizing vector addition (tip-to-tail method) Graphically, vector addition means placing the tail of one vector at the head of the other. The resulting vector goes from the start of the first to the end of the second.
plt.quiver(0, 0, v[0], v[1], angles='xy', scale_units='xy', scale=1, color='r', label='v')
plt.quiver(v[0], v[1], u[0], u[1], angles='xy', scale_units='xy', scale=1, color='b', label='u placed at end of v')
plt.quiver(0, 0, sum_vector[0], sum_vector[1], angles='xy', scale_units='xy', scale=1, color='g', label='v + u')

plt.xlim(-1, 5)
plt.ylim(-2, 5)
plt.axhline(0, color='black', linewidth=0.5)
plt.axvline(0, color='black', linewidth=0.5)
plt.grid()
plt.show()
The green arrow is the result of adding v and u.
- Scalar multiplication Multiplying a vector by a scalar stretches or shrinks it. If the scalar is negative, the vector flips direction.
c = 2
scaled_v = c * v
print("2 * v =", scaled_v)

d = -1
scaled_v_neg = d * v
print("-1 * v =", scaled_v_neg)
2 * v = [4 6]
-1 * v = [-2 -3]
So 2 * (2,3) = (4,6) and -1 * (2,3) = (-2,-3).
- Visualizing scalar multiplication
plt.quiver(0, 0, v[0], v[1], angles='xy', scale_units='xy', scale=1, color='r', label='v')
plt.quiver(0, 0, scaled_v[0], scaled_v[1], angles='xy', scale_units='xy', scale=1, color='b', label='2 * v')
plt.quiver(0, 0, scaled_v_neg[0], scaled_v_neg[1], angles='xy', scale_units='xy', scale=1, color='g', label='-1 * v')

plt.xlim(-5, 5)
plt.ylim(-5, 7)
plt.axhline(0, color='black', linewidth=0.5)
plt.axvline(0, color='black', linewidth=0.5)
plt.grid()
plt.show()
Here, the blue arrow is twice as long as the red arrow, while the green arrow points in the opposite direction.
- Combining both operations We can scale vectors and then add them. This is called a linear combination (and it’s the foundation for the next section).
combo = 3*v + (-2)*u
print("3*v - 2*u =", combo)
3*v - 2*u = [ 4 11]
Try It Yourself
- Replace c = 2 with c = 0.5. What happens to the vector?
- Try adding three vectors: v + u + np.array([-1,2]). Can you predict the result before printing?
- Visualize 3*v + 2*u using arrows. How does it compare to just v + u?
4. Linear Combinations and Span
Now that we know how to add vectors and scale them, we can combine these two moves to create linear combinations. A linear combination is just a recipe: multiply vectors by scalars, then add them together. The set of all possible results you can get from such recipes is called the span.
This idea is powerful because span tells us what directions and regions of space we can reach using given vectors.
Set Up Your Lab
import numpy as np
import matplotlib.pyplot as plt
Step-by-Step Code Walkthrough
- Linear combinations in Python
v = np.array([2, 1])
u = np.array([1, 3])

combo1 = 2*v + 3*u
combo2 = -1*v + 4*u
print("2*v + 3*u =", combo1)
print("-v + 4*u =", combo2)
2*v + 3*u = [ 7 11]
-v + 4*u = [ 2 11]
Here, we multiplied and added vectors using scalars. Each result is a new vector.
- Visualizing linear combinations Let’s plot v, u, and their combinations.
plt.quiver(0, 0, v[0], v[1], angles='xy', scale_units='xy', scale=1, color='r', label='v')
plt.quiver(0, 0, u[0], u[1], angles='xy', scale_units='xy', scale=1, color='b', label='u')
plt.quiver(0, 0, combo1[0], combo1[1], angles='xy', scale_units='xy', scale=1, color='g', label='2v + 3u')
plt.quiver(0, 0, combo2[0], combo2[1], angles='xy', scale_units='xy', scale=1, color='m', label='-v + 4u')

plt.xlim(-5, 10)
plt.ylim(-5, 10)
plt.axhline(0, color='black', linewidth=0.5)
plt.axvline(0, color='black', linewidth=0.5)
plt.grid()
plt.show()
This shows how new arrows can be generated from scaling and adding the original ones.
- Exploring the span The span of two 2D vectors is either:
- A line (if one is a multiple of the other).
- The whole 2D plane (if they are independent).
# Generate many combinations
coeffs = range(-5, 6)
points = []
for a in coeffs:
    for b in coeffs:
        point = a*v + b*u
        points.append(point)
points = np.array(points)

plt.scatter(points[:,0], points[:,1], s=10, color='gray')
plt.quiver(0, 0, v[0], v[1], angles='xy', scale_units='xy', scale=1, color='r')
plt.quiver(0, 0, u[0], u[1], angles='xy', scale_units='xy', scale=1, color='b')

plt.xlim(-10, 10)
plt.ylim(-10, 10)
plt.axhline(0, color='black', linewidth=0.5)
plt.axvline(0, color='black', linewidth=0.5)
plt.grid()
plt.show()
The gray dots show all reachable points with combinations of v and u.
- Special case: dependent vectors
w = np.array([4, 2])  # notice w = 2*v
coeffs = range(-5, 6)
points = []
for a in coeffs:
    for b in coeffs:
        points.append(a*v + b*w)
points = np.array(points)

plt.scatter(points[:,0], points[:,1], s=10, color='gray')
plt.quiver(0, 0, v[0], v[1], angles='xy', scale_units='xy', scale=1, color='r')
plt.quiver(0, 0, w[0], w[1], angles='xy', scale_units='xy', scale=1, color='b')

plt.xlim(-10, 10)
plt.ylim(-10, 10)
plt.axhline(0, color='black', linewidth=0.5)
plt.axvline(0, color='black', linewidth=0.5)
plt.grid()
plt.show()
Here, the span collapses to a line because w is just a scaled copy of v.
Try It Yourself
- Replace u = [1,3] with u = [-1,2]. What does the span look like?
- Try three vectors in 2D (e.g., v, u, w). Do you get more than the whole plane?
- Experiment with 3D vectors. Use np.array([x,y,z]) and check whether different vectors span a plane or all of space.
5. Length (Norm) and Distance
In this lab, we’ll measure how big a vector is (its length, also called its norm) and how far apart two vectors are (their distance). These ideas connect algebra to geometry: when we compute a norm, we’re measuring the size of an arrow; when we compute a distance, we’re measuring the gap between two points in space.
Set Up Your Lab
import numpy as np
import matplotlib.pyplot as plt
Step-by-Step Code Walkthrough
- Vector length (norm) in 2D The length of a vector is computed using the Pythagorean theorem. For a vector (x, y), the length is sqrt(x² + y²).
v = np.array([3, 4])
length = np.linalg.norm(v)
print("Length of v =", length)
Length of v = 5.0
This prints 5.0, because (3,4) forms a right triangle with sides 3 and 4, and sqrt(3²+4²)=5.
- Manual calculation vs NumPy
manual_length = (v[0]**2 + v[1]**2)**0.5
print("Manual length =", manual_length)
print("NumPy length =", np.linalg.norm(v))
Manual length = 5.0
NumPy length = 5.0
Both give the same result.
- Visualizing vector length
plt.quiver(0, 0, v[0], v[1], angles='xy', scale_units='xy', scale=1, color='r')
plt.xlim(0, 5)
plt.ylim(0, 5)
plt.axhline(0, color='black', linewidth=0.5)
plt.axvline(0, color='black', linewidth=0.5)
plt.text(v[0]/2, v[1]/2, f"Length={length}", fontsize=10, color='blue')
plt.grid()
plt.show()
You’ll see the arrow (3,4) with its length labeled.
- Distance between two vectors The distance between v and another vector u is the length of their difference: ‖v - u‖.
u = np.array([0, 0])  # the origin
dist = np.linalg.norm(v - u)
print("Distance between v and u =", dist)
Distance between v and u = 5.0
Since u is the origin, this is just the length of v.
- A more interesting distance
u = np.array([1, 1])
dist = np.linalg.norm(v - u)
print("Distance between v and u =", dist)
Distance between v and u = 3.605551275463989
This measures how far (3,4) is from (1,1).
- Visualizing distance between points
plt.scatter([v[0], u[0]], [v[1], u[1]], color=['red','blue'])
plt.plot([v[0], u[0]], [v[1], u[1]], 'k--')
plt.text(v[0], v[1], 'v', fontsize=12, color='red')
plt.text(u[0], u[1], 'u', fontsize=12, color='blue')
plt.grid()
plt.show()
The dashed line shows the distance between the two points.
- Higher-dimensional vectors Norms and distances work the same way in any dimension:
a = np.array([1,2,3])
b = np.array([4,0,8])
print("‖a‖ =", np.linalg.norm(a))
print("‖b‖ =", np.linalg.norm(b))
print("Distance between a and b =", np.linalg.norm(a-b))
‖a‖ = 3.7416573867739413
‖b‖ = 8.94427190999916
Distance between a and b = 6.164414002968976
Even though we can’t draw 3D easily on paper, the formulas still apply.
Try It Yourself
- Compute the length of np.array([5,12]). What do you expect?
- Find the distance between (2,3) and (7,7). Can you sketch it by hand and check?
- In 3D, try vectors (1,1,1) and (2,2,2). Why is the distance exactly sqrt(3)?
6. Dot Product
The dot product is one of the most important operations in linear algebra. It takes two vectors and gives you a single number. That number combines both the lengths of the vectors and how much they point in the same direction. In this lab, we’ll calculate dot products in several ways, see how they relate to geometry, and visualize their meaning.
Set Up Your Lab
import numpy as np
import matplotlib.pyplot as plt
Step-by-Step Code Walkthrough
- Algebraic definition The dot product of two vectors is the sum of the products of their components:
v = np.array([2, 3])
u = np.array([4, -1])

dot_manual = v[0]*u[0] + v[1]*u[1]
dot_numpy = np.dot(v, u)
print("Manual dot product:", dot_manual)
print("NumPy dot product:", dot_numpy)
Manual dot product: 5
NumPy dot product: 5
Here, (2*4) + (3*-1) = 8 - 3 = 5.
- Geometric definition The dot product also equals the product of the lengths of the vectors times the cosine of the angle between them:
\[ v \cdot u = \|v\| \|u\| \cos \theta \]
We can compute the angle:
norm_v = np.linalg.norm(v)
norm_u = np.linalg.norm(u)

cos_theta = np.dot(v, u) / (norm_v * norm_u)
theta = np.arccos(cos_theta)
print("cos(theta) =", cos_theta)
print("theta (in radians) =", theta)
print("theta (in degrees) =", np.degrees(theta))
cos(theta) = 0.33633639699815626
theta (in radians) = 1.2277723863741932
theta (in degrees) = 70.3461759419467
This gives the angle between v and u.
- Visualizing the dot product Let’s draw the two vectors:
plt.quiver(0,0,v[0],v[1],angles='xy',scale_units='xy',scale=1,color='r',label='v')
plt.quiver(0,0,u[0],u[1],angles='xy',scale_units='xy',scale=1,color='b',label='u')
plt.xlim(-1,5)
plt.ylim(-2,4)
plt.axhline(0,color='black',linewidth=0.5)
plt.axvline(0,color='black',linewidth=0.5)
plt.grid()
plt.show()
The dot product is positive if the angle is less than 90°, negative if greater than 90°, and zero if the vectors are perpendicular.
- Projections and dot product The dot product lets us compute how much of one vector lies in the direction of another.
proj_length = np.dot(v, u) / np.linalg.norm(u)
print("Projection length of v onto u:", proj_length)
Projection length of v onto u: 1.212678125181665
This is the length of the shadow of v onto u.
- Special cases
- If vectors point in the same direction, the dot product is large and positive.
- If vectors are perpendicular, the dot product is zero.
- If vectors point in opposite directions, the dot product is negative.
a = np.array([1,0])
b = np.array([0,1])
c = np.array([-1,0])
print("a · b =", np.dot(a,b)) # perpendicular
print("a · a =", np.dot(a,a)) # length squared
print("a · c =", np.dot(a,c)) # opposite
a · b = 0
a · a = 1
a · c = -1
Try It Yourself
- Compute the dot product of (3,4) with (4,3). Is the result larger or smaller than the product of their lengths?
- Try (1,2,3) · (4,5,6). Does the geometric meaning still work in 3D?
- Create two perpendicular vectors (e.g. (2,0) and (0,5)). Verify the dot product is zero.
7. Angles Between Vectors and Cosine
In this lab, we’ll go deeper into the connection between vectors and geometry by calculating angles. Angles tell us how much two vectors “point in the same direction.” The bridge between algebra and geometry here is the cosine formula, which comes directly from the dot product.
Set Up Your Lab
import numpy as np
import matplotlib.pyplot as plt
Step-by-Step Code Walkthrough
- Formula for the angle The angle \(\theta\) between two vectors \(v\) and \(u\) is given by:
\[ \cos \theta = \frac{v \cdot u}{\|v\| \, \|u\|} \]
This means:
- If \(\cos \theta = 1\), the vectors point in exactly the same direction.
- If \(\cos \theta = 0\), they are perpendicular.
- If \(\cos \theta = -1\), they point in opposite directions.
- Computing the angle in Python
v = np.array([2, 3])
u = np.array([3, -1])

dot = np.dot(v, u)
norm_v = np.linalg.norm(v)
norm_u = np.linalg.norm(u)

cos_theta = dot / (norm_v * norm_u)
theta = np.arccos(cos_theta)
print("cos(theta) =", cos_theta)
print("theta in radians =", theta)
print("theta in degrees =", np.degrees(theta))
cos(theta) = 0.2631174057921088
theta in radians = 1.3045442776439713
theta in degrees = 74.74488129694222
This gives both the cosine value and the actual angle.
- Visualizing the vectors
plt.quiver(0,0,v[0],v[1],angles='xy',scale_units='xy',scale=1,color='r',label='v')
plt.quiver(0,0,u[0],u[1],angles='xy',scale_units='xy',scale=1,color='b',label='u')

plt.xlim(-1,4)
plt.ylim(-2,4)
plt.axhline(0,color='black',linewidth=0.5)
plt.axvline(0,color='black',linewidth=0.5)
plt.grid()
plt.show()
You can see the angle between v and u as the gap between the red and blue arrows.
- Checking special cases
a = np.array([1,0])
b = np.array([0,1])
c = np.array([-1,0])
print("Angle between a and b =", np.degrees(np.arccos(np.dot(a,b)/(np.linalg.norm(a)*np.linalg.norm(b)))))
print("Angle between a and c =", np.degrees(np.arccos(np.dot(a,c)/(np.linalg.norm(a)*np.linalg.norm(c)))))
Angle between a and b = 90.0
Angle between a and c = 180.0
- Angle between (1,0) and (0,1) is 90°.
- Angle between (1,0) and (-1,0) is 180°.
- Using cosine as a similarity measure In data science and machine learning, people often use cosine similarity instead of raw angles. It’s just the cosine value itself:
cosine_similarity = np.dot(v,u)/(np.linalg.norm(v)*np.linalg.norm(u))
print("Cosine similarity =", cosine_similarity)
Cosine similarity = 0.2631174057921088
Values close to 1 mean vectors are aligned, values near 0 mean unrelated, and values near -1 mean opposite.
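To make those reference points concrete, here is a small added check with three hand-picked pairs (the example vectors are illustrative choices, not from the lab):
import numpy as np

def cos_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cos_sim(np.array([1, 2]), np.array([2, 4])))    # ≈ 1.0 (same direction)
print(cos_sim(np.array([1, 0]), np.array([0, 1])))    # 0.0 (perpendicular)
print(cos_sim(np.array([1, 2]), np.array([-1, -2])))  # ≈ -1.0 (opposite direction)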
Try It Yourself
- Create two random vectors with np.random.randn(3) and compute the angle between them.
- Verify that swapping the vectors gives the same angle (symmetry).
- Find two vectors where cosine similarity is exactly 0. Can you come up with an example in 2D?
8. Projections and Decompositions
In this lab, we’ll learn how to split one vector into parts: one part that lies along another vector, and one part that is perpendicular. This process is called projection and decomposition. Projections let us measure “how much of a vector points in a given direction,” and decompositions give us a way to break vectors into useful components.
Set Up Your Lab
import numpy as np
import matplotlib.pyplot as plt
Step-by-Step Code Walkthrough
- Projection formula The projection of vector \(v\) onto vector \(u\) is:
\[ \text{proj}_u(v) = \frac{v \cdot u}{u \cdot u} \, u \]
This gives the component of \(v\) that points in the direction of \(u\).
- Computing projection in Python
v = np.array([3, 2])
u = np.array([2, 0])

proj_u_v = (np.dot(v, u) / np.dot(u, u)) * u
print("Projection of v onto u:", proj_u_v)
Projection of v onto u: [3. 0.]
Here, \(v = (3,2)\) and \(u = (2,0)\). The projection of v onto u is a vector pointing along the x-axis.
- Decomposing into parallel and perpendicular parts
We can write:
\[ v = \text{proj}_u(v) + (v - \text{proj}_u(v)) \]
The first part is parallel to u, the second part is perpendicular.
perp = v - proj_u_v
print("Parallel part:", proj_u_v)
print("Perpendicular part:", perp)
Parallel part: [3. 0.]
Perpendicular part: [0. 2.]
- Visualizing projection and decomposition
plt.quiver(0, 0, v[0], v[1], angles='xy', scale_units='xy', scale=1, color='r', label='v')
plt.quiver(0, 0, u[0], u[1], angles='xy', scale_units='xy', scale=1, color='b', label='u')
plt.quiver(0, 0, proj_u_v[0], proj_u_v[1], angles='xy', scale_units='xy', scale=1, color='g', label='proj_u(v)')
plt.quiver(proj_u_v[0], proj_u_v[1], perp[0], perp[1], angles='xy', scale_units='xy', scale=1, color='m', label='perpendicular')

plt.xlim(-1, 5)
plt.ylim(-1, 4)
plt.axhline(0, color='black', linewidth=0.5)
plt.axvline(0, color='black', linewidth=0.5)
plt.grid()
plt.show()
You’ll see v (red), u (blue), the projection (green), and the perpendicular remainder (magenta).
- Projection in higher dimensions
This formula works in any dimension:
a = np.array([1,2,3])
b = np.array([0,1,0])

proj = (np.dot(a,b)/np.dot(b,b)) * b
perp = a - proj
print("Projection of a onto b:", proj)
print("Perpendicular component:", perp)
Projection of a onto b: [0. 2. 0.]
Perpendicular component: [1. 0. 3.]
Even in 3D or higher, projections are about splitting into “along” and “across.”
Try It Yourself
- Try projecting (2,3) onto (0,5). Where does it land?
- Take a 3D vector like (4,2,6) and project it onto (1,0,0). What does this give you?
- Change the base vector u to something not aligned with the axes, like (1,1). Does the projection still work?
9. Cauchy–Schwarz and Triangle Inequalities
This lab introduces two fundamental inequalities in linear algebra. They may look abstract at first, but they provide guarantees that always hold true for vectors. We’ll explore them with small examples in Python to see why they matter.
Set Up Your Lab
import numpy as np
Step-by-Step Code Walkthrough
- Cauchy–Schwarz inequality
The inequality states:
\[ |v \cdot u| \leq \|v\| \, \|u\| \]
It means the dot product is never “bigger” than the product of the vector lengths. Equality happens only if the two vectors are pointing in exactly the same (or opposite) direction.
v = np.array([3, 4])
u = np.array([1, 2])

lhs = abs(np.dot(v, u))
rhs = np.linalg.norm(v) * np.linalg.norm(u)
print("Left-hand side (|v·u|):", lhs)
print("Right-hand side (‖v‖‖u‖):", rhs)
print("Inequality holds?", lhs <= rhs)
Left-hand side (|v·u|): 11
Right-hand side (‖v‖‖u‖): 11.180339887498949
Inequality holds? True
- Testing Cauchy–Schwarz with different vectors
pairs = [
    (np.array([1,0]), np.array([0,1])),    # perpendicular
    (np.array([2,3]), np.array([4,6])),    # multiples
    (np.array([-1,2]), np.array([3,-6]))   # opposite multiples
]
for v,u in pairs:
    lhs = abs(np.dot(v, u))
    rhs = np.linalg.norm(v) * np.linalg.norm(u)
    print(f"v={v}, u={u} -> |v·u|={lhs}, ‖v‖‖u‖={rhs}, holds={lhs<=rhs}")
v=[1 0], u=[0 1] -> |v·u|=0, ‖v‖‖u‖=1.0, holds=True
v=[2 3], u=[4 6] -> |v·u|=26, ‖v‖‖u‖=25.999999999999996, holds=False
v=[-1 2], u=[ 3 -6] -> |v·u|=15, ‖v‖‖u‖=15.000000000000002, holds=True
- Perpendicular vectors give |v·u| = 0, far less than the product of norms.
- Multiples give equality (lhs = rhs). In exact arithmetic the two sides match; floating-point rounding can make rhs a hair smaller than lhs, which is why the second pair above reports holds=False. Comparing with np.isclose(lhs, rhs) confirms the equality.
- Triangle inequality
The triangle inequality states:
\[ \|v + u\| \leq \|v\| + \|u\| \]
Geometrically, the length of one side of a triangle can never be longer than the sum of the other two sides.
v = np.array([3, 4])
u = np.array([1, 2])

lhs = np.linalg.norm(v + u)
rhs = np.linalg.norm(v) + np.linalg.norm(u)
print("‖v+u‖ =", lhs)
print("‖v‖ + ‖u‖ =", rhs)
print("Inequality holds?", lhs <= rhs)
‖v+u‖ = 7.211102550927978
‖v‖ + ‖u‖ = 7.23606797749979
Inequality holds? True
- Visual demonstration with a triangle
import matplotlib.pyplot as plt

origin = np.array([0,0])
points = np.array([origin, v, v+u, origin])

plt.plot(points[:,0], points[:,1], 'ro-')  # triangle outline
plt.text(v[0], v[1], 'v')
plt.text(v[0]+u[0], v[1]+u[1], 'v+u')
plt.text(u[0], u[1], 'u')

plt.grid()
plt.axhline(0,color='black',linewidth=0.5)
plt.axvline(0,color='black',linewidth=0.5)
plt.axis('equal')
plt.show()
This triangle shows why the inequality is called the “triangle” inequality.
- Testing triangle inequality with random vectors
for _ in range(5):
    v = np.random.randn(2)
    u = np.random.randn(2)
    lhs = np.linalg.norm(v+u)
    rhs = np.linalg.norm(v) + np.linalg.norm(u)
    print(f"‖v+u‖={lhs:.3f}, ‖v‖+‖u‖={rhs:.3f}, holds={lhs <= rhs}")
‖v+u‖=0.778, ‖v‖+‖u‖=2.112, holds=True
‖v+u‖=1.040, ‖v‖+‖u‖=2.621, holds=True
‖v+u‖=1.632, ‖v‖+‖u‖=2.482, holds=True
‖v+u‖=1.493, ‖v‖+‖u‖=2.250, holds=True
‖v+u‖=2.653, ‖v‖+‖u‖=2.692, holds=True
No matter what vectors you try, the inequality always holds.
The Takeaway
- Cauchy–Schwarz: The dot product is always bounded by the product of vector lengths.
- Triangle inequality: The length of one side of a triangle can’t exceed the sum of the other two.
- These inequalities form the backbone of geometry, analysis, and many proofs in linear algebra.
10. Orthonormal Sets in ℝ²/ℝ³
In this lab, we’ll explore orthonormal sets - collections of vectors that are both orthogonal (perpendicular) and normalized (length = 1). These sets are the “nicest” possible bases for vector spaces. In 2D and 3D, they correspond to the coordinate axes we already know, but we can also construct and test new ones.
Set Up Your Lab
import numpy as np
import matplotlib.pyplot as plt
Step-by-Step Code Walkthrough
- Orthogonal vectors Two vectors are orthogonal if their dot product is zero.
x_axis = np.array([1, 0])
y_axis = np.array([0, 1])
print("x_axis · y_axis =", np.dot(x_axis, y_axis)) # should be 0
x_axis · y_axis = 0
So the standard axes are orthogonal.
- Normalizing vectors Normalization means dividing a vector by its length to make its norm equal to 1.
v = np.array([3, 4])
v_normalized = v / np.linalg.norm(v)
print("Original v:", v)
print("Normalized v:", v_normalized)
print("Length of normalized v:", np.linalg.norm(v_normalized))
Original v: [3 4]
Normalized v: [0.6 0.8]
Length of normalized v: 1.0
Now v_normalized points in the same direction as v but has unit length.
- Building an orthonormal set in 2D
u1 = np.array([1, 0])
u2 = np.array([0, 1])
print("u1 length:", np.linalg.norm(u1))
print("u2 length:", np.linalg.norm(u2))
print("u1 · u2 =", np.dot(u1,u2))
u1 length: 1.0
u2 length: 1.0
u1 · u2 = 0
Both have length 1, and their dot product is 0. That makes {u1, u2} an orthonormal set in 2D.
- Visualizing 2D orthonormal vectors
plt.quiver(0,0,u1[0],u1[1],angles='xy',scale_units='xy',scale=1,color='r')
plt.quiver(0,0,u2[0],u2[1],angles='xy',scale_units='xy',scale=1,color='b')

plt.xlim(-1.5,1.5)
plt.ylim(-1.5,1.5)
plt.axhline(0,color='black',linewidth=0.5)
plt.axvline(0,color='black',linewidth=0.5)
plt.grid()
plt.show()
You’ll see the red and blue arrows at right angles, each of length 1.
- Orthonormal set in 3D In 3D, the standard basis vectors are:
i = np.array([1,0,0])
j = np.array([0,1,0])
k = np.array([0,0,1])
print("‖i‖ =", np.linalg.norm(i))
print("‖j‖ =", np.linalg.norm(j))
print("‖k‖ =", np.linalg.norm(k))
print("i·j =", np.dot(i,j))
print("j·k =", np.dot(j,k))
print("i·k =", np.dot(i,k))
‖i‖ = 1.0
‖j‖ = 1.0
‖k‖ = 1.0
i·j = 0
j·k = 0
i·k = 0
Lengths are all 1, and dot products are 0. So {i, j, k} is an orthonormal set in ℝ³.
- Testing if a set is orthonormal We can write a helper function:
def is_orthonormal(vectors):
    for i in range(len(vectors)):
        for j in range(len(vectors)):
            dot = np.dot(vectors[i], vectors[j])
            if i == j:
                if not np.isclose(dot, 1): return False
            else:
                if not np.isclose(dot, 0): return False
    return True

print(is_orthonormal([i, j, k]))  # True
True
- Constructing a new orthonormal pair Not all orthonormal sets look like the axes.
u1 = np.array([1,1]) / np.sqrt(2)
u2 = np.array([-1,1]) / np.sqrt(2)
print("u1·u2 =", np.dot(u1,u2))
print("‖u1‖ =", np.linalg.norm(u1))
print("‖u2‖ =", np.linalg.norm(u2))
u1·u2 = 0.0
‖u1‖ = 0.9999999999999999
‖u2‖ = 0.9999999999999999
This gives a rotated orthonormal basis in 2D.
Try It Yourself
- Normalize (2,2,1) to make it a unit vector.
- Test whether the set {[1,0,0], [0,2,0], [0,0,3]} is orthonormal.
- Construct two vectors in 2D that are not perpendicular. Normalize them and check if the dot product is still zero.
Chapter 2. Matrices and basic operations
11. Matrices as Tables and as Machines
Matrices can feel mysterious at first, but there are two simple ways to think about them:
- As tables of numbers - just a grid you can store and manipulate.
- As machines - something that takes a vector in and spits a new vector out.
In this lab, we’ll explore both views and see how they connect.
Set Up Your Lab
import numpy as np
Step-by-Step Code Walkthrough
- A matrix as a table of numbers
A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
print("Matrix A:\n", A)
print("Shape of A:", A.shape)
Matrix A:
[[1 2 3]
[4 5 6]]
Shape of A: (2, 3)
Here, A is a 2×3 matrix (2 rows, 3 columns).
- Rows = horizontal slices → [1,2,3] and [4,5,6]
- Columns = vertical slices → [1,4], [2,5], [3,6]
- Accessing rows and columns
first_row = A[0]        # row 0
second_column = A[:,1]  # column 1
print("First row:", first_row)
print("Second column:", second_column)
First row: [1 2 3]
Second column: [2 5]
Rows are whole vectors too, and so are columns.
- A matrix as a machine
A matrix can “act” on a vector. If x = [x1, x2, x3], then A·x is computed by taking linear combinations of the columns of A.
x = np.array([1, 0, -1])  # a 3D vector
result = A.dot(x)
print("A·x =", result)
A·x = [-2 -2]
Interpretation: multiply A by x = combine columns of A with weights from x.
\[ A \cdot x = 1 \cdot \text{(col 1)} + 0 \cdot \text{(col 2)} + (-1) \cdot \text{(col 3)} \]
- Verifying column combination view
col1 = A[:,0]
col2 = A[:,1]
col3 = A[:,2]

manual = 1*col1 + 0*col2 + (-1)*col3
print("Manual combination:", manual)
print("A·x result:", result)
Manual combination: [-2 -2]
A·x result: [-2 -2]
They match exactly. This shows the “machine” interpretation is just a shortcut for column combinations.
- Geometric intuition (2D example)
B = np.array([
    [2, 0],
    [0, 1]
])

v = np.array([1,2])
print("B·v =", B.dot(v))
B·v = [2 2]
Here, B scales the x-direction by 2 while leaving the y-direction alone. So (1,2) becomes (2,2).
Try It Yourself
- Create a 3×3 identity matrix with np.eye(3) and multiply it by different vectors. What happens?
- Build a matrix [[0,-1],[1,0]]. Try multiplying it by (1,0) and (0,1). What transformation is this?
- Create your own 2×2 matrix that flips vectors across the x-axis. Test it on (1,2) and (−3,4).
The Takeaway
- A matrix is both a grid of numbers and a machine that transforms vectors.
- Matrix–vector multiplication is the same as combining columns with given weights.
- Thinking of matrices as machines helps build intuition for rotations, scalings, and other transformations later.
12. Matrix Shapes, Indexing, and Block Views
Matrices come in many shapes, and learning to read their structure is essential. Shape tells us how many rows and columns a matrix has. Indexing lets us grab specific entries, rows, or columns. Block views let us zoom in on submatrices, which is extremely useful for both theory and computation.
Set Up Your Lab
import numpy as np
Step-by-Step Code Walkthrough
- Matrix shapes
The shape of a matrix is (rows, columns).
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
print("Matrix A:\n", A)
print("Shape of A:", A.shape)
Matrix A:
[[1 2 3]
[4 5 6]
[7 8 9]]
Shape of A: (3, 3)
Here, A is a 3×3 matrix.
- Indexing elements
In NumPy, rows and columns are 0-based. The first entry is A[0,0].
print("A[0,0] =", A[0,0]) # top-left element
print("A[1,2] =", A[1,2]) # second row, third column
A[0,0] = 1
A[1,2] = 6
- Extracting rows and columns
row1 = A[0]     # first row
col2 = A[:,1]   # second column
print("First row:", row1)
print("Second column:", col2)
First row: [1 2 3]
Second column: [2 5 8]
Notice: A[i] gives a row, A[:,j] gives a column.
- Slicing submatrices (block view)
You can slice multiple rows and columns to form a smaller matrix.
block = A[0:2, 1:3]  # rows 0–1, columns 1–2
print("Block submatrix:\n", block)
Block submatrix:
[[2 3]
[5 6]]
This block is:
\[ \begin{bmatrix} 2 & 3 \\ 5 & 6 \end{bmatrix} \]
- Modifying parts of a matrix
A[0,0] = 99
print("Modified A:\n", A)

A[1,:] = [10, 11, 12]  # replace row 1
print("After replacing row 1:\n", A)
Modified A:
[[99 2 3]
[ 4 5 6]
[ 7 8 9]]
After replacing row 1:
[[99 2 3]
[10 11 12]
[ 7 8 9]]
- Non-square matrices
Not all matrices are square. Shapes can be rectangular, too.
B = np.array([
    [1, 2],
    [3, 4],
    [5, 6]
])
print("Matrix B:\n", B)
print("Shape of B:", B.shape)
Matrix B:
[[1 2]
[3 4]
[5 6]]
Shape of B: (3, 2)
Here, B is 3×2 (3 rows, 2 columns).
- Block decomposition idea
We can think of large matrices as made of smaller blocks. This is common in linear algebra proofs and algorithms.
C = np.array([
    [1,2,3,4],
    [5,6,7,8],
    [9,10,11,12],
    [13,14,15,16]
])

top_left = C[0:2, 0:2]
bottom_right = C[2:4, 2:4]
print("Top-left block:\n", top_left)
print("Bottom-right block:\n", bottom_right)
Top-left block:
[[1 2]
[5 6]]
Bottom-right block:
[[11 12]
[15 16]]
This is the start of block matrix notation.
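Blocks can also be glued back together. One way to do that, sketched below with NumPy's np.block (the reassembly itself is an added example, not part of the original walkthrough):
import numpy as np

C = np.arange(1, 17).reshape(4, 4)   # same entries as the 4×4 matrix above

top_left     = C[0:2, 0:2]
top_right    = C[0:2, 2:4]
bottom_left  = C[2:4, 0:2]
bottom_right = C[2:4, 2:4]

# Stitch the four blocks back into one 4×4 matrix
reassembled = np.block([[top_left, top_right],
                        [bottom_left, bottom_right]])
print("Reassembled equals C?", np.array_equal(reassembled, C))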
Try It Yourself
- Create a 4×5 matrix with values 1–20 using np.arange(1,21).reshape(4,5). Find its shape.
- Extract the middle row and last column.
- Cut it into four 2×2 blocks. Can you reassemble them in a different order?
13. Matrix Addition and Scalar Multiplication
Now that we understand matrix shapes and indexing, let’s practice two of the simplest but most important operations: adding matrices and scaling them with numbers (scalars). These operations extend the rules we already know for vectors.
Set Up Your Lab
import numpy as np
Step-by-Step Code Walkthrough
- Adding two matrices You can add two matrices if (and only if) they have the same shape. Addition happens entry by entry.
A = np.array([
    [1, 2],
    [3, 4]
])
B = np.array([
    [5, 6],
    [7, 8]
])

C = A + B
print("A + B =\n", C)
A + B =
[[ 6 8]
[10 12]]
Each element in C is the sum of corresponding elements in A and B.
- Scalar multiplication Multiplying a matrix by a scalar multiplies every entry by that number.
k = 3
D = k * A
print("3 * A =\n", D)
3 * A =
[[ 3 6]
[ 9 12]]
Here, each element of A is tripled.
- Combining both operations We can mix addition and scaling, just like with vectors.
combo = 2*A - B
print("2A - B =\n", combo)
2A - B =
[[-3 -2]
[-1 0]]
This creates new matrices as linear combinations of others.
- Zero matrix A matrix of all zeros acts like “nothing happens” for addition.
zero = np.zeros((2,2))
print("Zero matrix:\n", zero)
print("A + Zero =\n", A + zero)
Zero matrix:
[[0. 0.]
[0. 0.]]
A + Zero =
[[1. 2.]
[3. 4.]]
- Shape mismatch (what fails) If shapes don’t match, NumPy throws an error.
X = np.array([
    [1,2,3],
    [4,5,6]
])

try:
    print(A + X)
except ValueError as e:
    print("Error:", e)
Error: operands could not be broadcast together with shapes (2,2) (2,3)
This shows why shape consistency matters.
Try It Yourself
- Create two random 3×3 matrices with np.random.randint(0,10,(3,3)) and add them.
- Multiply a 4×4 matrix by -1. What happens to its entries?
- Compute 3A + 2B with the matrices from above. Compare with doing each step manually.
14. Matrix–Vector Product (Linear Combinations of Columns)
This lab introduces the matrix–vector product, one of the most important operations in linear algebra. Multiplying a matrix by a vector doesn’t just crunch numbers - it produces a new vector by combining the matrix’s columns in a weighted way.
Set Up Your Lab
import numpy as np
Step-by-Step Code Walkthrough
- A simple matrix and vector
A = np.array([
    [1, 2],
    [3, 4],
    [5, 6]
])  # 3×2 matrix

x = np.array([2, -1])  # 2D vector
Here, A has 2 columns, so we can multiply it by a 2D vector x.
- Matrix–vector multiplication in NumPy
y = A.dot(x)
print("A·x =", y)
A·x = [0 2 4]
Result: a 3D vector.
- Interpreting the result as linear combinations
Matrix A has two columns:
col1 = A[:,0]  # first column
col2 = A[:,1]  # second column

manual = 2*col1 + (-1)*col2
print("Manual linear combination:", manual)
Manual linear combination: [0 2 4]
This matches A·x. In words: multiply each column by the corresponding entry of x and then add them up.
- Another example (geometry)
B = np.array([
    [2, 0],
    [0, 1]
])  # stretches x-axis by 2

v = np.array([1, 3])
print("B·v =", B.dot(v))
B·v = [2 3]
Here, (1,3) becomes (2,3). The x-component was doubled, while y stayed the same.
- Visualization of matrix action
import matplotlib.pyplot as plt

# draw original vector
plt.quiver(0,0,v[0],v[1],angles='xy',scale_units='xy',scale=1,color='r',label='v')

# draw transformed vector
v_transformed = B.dot(v)
plt.quiver(0,0,v_transformed[0],v_transformed[1],angles='xy',scale_units='xy',scale=1,color='b',label='B·v')

plt.xlim(-1,4)
plt.ylim(-1,4)
plt.axhline(0,color='black',linewidth=0.5)
plt.axvline(0,color='black',linewidth=0.5)
plt.grid()
plt.show()
Red arrow = original vector, blue arrow = transformed vector.
Try It Yourself
Multiply
\[ A = \begin{bmatrix}1 & 0 \\ 0 & 1 \\ -1 & 2\end{bmatrix},\; x = [3,1] \]
What’s the result?
Replace B with [[0,-1],[1,0]]. Multiply it by (1,0) and (0,1). What geometric transformation does this represent?
For a 4×4 identity matrix (np.eye(4)), try multiplying by any 4D vector. What do you observe?
15. Matrix–Matrix Product (Composition of Linear Steps)
Matrix–matrix multiplication is how we combine two linear transformations into one. Instead of applying one transformation, then another, we can multiply their matrices and get a single matrix that does both at once.
Set Up Your Lab
import numpy as np
Step-by-Step Code Walkthrough
- Matrix–matrix multiplication in NumPy
A = np.array([
    [1, 2],
    [3, 4]
])  # 2×2

B = np.array([
    [2, 0],
    [1, 2]
])  # 2×2

C = A.dot(B)  # or A @ B
print("A·B =\n", C)
A·B =
[[ 4 4]
[10 8]]
The result C is another 2×2 matrix.
- Manual computation
Each entry of C is computed as a row of A dotted with a column of B:
c11 = A[0,:].dot(B[:,0])
c12 = A[0,:].dot(B[:,1])
c21 = A[1,:].dot(B[:,0])
c22 = A[1,:].dot(B[:,1])
print("Manual C =\n", np.array([[c11,c12],[c21,c22]]))
Manual C =
[[ 4 4]
[10 8]]
This should match A·B.
- Geometric interpretation
Let’s see how two transformations combine.
- Matrix B doubles the x-component and sends the y-component to x + 2y (a scaling combined with a shear).
- Matrix A applies another linear transformation.
Together, C = A·B does both in one step.
v = np.array([1,1])
print("First apply B:", B.dot(v))
print("Then apply A:", A.dot(B.dot(v)))
print("Directly with C:", C.dot(v))
First apply B: [2 3]
Then apply A: [ 8 18]
Directly with C: [ 8 18]
The result is the same: applying B then A is equivalent to applying C.
- Non-square matrices
Matrix multiplication also works for rectangular matrices, as long as the inner dimensions match.
M = np.array([
    [1, 0, 2],
    [0, 1, 3]
])  # 2×3

N = np.array([
    [1, 2],
    [0, 1],
    [4, 0]
])  # 3×2

P = M.dot(N)  # result is 2×2
print("M·N =\n", P)
M·N =
[[ 9 2]
[12 1]]
Shape rule: (2×3)·(3×2) = (2×2).
- Associativity (but not commutativity)
Matrix multiplication is associative: (A·B)·C = A·(B·C). But it’s not commutative: in general, A·B ≠ B·A.
A = np.array([[1,2],[3,4]])
B = np.array([[0,1],[1,0]])
print("A·B =\n", A.dot(B))
print("B·A =\n", B.dot(A))
A·B =
[[2 1]
[4 3]]
B·A =
[[3 4]
[1 2]]
The two results are different.
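Associativity, on the other hand, does hold, and it is just as easy to check numerically. A minimal added sketch (the third matrix is an arbitrary choice for illustration):
import numpy as np

A = np.array([[1,2],[3,4]])
B = np.array([[0,1],[1,0]])
C = np.array([[2,0],[1,1]])   # any third 2×2 matrix works here

left = (A.dot(B)).dot(C)    # (A·B)·C
right = A.dot(B.dot(C))     # A·(B·C)
print("(A·B)·C == A·(B·C)?", np.array_equal(left, right))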
Try It Yourself
Multiply
\[ A = \begin{bmatrix}1 & 0 \\ 0 & 1\end{bmatrix},\; B = \begin{bmatrix}0 & -1 \\ 1 & 0\end{bmatrix} \]
What transformation does A·B represent?
Create a random 3×2 matrix and a 2×4 matrix. Multiply them. What shape is the result?
Verify with Python that (A·B)·C = A·(B·C) for some 3×3 random matrices.
16. Identity, Inverse, and Transpose
In this lab, we’ll meet three special matrix operations and objects: the identity matrix, the inverse, and the transpose. These are the building blocks of matrix algebra, each with a simple meaning but deep importance.
Set Up Your Lab
import numpy as np
Step-by-Step Code Walkthrough
- Identity matrix The identity matrix is like the number 1 for matrices: multiplying by it changes nothing.
I = np.eye(3)  # 3×3 identity matrix
print("Identity matrix:\n", I)

A = np.array([
    [2, 1, 0],
    [0, 1, 3],
    [4, 0, 1]
])
print("A·I =\n", A.dot(I))
print("I·A =\n", I.dot(A))
Identity matrix:
[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 1.]]
A·I =
[[2. 1. 0.]
[0. 1. 3.]
[4. 0. 1.]]
I·A =
[[2. 1. 0.]
[0. 1. 3.]
[4. 0. 1.]]
Both equal A.
- Transpose The transpose flips rows and columns.
B = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
print("B:\n", B)
print("B.T:\n", B.T)
B:
[[1 2 3]
[4 5 6]]
B.T:
[[1 4]
[2 5]
[3 6]]
- Original: 2×3
- Transpose: 3×2
Geometrically, transpose swaps the axes when vectors are viewed in row/column form.
- Inverse The inverse matrix is like dividing by a number: multiplying a matrix by its inverse gives the identity.
C = np.array([
    [2, 1],
    [5, 3]
])

C_inv = np.linalg.inv(C)
print("Inverse of C:\n", C_inv)
print("C·C_inv =\n", C.dot(C_inv))
print("C_inv·C =\n", C_inv.dot(C))
Inverse of C:
[[ 3. -1.]
[-5. 2.]]
C·C_inv =
[[ 1.00000000e+00 2.22044605e-16]
[-8.88178420e-16 1.00000000e+00]]
C_inv·C =
[[1.00000000e+00 3.33066907e-16]
[0.00000000e+00 1.00000000e+00]]
Both products are (approximately) the identity.
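One natural use of the inverse, shown here as an added sketch (the right-hand side b is an arbitrary illustrative choice): if C·x = b, then x = C⁻¹·b.
import numpy as np

C = np.array([[2, 1],
              [5, 3]])
C_inv = np.linalg.inv(C)

b = np.array([4, 7])   # arbitrary right-hand side
x = C_inv.dot(b)
print("x =", x)              # [ 5. -6.]
print("C·x =", C.dot(x))     # recovers b = [4 7]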
- Matrices that don’t have inverses Not every matrix is invertible. If a matrix is singular (determinant = 0), it has no inverse.
D = np.array([
    [1, 2],
    [2, 4]
])

try:
    np.linalg.inv(D)
except np.linalg.LinAlgError as e:
    print("Error:", e)
Error: Singular matrix
Here, the second row is a multiple of the first, so D can’t be inverted.
- Transpose and inverse together For invertible matrices,
\[ (A^T)^{-1} = (A^{-1})^T \]
We can check this numerically:
A = np.array([
    [1, 2],
    [3, 5]
])

lhs = np.linalg.inv(A.T)
rhs = np.linalg.inv(A).T
print("Do they match?", np.allclose(lhs, rhs))
Do they match? True
Try It Yourself
- Create a 4×4 identity matrix. Multiply it by any 4×1 vector. Does it change?
- Take a random 2×2 matrix with np.random.randint. Compute its inverse and check if multiplying gives identity.
- Pick a rectangular 3×2 matrix. What happens when you try np.linalg.inv? Why?
- Compute (A.T).T for some matrix A. What do you notice?
17. Symmetric, Diagonal, Triangular, and Permutation Matrices
In this lab, we’ll meet four important families of special matrices. They have patterns that make them easier to understand, compute with, and use in algorithms.
Set Up Your Lab
import numpy as np
Step-by-Step Code Walkthrough
- Symmetric matrices A matrix is symmetric if it equals its transpose: \(A = A^T\).
A = np.array([
    [2, 3, 4],
    [3, 5, 6],
    [4, 6, 8]
])
print("A:\n", A)
print("A.T:\n", A.T)
print("Is symmetric?", np.allclose(A, A.T))
A:
[[2 3 4]
[3 5 6]
[4 6 8]]
A.T:
[[2 3 4]
[3 5 6]
[4 6 8]]
Is symmetric? True
Symmetric matrices appear in physics, optimization, and statistics (e.g., covariance matrices).
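As a small added aside on that last point: a covariance matrix estimated from data is always symmetric, which is easy to confirm (the random data and seed below are illustrative choices):
import numpy as np

rng = np.random.default_rng(0)     # seeded so the run is reproducible
data = rng.normal(size=(3, 100))   # 3 variables, 100 samples each
cov = np.cov(data)                 # 3×3 covariance matrix

print("Shape:", cov.shape)
print("Symmetric?", np.allclose(cov, cov.T))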
- Diagonal matrices A diagonal matrix has nonzero entries only on the main diagonal.
D = np.diag([1, 5, 9])
print("Diagonal matrix:\n", D)

x = np.array([2, 3, 4])
print("D·x =", D.dot(x))  # scales each component
Diagonal matrix:
[[1 0 0]
[0 5 0]
[0 0 9]]
D·x = [ 2 15 36]
Diagonal multiplication simply scales each coordinate separately.
- Triangular matrices Upper triangular: all entries below the diagonal are zero. Lower triangular: all entries above the diagonal are zero.
U = np.array([
    [1, 2, 3],
    [0, 4, 5],
    [0, 0, 6]
])
L = np.array([
    [7, 0, 0],
    [8, 9, 0],
    [1, 2, 3]
])
print("Upper triangular U:\n", U)
print("Lower triangular L:\n", L)
Upper triangular U:
[[1 2 3]
[0 4 5]
[0 0 6]]
Lower triangular L:
[[7 0 0]
[8 9 0]
[1 2 3]]
These are important in solving linear systems (e.g., Gaussian elimination).
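To see why triangular systems are so convenient, here is a minimal back-substitution sketch for the upper triangular U above (the right-hand side b is an arbitrary illustrative choice):
import numpy as np

U = np.array([[1., 2., 3.],
              [0., 4., 5.],
              [0., 0., 6.]])
b = np.array([14., 23., 18.])   # chosen so the answer comes out clean

# Solve U·x = b from the bottom row upward (back-substitution)
n = len(b)
x = np.zeros(n)
for i in range(n - 1, -1, -1):
    x[i] = (b[i] - U[i, i+1:].dot(x[i+1:])) / U[i, i]

print("x =", x)                # [1. 2. 3.]
print("Check U·x =", U.dot(x))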
- Permutation matrices A permutation matrix rearranges the order of coordinates. Each row and each column has exactly one 1, everything else is 0.
P = np.array([
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0]
])
print("Permutation matrix P:\n", P)

v = np.array([10, 20, 30])
print("P·v =", P.dot(v))
Permutation matrix P:
[[0 1 0]
[0 0 1]
[1 0 0]]
P·v = [20 30 10]
Here, P cycles (10,20,30) into (20,30,10).
- Checking properties
def is_symmetric(M): return np.allclose(M, M.T)
def is_diagonal(M): return np.count_nonzero(M - np.diag(np.diag(M))) == 0
def is_upper_triangular(M): return np.allclose(M, np.triu(M))
def is_lower_triangular(M): return np.allclose(M, np.tril(M))
print("A symmetric?", is_symmetric(A))
print("D diagonal?", is_diagonal(D))
print("U upper triangular?", is_upper_triangular(U))
print("L lower triangular?", is_lower_triangular(L))
A symmetric? True
D diagonal? True
U upper triangular? True
L lower triangular? True
Try It Yourself
- Create a random symmetric matrix by generating any matrix M and computing (M + M.T)/2.
- Build a 4×4 diagonal matrix with diagonal entries [2,4,6,8] and multiply it by [1,1,1,1].
- Make a permutation matrix that swaps the first and last components of a 3D vector.
- Check whether the identity matrix is diagonal, symmetric, upper triangular, and lower triangular all at once.
18. Trace and Basic Matrix Properties
In this lab, we’ll introduce the trace of a matrix and a few quick properties that often appear in proofs, algorithms, and applications. The trace is simple to compute but surprisingly powerful.
Set Up Your Lab
import numpy as np
Step-by-Step Code Walkthrough
- What is the trace? The trace of a square matrix is the sum of its diagonal entries:
\[ \text{tr}(A) = \sum_i A_{ii} \]
A = np.array([
    [2, 1, 3],
    [0, 4, 5],
    [7, 8, 6]
])

trace_A = np.trace(A)
print("Matrix A:\n", A)
print("Trace of A =", trace_A)
Matrix A:
[[2 1 3]
[0 4 5]
[7 8 6]]
Trace of A = 12
Here, trace = \(2 + 4 + 6 = 12\).
- Trace is linear For matrices A and B:
\[ \text{tr}(A+B) = \text{tr}(A) + \text{tr}(B) \]
\[ \text{tr}(cA) = c \cdot \text{tr}(A) \]
B = np.array([
    [1, 0, 0],
    [0, 2, 0],
    [0, 0, 3]
])
print("tr(A+B) =", np.trace(A+B))
print("tr(A) + tr(B) =", np.trace(A) + np.trace(B))
print("tr(3A) =", np.trace(3*A))
print("3 * tr(A) =", 3*np.trace(A))
tr(A+B) = 18
tr(A) + tr(B) = 18
tr(3A) = 36
3 * tr(A) = 36
- Trace of a product One important property is:
\[ \text{tr}(AB) = \text{tr}(BA) \]
C = np.array([
    [0,1],
    [2,3]
])
D = np.array([
    [4,5],
    [6,7]
])
print("tr(CD) =", np.trace(C.dot(D)))
print("tr(DC) =", np.trace(D.dot(C)))
tr(CD) = 37
tr(DC) = 37
Both are equal, even though CD and DC are different matrices.
- Trace and eigenvalues The trace equals the sum of eigenvalues of a matrix (counting multiplicities).
vals, vecs = np.linalg.eig(A)
print("Eigenvalues:", vals)
print("Sum of eigenvalues =", np.sum(vals))
print("Trace =", np.trace(A))
Eigenvalues: [12.83286783 2.13019807 -2.9630659 ]
Sum of eigenvalues = 12.000000000000007
Trace = 12
The results should match (within rounding error).
- Quick invariants
- Trace doesn’t change under transpose: tr(A) = tr(A.T)
- Trace doesn’t change under similarity transforms: tr(P^-1 A P) = tr(A) (checked numerically in the sketch below)
print("tr(A) =", np.trace(A))
print("tr(A.T) =", np.trace(A.T))
tr(A) = 12
tr(A.T) = 12
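The similarity-transform invariance can be checked the same way. A short added sketch (the matrix P is an arbitrary invertible choice):
import numpy as np

A = np.array([[2, 1, 3],
              [0, 4, 5],
              [7, 8, 6]])
P = np.array([[1, 2, 0],
              [0, 1, 1],
              [1, 0, 1]])   # any invertible matrix works
P_inv = np.linalg.inv(P)

print("tr(A) =", np.trace(A))
print("tr(P^-1 A P) =", round(np.trace(P_inv.dot(A).dot(P)), 10))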
Try It Yourself
Create a 2×2 rotation matrix for 90°:
\[ R = \begin{bmatrix}0 & -1 \\ 1 & 0\end{bmatrix} \]
What is its trace? What does that tell you about its eigenvalues?
Make a random 3×3 matrix and compare tr(A) with the sum of eigenvalues.
Test tr(AB) and tr(BA) with a rectangular matrix A (e.g. 2×3) and B (3×2). Do they still match?
19. Affine Transforms and Homogeneous Coordinates
Affine transformations let us do more than just linear operations - they include translations (shifting points), which ordinary matrices can’t handle alone. To unify rotations, scalings, reflections, and translations, we use homogeneous coordinates.
Set Up Your Lab
import numpy as np
import matplotlib.pyplot as plt
Step-by-Step Code Walkthrough
- Linear transformations vs affine transformations
- A linear transformation can rotate, scale, or shear, but always keeps the origin fixed.
- An affine transformation allows translation as well.
For example, shifting every point by (2,3) is affine but not linear, as the quick check below shows.
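A minimal added sketch of that claim (not from the lab text): a linear map must send the zero vector to zero, but a translation does not.
import numpy as np

def shift_by_2_3(p):
    # Affine map: add a fixed offset to every point
    return p + np.array([2, 3])

print("Image of the origin:", shift_by_2_3(np.array([0, 0])))   # [2 3], not [0 0]
# A linear map L must satisfy L(0) = 0, so no 2×2 matrix acting on (x, y)
# alone can reproduce this shift - hence homogeneous coordinates below.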
- Homogeneous coordinates idea We add an extra coordinate (usually 1) to vectors.
- A 2D point (x,y) becomes (x,y,1).
- A 3D point (x,y,z) becomes (x,y,z,1).
This trick lets us represent translations using matrix multiplication.
- 2D translation matrix
\[ T = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix} \]
T = np.array([
    [1, 0, 2],
    [0, 1, 3],
    [0, 0, 1]
])

p = np.array([1, 1, 1])  # point at (1,1)
p_translated = T.dot(p)
print("Original point:", p)
print("Translated point:", p_translated)
Original point: [1 1 1]
Translated point: [3 4 1]
This shifts (1,1) to (3,4).
- Combining rotation and translation
A 90° rotation around the origin in 2D:
R = np.array([
    [0, -1, 0],
    [1, 0, 0],
    [0, 0, 1]
])

M = T.dot(R)  # rotate then translate
print("Combined transform:\n", M)

p = np.array([1, 0, 1])
print("Rotated + translated point:", M.dot(p))
Combined transform:
[[ 0 -1 2]
[ 1 0 3]
[ 0 0 1]]
Rotated + translated point: [2 4 1]
Now we can apply rotation and translation in one step.
- Visualization of translation
points = np.array([
    [0,0,1],
    [1,0,1],
    [1,1,1],
    [0,1,1]
])  # a unit square

transformed = points.dot(T.T)

plt.scatter(points[:,0], points[:,1], color='r', label='original')
plt.scatter(transformed[:,0], transformed[:,1], color='b', label='translated')

for i in range(len(points)):
    plt.arrow(points[i,0], points[i,1],
              transformed[i,0]-points[i,0],
              transformed[i,1]-points[i,1],
              head_width=0.05, color='gray')

plt.legend()
plt.axis('equal')
plt.grid()
plt.show()
You’ll see the red unit square moved to a blue unit square shifted by (2,3).
- Extending to 3D In 3D, homogeneous coordinates use 4×4 matrices. Translations, rotations, and scalings all fit the same framework.
T3 = np.array([
    [1,0,0,5],
    [0,1,0,-2],
    [0,0,1,3],
    [0,0,0,1]
])

p3 = np.array([1,2,3,1])
print("Translated 3D point:", T3.dot(p3))
Translated 3D point: [6 0 6 1]
This shifts (1,2,3) to (6,0,6).
Try It Yourself
- Build a scaling matrix in homogeneous coordinates that doubles both x and y, and apply it to (1,1).
- Create a 2D transform that rotates by 90° and then shifts by (−2,1). Apply it to (0,2).
- In 3D, translate (0,0,0) by (10,10,10). What homogeneous matrix did you use?
20. Computing with Matrices (Cost Counts and Simple Speedups)
Working with matrices is not just about theory - in practice, we care about how much computation it takes to perform operations, and how we can make them faster. This lab introduces basic cost analysis (counting operations) and demonstrates simple NumPy optimizations.
Set Up Your Lab
import numpy as np
import time
Step-by-Step Code Walkthrough
- Counting operations (matrix–vector multiply)
If A is an \(m \times n\) matrix and x is an \(n\)-dimensional vector, computing A·x takes about \(m \times n\) multiplications and the same number of additions.
m, n = 3, 4
A = np.random.randint(1,10,(m,n))
x = np.random.randint(1,10,n)
print("Matrix A:\n", A)
print("Vector x:", x)
print("A·x =", A.dot(x))
Matrix A:
[[6 6 6 2]
[1 1 1 1]
[1 8 7 4]]
Vector x: [6 5 4 5]
A·x = [100 20 94]
Here the cost is \(3 \times 4 = 12\) multiplications + 12 additions.
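To see that count come out of actual code, here is a small added sketch that performs the multiply by hand and tallies the scalar multiplications:
import numpy as np

m, n = 3, 4
A = np.random.randint(1, 10, (m, n))
x = np.random.randint(1, 10, n)

mults = 0
y = np.zeros(m, dtype=int)
for i in range(m):          # one pass per row
    for j in range(n):      # one multiply per entry in that row
        y[i] += A[i, j] * x[j]
        mults += 1

print("Multiplications counted:", mults)   # m*n = 12
print("Matches A.dot(x)?", np.array_equal(y, A.dot(x)))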
- Counting operations (matrix–matrix multiply)
For an \(m \times n\) times \(n \times p\) multiplication, the cost is about \(m \times n \times p\).
m, n, p = 3, 4, 2
A = np.random.randint(1,10,(m,n))
B = np.random.randint(1,10,(n,p))

C = A.dot(B)
print("A·B =\n", C)
A·B =
[[ 59 92]
[ 43 81]
[ 65 102]]
Here the cost is \(3 \times 4 \times 2 = 24\) multiplications + 24 additions.
- Timing with NumPy (vectorized vs loop)
NumPy is optimized in C and Fortran under the hood. Let’s compare matrix multiplication with and without vectorization.
n = 50
A = np.random.randn(n,n)
B = np.random.randn(n,n)

# Vectorized
start = time.time()
C1 = A.dot(B)
end = time.time()
print("Vectorized dot:", round(end-start,3), "seconds")

# Manual loops
C2 = np.zeros((n,n))
start = time.time()
for i in range(n):
    for j in range(n):
        for k in range(n):
            C2[i,j] += A[i,k]*B[k,j]
end = time.time()
print("Triple loop:", round(end-start,3), "seconds")
Vectorized dot: 0.0 seconds
Triple loop: 0.026 seconds
The vectorized version is dramatically faster: even at n = 50 the triple loop takes measurable time while the optimized dot product finishes below the timer's resolution, and the gap grows quickly with n.
- Broadcasting tricks
NumPy lets us avoid loops by broadcasting operations across entire rows or columns.
A = np.array([
    [1,2,3],
    [4,5,6]
])

# Add 10 to every entry
print("A+10 =\n", A+10)

# Multiply each row by a different scalar
scales = np.array([1,10])[:,None]
print("Row-scaled A =\n", A*scales)
A+10 =
[[11 12 13]
[14 15 16]]
Row-scaled A =
[[ 1 2 3]
[40 50 60]]
- Memory and data types
For large computations, data type matters.
A = np.random.randn(1000,1000).astype(np.float32)  # 32-bit floats
B = np.random.randn(1000,1000).astype(np.float32)

start = time.time()
C = A.dot(B)
print("Result shape:", C.shape, "dtype:", C.dtype)
print("Time:", round(time.time()-start,3), "seconds")
Result shape: (1000, 1000) dtype: float32
Time: 0.002 seconds
Using float32 instead of float64 halves memory use and can speed up computation (at the cost of some precision).
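The memory half of that claim is easy to confirm with the .nbytes attribute (a small added check):
import numpy as np

A64 = np.random.randn(1000, 1000)        # float64 by default
A32 = A64.astype(np.float32)

print("float64 array:", A64.nbytes / 1e6, "MB")   # 8.0 MB
print("float32 array:", A32.nbytes / 1e6, "MB")   # 4.0 MB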
Try It Yourself
- Compute the cost of multiplying a 200×500 matrix with a 500×1000 matrix. How many multiplications are needed?
- Time matrix multiplication for sizes 100, 500, 1000 in NumPy. How does the time scale?
- Experiment with float32 vs float64 in NumPy. How do speed and memory change?
- Try broadcasting: multiply each column of a matrix by [1,2,3,...].
The Takeaway
- Matrix operations have predictable computational costs: A·x ~ \(m \times n\), A·B ~ \(m \times n \times p\).
- Vectorized NumPy operations are vastly faster than Python loops.
- Broadcasting and choosing the right data type are simple speedups every beginner should learn.
Chapter 3. Linear Systems and Elimination
21. From Equations to Matrices (Augmenting and Encoding)
Linear algebra often begins with solving systems of linear equations. For example:
\[ \begin{cases} x + 2y = 5 \\ 3x - y = 4 \end{cases} \]
Instead of juggling symbols, we can encode the entire system into a matrix. This is the key idea that lets computers handle thousands or millions of equations efficiently.
Set Up Your Lab
import numpy as np
Step-by-Step Code Walkthrough
- Write a system of equations
We’ll use this small example:
\[ \begin{cases} 2x + y = 8 \\ -3x + 4y = -11 \end{cases} \]
- Encode coefficients and constants
- Coefficient matrix \(A\): numbers multiplying variables.
- Variable vector \(x\): unknowns [x, y].
- Constant vector \(b\): right-hand side.
A = np.array([
    [2, 1],
    [-3, 4]
])
b = np.array([8, -11])
print("Coefficient matrix A:\n", A)
print("Constants vector b:", b)
Coefficient matrix A:
[[ 2 1]
[-3 4]]
Constants vector b: [ 8 -11]
So the system is \(A·x = b\).
- Augmented matrix
We can bundle the system into one compact matrix:
\[ [A|b] = \begin{bmatrix}2 & 1 & | & 8 \\ -3 & 4 & | & -11 \end{bmatrix} \]
augmented = np.column_stack((A, b))
print("Augmented matrix:\n", augmented)
Augmented matrix:
[[ 2 1 8]
[ -3 4 -11]]
This format is useful for elimination algorithms.
- Solving directly with NumPy
solution = np.linalg.solve(A, b)
print("Solution (x,y):", solution)
Solution (x,y): [3.90909091 0.18181818]
Here NumPy solves the system using efficient algorithms.
- Checking the solution
Always verify:
check = A.dot(solution)
print("A·x =", check, "should equal b =", b)
A·x = [ 8. -11.] should equal b = [ 8 -11]
- Another example (3 variables)
\[ \begin{cases} x + y + z = 6 \\ 2x - y + z = 3 \\ - x + 2y - z = 2 \end{cases} \]
A = np.array([
    [1, 1, 1],
    [2, -1, 1],
    [-1, 2, -1]
])
b = np.array([6, 3, 2])
print("Augmented matrix:\n", np.column_stack((A, b)))
print("Solution:", np.linalg.solve(A, b))
Augmented matrix:
[[ 1 1 1 6]
[ 2 -1 1 3]
[-1 2 -1 2]]
Solution: [2.33333333 2.66666667 1. ]
Try It Yourself
Encode the system:
\[ \begin{cases} 2x - y = 1 \\ x + 3y = 7 \end{cases} \]
Write A and b, then solve.
For a 3×3 system, try creating a random coefficient matrix with np.random.randint(-5,5,(3,3)) and a random b. Use np.linalg.solve.
Modify the constants b slightly and see how the solution changes. This introduces the idea of sensitivity.
The Takeaway
- Systems of linear equations can be neatly written as \(A·x = b\).
- The augmented matrix \([A|b]\) is a compact way to set up elimination.
- This matrix encoding transforms algebra problems into matrix problems - the gateway to all of linear algebra.
22. Row Operations (Legal Moves That Keep Solutions)
When solving linear systems, we don’t want to change the solutions - just simplify the system into an easier form. This is where row operations come in. They are the “legal moves” we can do on an augmented matrix \([A|b]\) without changing the solution set.
Set Up Your Lab
import numpy as np
Step-by-Step Code Walkthrough
Three legal row operations
Swap two rows \((R_i \leftrightarrow R_j)\)
Multiply a row by a nonzero scalar \((R_i \to c·R_i)\)
Replace a row with itself plus a multiple of another row \((R_i \to R_i + c·R_j)\)
These preserve the solution set.
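As a quick sanity check of that claim (an added sketch, not part of the original lab), apply one replacement move to [A|b] and confirm the solution does not move:
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
b = np.array([5., 6.])
x_before = np.linalg.solve(A, b)

# One legal move on the augmented matrix: R2 -> R2 - 3*R1
aug = np.column_stack((A, b))
aug[1] = aug[1] - 3*aug[0]
x_after = np.linalg.solve(aug[:, :2], aug[:, 2])

print("Before:", x_before)
print("After :", x_after)
print("Same solution?", np.allclose(x_before, x_after))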
- Start with an augmented matrix
System:
\[ \begin{cases} x + 2y = 5 \\ 3x + 4y = 6 \end{cases} \]
A = np.array([
    [1, 2, 5],
    [3, 4, 6]
], dtype=float)
print("Initial augmented matrix:\n", A)
Initial augmented matrix:
[[1. 2. 5.]
[3. 4. 6.]]
- Row swap
Swap row 0 and row 1.
A[[0, 1]] = A[[1, 0]]
print("After swapping rows:\n", A)
After swapping rows:
[[3. 4. 6.]
[1. 2. 5.]]
- Multiply a row by a scalar
Make the pivot in row 0 equal to 1.
A[0] = A[0] / A[0, 0]
print("After scaling first row:\n", A)
After scaling first row:
[[1. 1.33333333 2. ]
[1. 2. 5. ]]
- Add a multiple of another row
Eliminate the first column of row 1. After the scaling step, the entry to remove is A[1,0] = 1, so we subtract 1 times row 0 (the multiplier must match the entry being cancelled).
A[1] = A[1] - A[1,0]*A[0]
print("After eliminating x from second row:\n", A)
After eliminating x from second row:
[[1.         1.33333333 2.        ]
 [0.         0.66666667 3.        ]]
Now the system is simpler: the second row involves only y.
- Solving from the new system
y = A[1,2] / A[1,1]
x = (A[0,2] - A[0,1]*y) / A[0,0]
print("Solution: x =", x, ", y =", y)
Solution: x = -4.0 , y = 4.5
(Depending on floating-point rounding, the printed values may look like -3.999999999999999 and 4.499999999999999.)
- Using NumPy step-by-step vs solver
coeff = np.array([[1, 2], [3, 4]])
const = np.array([5, 6])
print("np.linalg.solve result:", np.linalg.solve(coeff, const))
np.linalg.solve result: [-4. 4.5]
Both methods give the same solution.
Try It Yourself
Take the system:
\[ \begin{cases} 2x + y = 7 \\ x - y = 1 \end{cases} \]
Write its augmented matrix, then:
- Swap rows.
- Scale the first row.
- Eliminate one variable.
Create a random 3×3 system with integers between -5 and 5. Perform at least one of each row operation manually in code.
Experiment with multiplying a row by
0
. What happens, and why is this not allowed as a legal operation?
The Takeaway
- The three legal row operations are row swap, row scaling, and row replacement.
- These steps preserve the solution set while moving toward a simpler form.
- They are the foundation of Gaussian elimination, the standard algorithm for solving linear systems.
23. Row-Echelon and Reduced Row-Echelon Forms (Target Shapes)
When solving systems, our goal is to simplify the augmented matrix into a standard shape where the solutions are easy to read. These shapes are called row-echelon form (REF) and reduced row-echelon form (RREF).
Set Up Your Lab
import numpy as np
from sympy import Matrix
We’ll use NumPy for basic work and SymPy for exact RREF (since NumPy doesn’t have it built-in).
Step-by-Step Code Walkthrough
- Row-Echelon Form (REF)
- All nonzero rows are above any zero rows.
- Each leading entry (pivot) is to the right of the pivot in the row above.
- Pivots are usually scaled to 1, but not strictly required.
Example system:
\[ \begin{cases} x + 2y + z = 7 \\ 2x + 4y + z = 12 \\ 3x + 6y + 2z = 17 \end{cases} \]
A = np.array([
    [1, 2, 1, 7],
    [2, 4, 1, 12],
    [3, 6, 2, 17]
], dtype=float)
print("Augmented matrix:\n", A)
Augmented matrix:
[[ 1. 2. 1. 7.]
[ 2. 4. 1. 12.]
[ 3. 6. 2. 17.]]
Perform elimination manually:
# eliminate first column entries below pivot
A[1] = A[1] - 2*A[0]
A[2] = A[2] - 3*A[0]
print("After eliminating first column:\n", A)
After eliminating first column:
[[ 1. 2. 1. 7.]
[ 0. 0. -1. -2.]
[ 0. 0. -1. -4.]]
We are not quite in row-echelon form yet: rows 2 and 3 both lead in the third column. One more replacement step, \(R_3 \to R_3 - R_2\) (shown below), fixes this so that each pivot sits strictly to the right of the pivot above it - that staircase pattern is row-echelon form. The resulting last row, \([0\ 0\ 0\ {-2}]\), also warns us that this particular system has no solution.
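Here is that extra step, continuing with the array A from the cell above (a small follow-up we add for completeness):
A[2] = A[2] - A[1]   # R3 → R3 - R2
print("Row-echelon form:\n", A)
The third row becomes \([0, 0, 0, -2]\), i.e. the impossible equation \(0 = -2\).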
- Reduced Row-Echelon Form (RREF) In RREF, we go further:
- Every pivot = 1.
- Every pivot is the only nonzero in its column.
Instead of coding manually, we’ll let SymPy handle it:
M = Matrix([
    [1, 2, 1, 7],
    [2, 4, 1, 12],
    [3, 6, 2, 17]
])
M_rref = M.rref()
print("RREF form:\n", M_rref[0])
RREF form:
Matrix([[1, 2, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
SymPy shows the final canonical form. Note the last row, [0, 0, 0, 1]: it encodes the impossible equation 0 = 1, confirming that this system is inconsistent.
- Reading solutions from RREF
If the RREF looks like:
\[ \begin{bmatrix} 1 & 0 & a & b \\ 0 & 1 & c & d \\ 0 & 0 & 0 & 0 \end{bmatrix} \]
It means:
- The first two variables are leading (pivots).
- The third variable is free.
- Solutions can be written in terms of the free variable.
- A quick example with free variables
System:
\[ x + y + z = 3 \\ 2x + y - z = 0 \]
M2 = Matrix([
    [1, 1, 1, 3],
    [2, 1, -1, 0]
])
M2_rref = M2.rref()
print("RREF form:\n", M2_rref[0])
RREF form:
Matrix([[1, 0, -2, -3], [0, 1, 3, 6]])
Here, one column will not have a pivot → that variable is free.
Try It Yourself
Take the system:
\[ 2x + 3y = 6, \quad 4x + 6y = 12 \]
Write the augmented matrix and compute its RREF. What does it tell you about solutions?
Create a random 3×4 matrix in NumPy. Use SymPy’s
Matrix.rref()
to compute its reduced form. Identify the pivot columns.For the system:
\[ x + 2y + 3z = 4, \quad 2x + 4y + 6z = 8 \]
Check if the equations are independent or multiples of each other by looking at the RREF.
The Takeaway
- REF organizes equations into a staircase shape.
- RREF goes further, making each pivot the only nonzero in its column.
- These canonical forms make it easy to identify pivot variables, free variables, and the solution set structure.
24. Pivots, Free Variables, and Leading Ones (Reading Solutions)
Once a matrix is in row-echelon or reduced row-echelon form, the solutions to the system become visible. The key is identifying pivots, leading ones, and free variables.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- What are pivots?
- A pivot is the first nonzero entry in a row (after elimination).
- In RREF, pivots are scaled to
1
and are called leading ones. - Pivot columns correspond to basic variables.
- Example system
\[ \begin{cases} x + y + z = 6 \\ 2x + 3y + z = 10 \end{cases} \]
M = Matrix([
    [1, 1, 1, 6],
    [2, 3, 1, 10]
])
M_rref = M.rref()
print("RREF form:\n", M_rref[0])
RREF form:
Matrix([[1, 0, 2, 8], [0, 1, -1, -2]])
- Interpreting the RREF
Suppose the RREF comes out as:
\[ \begin{bmatrix} 1 & 0 & -2 & 4 \\ 0 & 1 & 1 & 2 \end{bmatrix} \]
This means:
Pivot columns: 1 and 2 → variables \(x\) and \(y\) are basic.
Free variable: \(z\).
Equations:
\[ x - 2z = 4, \quad y + z = 2 \]
Solution in terms of \(z\):
\[ x = 4 + 2z, \quad y = 2 - z, \quad z = z \]
- Coding the solution extraction
rref_matrix, pivots = M_rref
print("Pivot columns:", pivots)
# free variables are the columns not in pivots
all_vars = set(range(rref_matrix.shape[1] - 1))  # exclude last column (constants)
free_vars = all_vars - set(pivots)
print("Free variable indices:", free_vars)
Pivot columns: (0, 1)
Free variable indices: {2}
- Another example with infinitely many solutions
\[ x + 2y + 3z = 4, \quad 2x + 4y + 6z = 8 \]
M2 = Matrix([
    [1, 2, 3, 4],
    [2, 4, 6, 8]
])
M2_rref = M2.rref()
print("RREF form:\n", M2_rref[0])
RREF form:
Matrix([[1, 2, 3, 4], [0, 0, 0, 0]])
The second row becomes all zeros, showing redundancy. Pivot in column 1, free variables in columns 2 and 3.
- Solving underdetermined systems
If you have more variables than equations, expect free variables. Example:
\[ x + y = 3 \]
M3 = Matrix([[1, 1, 3]])
print("RREF form:\n", M3.rref()[0])
RREF form:
Matrix([[1, 1, 3]])
Here, \(x = 3 - y\). Variable \(y\) is free.
Try It Yourself
Take the system:
\[ x + y + z = 2, \quad y + z = 1 \]
Compute its RREF and identify pivot and free variables.
Create a random 3×4 system and compute its pivots. How many free variables do you get?
For the system:
\[ x - y = 0, \quad 2x - 2y = 0 \]
Verify that the system has infinitely many solutions and describe them in terms of a free variable.
The Takeaway
- Pivots / leading ones mark the basic variables.
- Free variables correspond to non-pivot columns.
- Solutions are written in terms of free variables, showing whether the system has a unique, infinite, or no solution.
25. Solving Consistent Systems (Unique vs. Infinite Solutions)
Now that we can spot pivots and free variables, we can classify systems of equations as having a unique solution or infinitely many solutions (assuming they’re consistent). In this lab, we’ll practice solving both types.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Unique solution example
System:
\[ x + y = 3, \quad 2x - y = 0 \]
from sympy import Matrix
M = Matrix([
    [1, 1, 3],
    [2, -1, 0]
])
M_rref = M.rref()
print("RREF form:\n", M_rref[0])

# Split into coefficient matrix A and right-hand side b
A = M[:, :2]
b = M[:, 2]

solution = A.solve_least_squares(b)
print("Solution:", solution)
RREF form:
Matrix([[1, 0, 1], [0, 1, 2]])
Solution: Matrix([[1], [2]])
- Infinite solution example
System:
\[ x + y + z = 2, \quad 2x + 2y + 2z = 4 \]
M2 = Matrix([
    [1, 1, 1, 2],
    [2, 2, 2, 4]
])
M2_rref = M2.rref()
print("RREF form:\n", M2_rref[0])
RREF form:
Matrix([[1, 1, 1, 2], [0, 0, 0, 0]])
Only one pivot → two free variables.
Interpretation:
- \(x = 2 - y - z\)
- \(y, z\) are free
- Infinite solutions described by parameters.
- Classifying consistency
A system is consistent if the RREF does not have a row like:
\[ [0, 0, 0, c] \quad (c \neq 0) \]
Example consistent system:
M3 = Matrix([
    [1, 2, 3],
    [0, 1, 4]
])
print("RREF:\n", M3.rref()[0])
RREF:
Matrix([[1, 0, -5], [0, 1, 4]])
Example inconsistent system (no solution):
M4 = Matrix([
    [1, 1, 2],
    [2, 2, 5]
])
print("RREF:\n", M4.rref()[0])
RREF:
Matrix([[1, 1, 0], [0, 0, 1]])
The second one ends with [0,0,1]
, meaning contradiction (0 = 1).
- Quick NumPy comparison
For systems with unique solutions:
A = np.array([[1, 1], [2, -1]], dtype=float)
b = np.array([3, 0], dtype=float)
print("Unique solution with np.linalg.solve:", np.linalg.solve(A, b))
Unique solution with np.linalg.solve: [1. 2.]
For systems with infinite solutions, np.linalg.solve
will fail, but SymPy handles parametric solutions.
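If you do want the parametric solution set programmatically, SymPy's linsolve returns it directly (a minimal sketch; the symbol names x, y, z are our own choice):
from sympy import Matrix, linsolve, symbols

x, y, z = symbols('x y z')
M2 = Matrix([
    [1, 1, 1, 2],
    [2, 2, 2, 4]
])
# linsolve accepts an augmented matrix plus the variables
print(linsolve(M2, x, y, z))   # expected: {(-y - z + 2, y, z)}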
Try It Yourself
Solve:
\[ x + y + z = 1, \quad 2x + 3y + z = 2 \]
Is the solution unique or infinite?
Check consistency of:
\[ x + 2y = 3, \quad 2x + 4y = 8 \]
Build a random 3×4 augmented matrix and compute its RREF. Identify:
- Does it have a unique solution, infinitely many, or none?
The Takeaway
- Unique solution: pivot in every variable column.
- Infinite solutions: free variables remain, system is still consistent.
- No solution: an inconsistent row appears.
Understanding pivots and free variables gives a complete picture of the solution set.
26. Detecting Inconsistency (When No Solution Exists)
Not all systems of linear equations can be solved. Some are inconsistent, meaning the equations contradict each other. In this lab, we’ll learn how to recognize inconsistency using augmented matrices and RREF.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- An inconsistent system
\[ x + y = 2, \quad 2x + 2y = 5 \]
Notice the second equation looks like a multiple of the first, but the constant doesn’t match - contradiction.
M = Matrix([
    [1, 1, 2],
    [2, 2, 5]
])
M_rref = M.rref()
print("RREF:\n", M_rref[0])
RREF:
Matrix([[1, 1, 0], [0, 0, 1]])
RREF gives:
\[ \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]
The last row means \(0 = 1\), so no solution exists.
- A consistent system (for contrast)
\[ x + y = 2, \quad 2x + 2y = 4 \]
M2 = Matrix([
    [1, 1, 2],
    [2, 2, 4]
])
print("RREF:\n", M2.rref()[0])
RREF:
Matrix([[1, 1, 2], [0, 0, 0]])
This reduces to one equation and a redundant row of zeros → infinitely many solutions.
- Visualizing inconsistency (2D case)
System:
\[ x + y = 2 \quad \text{and} \quad x + y = 3 \]
These are parallel lines that never meet.
import matplotlib.pyplot as plt
x_vals = np.linspace(-1, 3, 100)
y1 = 2 - x_vals
y2 = 3 - x_vals

plt.plot(x_vals, y1, label="x+y=2")
plt.plot(x_vals, y2, label="x+y=3")
plt.legend()
plt.axhline(0, color='black', linewidth=0.5)
plt.axvline(0, color='black', linewidth=0.5)
plt.grid()
plt.show()
The two lines are parallel → no solution.
- Detecting inconsistency automatically
We can scan the RREF for a row of the form \([0, 0, …, c]\) with \(c \neq 0\).
def is_inconsistent(M):
    rref_matrix, _ = M.rref()
    for row in rref_matrix.tolist():
        if all(v == 0 for v in row[:-1]) and row[-1] != 0:
            return True
    return False
print("System 1 inconsistent?", is_inconsistent(M))
print("System 2 inconsistent?", is_inconsistent(M2))
System 1 inconsistent? True
System 2 inconsistent? False
Try It Yourself
Test the system:
\[ x + 2y = 4, \quad 2x + 4y = 10 \]
Write the augmented matrix and check if it’s inconsistent.
Build a random 2×3 augmented matrix with integer entries. Use
is_inconsistent
to check.Plot two linear equations in 2D. Adjust constants to see when they intersect (consistent) vs when they are parallel (inconsistent).
The Takeaway
A system is inconsistent if RREF contains a row like \([0,0,…,c]\) with \(c \neq 0\).
Geometrically, this means the equations describe parallel lines (2D), parallel planes (3D), or higher-dimensional contradictions.
Recognizing inconsistency quickly saves time and avoids chasing impossible solutions.
27. Gaussian Elimination by Hand (A Disciplined Procedure)
Gaussian elimination is the systematic way to solve linear systems using row operations. It transforms the augmented matrix into row-echelon form (REF) and then uses back substitution to find solutions. In this lab, we’ll walk step by step through the process.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Example system
\[ \begin{cases} x + y + z = 6 \\ 2x + 3y + z = 14 \\ x + 2y + 3z = 14 \end{cases} \]
A = np.array([
    [1, 1, 1, 6],
    [2, 3, 1, 14],
    [1, 2, 3, 14]
], dtype=float)
print("Initial augmented matrix:\n", A)
Initial augmented matrix:
[[ 1. 1. 1. 6.]
[ 2. 3. 1. 14.]
[ 1. 2. 3. 14.]]
- Step 1: Get a pivot in the first column
Make the pivot at (0,0) into 1 (it already is). Now eliminate below it.
A[1] = A[1] - 2*A[0]   # Row2 → Row2 - 2*Row1
A[2] = A[2] - A[0]     # Row3 → Row3 - Row1
print("After eliminating first column:\n", A)
After eliminating first column:
[[ 1. 1. 1. 6.]
[ 0. 1. -1. 2.]
[ 0. 1. 2. 8.]]
- Step 2: Pivot in the second column
Make the pivot in row 1, col 1 equal to 1.
A[1] = A[1] / A[1, 1]
print("After scaling second row:\n", A)
After scaling second row:
[[ 1. 1. 1. 6.]
[ 0. 1. -1. 2.]
[ 0. 1. 2. 8.]]
Now eliminate below:
A[2] = A[2] - A[2, 1]*A[1]
print("After eliminating second column:\n", A)
After eliminating second column:
[[ 1. 1. 1. 6.]
[ 0. 1. -1. 2.]
[ 0. 0. 3. 6.]]
- Step 3: Pivot in the third column
Make the bottom-right entry into 1.
A[2] = A[2] / A[2, 2]
print("After scaling third row:\n", A)
After scaling third row:
[[ 1. 1. 1. 6.]
[ 0. 1. -1. 2.]
[ 0. 0. 1. 2.]]
At this point, the matrix is in row-echelon form (REF).
- Back substitution
Now solve from the bottom up:
z = A[2, 3]
y = A[1, 3] - A[1, 2]*z
x = A[0, 3] - A[0, 1]*y - A[0, 2]*z

print(f"Solution: x={x}, y={y}, z={z}")
Solution: x=0.0, y=4.0, z=2.0
- Verification
coeff = np.array([
    [1, 1, 1],
    [2, 3, 1],
    [1, 2, 3]
], dtype=float)
const = np.array([6, 14, 14], dtype=float)

print("Check with np.linalg.solve:", np.linalg.solve(coeff, const))
Check with np.linalg.solve: [0. 4. 2.]
The results match.
Try It Yourself
Solve:
\[ 2x + y = 5, \quad 4x - 6y = -2 \]
using Gaussian elimination manually in code.
Create a random 3×4 augmented matrix and reduce it step by step, printing after each row operation.
Compare your manual elimination to SymPy’s RREF with
Matrix.rref()
.
The Takeaway
- Gaussian elimination is a disciplined sequence of row operations.
- It reduces the matrix to row-echelon form, from which back substitution is straightforward.
- This method is the backbone of solving systems by hand and underlies many numerical algorithms.
28. Back Substitution and Solution Sets (Finishing Cleanly)
Once Gaussian elimination reduces a system to row-echelon form (REF), the final step is back substitution. This means solving variables starting from the last equation and working upward. In this lab, we’ll practice both unique and infinite solution cases.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Unique solution example
System:
\[ x + y + z = 6, \quad 2y + 5z = -4, \quad z = 3 \]
Row-echelon form looks like:
\[ \begin{bmatrix} 1 & 1 & 1 & 6 \\ 0 & 2 & 5 & -4 \\ 0 & 0 & 1 & 3 \end{bmatrix} \]
Solve bottom-up:
z = 3
y = (-4 - 5*z) / 2
x = 6 - y - z
print(f"Solution: x={x}, y={y}, z={z}")
Solution: x=12.5, y=-9.5, z=3
- Infinite solution example
System:
\[ x + y + z = 2, \quad 2x + 2y + 2z = 4 \]
After elimination:
\[ \begin{bmatrix} 1 & 1 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]
This means:
- Equation: \(x + y + z = 2\).
- Free variables: choose \(y\) and \(z\).
Let \(y = s, z = t\). Then:
\[ x = 2 - s - t \]
So the solution set is:
from sympy import symbols
s, t = symbols('s t')
x = 2 - s - t
y = s
z = t
print("General solution:")
print("x =", x, ", y =", y, ", z =", z)
General solution:
x = -s - t + 2 , y = s , z = t
- Consistency check with RREF
We can use SymPy to confirm solution sets:
M = Matrix([
    [1, 1, 1, 2],
    [2, 2, 2, 4]
])
print("RREF form:\n", M.rref()[0])
RREF form:
Matrix([[1, 1, 1, 2], [0, 0, 0, 0]])
The second row disappears, showing infinite solutions.
- Encoding solution sets
General solutions are often written in parametric vector form.
For the infinite solution above:
\[ (x,y,z) = (2,0,0) + s(-1,1,0) + t(-1,0,1) \]
This shows the solution space is a plane in \(\mathbb{R}^3\).
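As a quick check (our own addition), we can plug the parametric form back into the original system for a few choices of \(s\) and \(t\):
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [2.0, 2.0, 2.0]])
b = np.array([2.0, 4.0])

particular = np.array([2.0, 0.0, 0.0])
d1 = np.array([-1.0, 1.0, 0.0])
d2 = np.array([-1.0, 0.0, 1.0])

for s, t in [(0, 0), (1, -2), (3.5, 0.25)]:
    x = particular + s*d1 + t*d2
    print(s, t, np.allclose(A @ x, b))   # True for every choice of s and t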
Try It Yourself
Solve:
\[ x + 2y = 5, \quad y = 1 \]
Do back substitution by hand and check with NumPy.
Take the system:
\[ x + y + z = 1, \quad 2x + 2y + 2z = 2 \]
Write its solution set in parametric form.
Use
Matrix.rref()
on a 3×4 random augmented matrix. Identify pivot and free variables, then describe the solution set.
The Takeaway
- Back substitution is the cleanup step after Gaussian elimination.
- It reveals whether the system has a unique solution or infinitely many.
- Solutions can be expressed explicitly (unique case) or parametrically (infinite case).
29. Rank and Its First Meaning (Pivots as Information)
The rank of a matrix tells us how much independent information it contains. Rank is one of the most important concepts in linear algebra because it connects to pivots, independence, dimension, and the number of solutions to a system.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Rank definition The rank is the number of pivots (leading ones) in the row-echelon form of a matrix.
Example:
A = Matrix([
    [1, 2, 3],
    [2, 4, 6],
    [1, 1, 1]
])
print("RREF:\n", A.rref()[0])
print("Rank of A:", A.rank())
RREF:
Matrix([[1, 0, -1], [0, 1, 2], [0, 0, 0]])
Rank of A: 2
- The second row is a multiple of the first, so the rank is less than 3.
- Only two independent rows → rank = 2.
- Rank and solutions to \(A·x = b\)
Consider:
\[ \begin{cases} x + y + z = 3 \\ 2x + 2y + 2z = 6 \\ x - y = 0 \end{cases} \]
M = Matrix([
    [1, 1, 1, 3],
    [2, 2, 2, 6],
    [1, -1, 0, 0]
])
print("RREF:\n", M.rref()[0])
print("Rank of coefficient matrix:", M[:, :-1].rank())
print("Rank of augmented matrix:", M.rank())
RREF:
Matrix([[1, 0, 1/2, 3/2], [0, 1, 1/2, 3/2], [0, 0, 0, 0]])
Rank of coefficient matrix: 2
Rank of augmented matrix: 2
- If rank(A) = rank([A|b]) = number of variables → unique solution.
- If rank(A) = rank([A|b]) < number of variables → infinite solutions.
- If rank(A) < rank([A|b]) → no solution. (A small helper that turns these three rules into code follows below.)
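Here is that helper (our own sketch, not part of the original text), written with SymPy:
from sympy import Matrix

def classify_system(A, b):
    """Classify A·x = b by comparing rank(A) with rank([A|b])."""
    A, b = Matrix(A), Matrix(b)
    augmented = A.row_join(b)
    rank_A, rank_aug, n = A.rank(), augmented.rank(), A.cols
    if rank_A < rank_aug:
        return "no solution"
    return "unique solution" if rank_A == n else "infinitely many solutions"

print(classify_system([[1, 1], [2, -1]], [3, 0]))  # unique solution
print(classify_system([[1, 1], [2, 2]], [2, 4]))   # infinitely many solutions
print(classify_system([[1, 1], [2, 2]], [2, 5]))   # no solution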
- NumPy comparison
A = np.array([
    [1, 2, 3],
    [2, 4, 6],
    [1, 1, 1]
], dtype=float)
print("Rank with NumPy:", np.linalg.matrix_rank(A))
Rank with NumPy: 2
- Rank as “dimension of information”
The rank equals:
- The number of independent rows.
- The number of independent columns.
- The dimension of the column space.
B = Matrix([
    [1, 2],
    [2, 4],
    [3, 6]
])
print("Rank of B:", B.rank())
Rank of B: 1
All columns are multiples → only one independent direction → rank = 1.
Try It Yourself
Compute the rank of:
\[ \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 3 & 6 & 9 \end{bmatrix} \]
What do you expect?
Create a random 4×4 matrix with
np.random.randint
. Compute its rank with both SymPy and NumPy.Test solution consistency using rank: build a system where rank(A) ≠ rank([A|b]) and show it has no solution.
The Takeaway
- Rank = number of pivots = dimension of independent information.
- Rank reveals whether a system has no solution, one solution, or infinitely many.
- Rank connects algebra (pivots) with geometry (dimension of subspaces).
30. LU Factorization (Elimination Captured as L and U)
Gaussian elimination can be recorded in a neat factorization:
\[ A = LU \]
where \(L\) is a lower triangular matrix (recording the multipliers we used) and \(U\) is an upper triangular matrix (the result of elimination). This is called LU factorization. It’s a powerful tool for solving systems efficiently.
Set Up Your Lab
import numpy as np
from scipy.linalg import lu
Step-by-Step Code Walkthrough
- Example matrix
A = np.array([
    [2, 3, 1],
    [4, 7, 7],
    [6, 18, 22]
], dtype=float)
print("Matrix A:\n", A)
Matrix A:
[[ 2. 3. 1.]
[ 4. 7. 7.]
[ 6. 18. 22.]]
- LU decomposition with SciPy
P, L, U = lu(A)
print("Permutation matrix P:\n", P)
print("Lower triangular L:\n", L)
print("Upper triangular U:\n", U)
Permutation matrix P:
[[0. 0. 1.]
[0. 1. 0.]
[1. 0. 0.]]
Lower triangular L:
[[1. 0. 0. ]
[0.66666667 1. 0. ]
[0.33333333 0.6 1. ]]
Upper triangular U:
[[ 6. 18. 22. ]
[ 0. -5. -7.66666667]
[ 0. 0. -1.73333333]]
Here, \(P\) handles row swaps (partial pivoting), \(L\) is lower triangular, and \(U\) is upper triangular.
- Verifying the factorization
reconstructed = P @ L @ U
print("Does P·L·U equal A?\n", np.allclose(reconstructed, A))
Does P·L·U equal A?
True
- Solving a system with LU
Suppose we want to solve \(Ax = b\). Instead of working directly with \(A\), we solve in two steps:
- Solve \(Ly = P^{T}b\) (forward substitution); since \(A = PLU\), we need \(LUx = P^{-1}b = P^{T}b\).
- Solve \(Ux = y\) (back substitution).
b = np.array([1, 2, 3], dtype=float)

# Step 1: permute the right-hand side. Because A = P·L·U, we need P^T b.
# (Here P happens to be symmetric, so P @ b would give the same numbers,
# but P.T @ b is the correct general formula.)
Pb = P.T @ b

# Step 2: forward substitution Ly = P^T b
y = np.linalg.solve(L, Pb)

# Step 3: back substitution Ux = y
x = np.linalg.solve(U, y)

print("Solution x:", x)
Solution x: [ 0.5 -0. -0. ]
- Efficiency advantage
If we have to solve many systems with the same \(A\) but different \(b\), we only compute \(LU\) once, then reuse it. This saves a lot of computation.
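SciPy packages this reuse pattern as lu_factor and lu_solve: factor once, then solve for as many right-hand sides as you like (a minimal sketch):
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[2.0, 3.0, 1.0],
              [4.0, 7.0, 7.0],
              [6.0, 18.0, 22.0]])

lu_and_piv = lu_factor(A)            # factor once
for b in (np.array([1.0, 2.0, 3.0]), np.array([3.0, 7.0, 8.0])):
    x = lu_solve(lu_and_piv, b)      # reuse the factorization for each b
    print(x, np.allclose(A @ x, b))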
- LU in NumPy and SciPy
NumPy doesn’t expose lu directly, but SciPy’s scipy.linalg.lu works seamlessly with NumPy arrays. Under the hood, LU decomposition (via LAPACK) is also what powers solvers like np.linalg.solve.
Try It Yourself
Compute LU decomposition for
\[ A = \begin{bmatrix} 1 & 2 & 0 \\ 3 & 4 & 4 \\ 5 & 6 & 3 \end{bmatrix} \]
Verify \(P·L·U = A\).
Solve \(Ax = b\) with
\[ b = [3,7,8] \]
using LU factorization.
Compare solving with LU factorization vs directly using
np.linalg.solve(A,b)
. Are the answers the same?
The Takeaway
- LU factorization captures Gaussian elimination in matrix form: \(A = P·L·U\).
- It allows fast repeated solving of systems with different right-hand sides.
- LU decomposition is a core technique in numerical linear algebra and the basis of many solvers.
Chapter 4. Vector Spaces and Subspaces
31. Axioms of Vector Spaces (What “Space” Really Means)
Vector spaces generalize what we’ve been doing with vectors and matrices. Instead of just \(\mathbb{R}^n\), a vector space is any collection of objects (vectors) where addition and scalar multiplication follow specific axioms (rules). In this lab, we’ll explore these axioms concretely with Python.
Set Up Your Lab
import numpy as np
Step-by-Step Code Walkthrough
- Vector space example: \(\mathbb{R}^2\)
Let’s check two rules (axioms): closure under addition and scalar multiplication.
u = np.array([1, 2])
v = np.array([3, -1])

# Closure under addition
print("u + v =", u + v)
# Closure under scalar multiplication
k = 5
print("k * u =", k * u)
u + v = [4 1]
k * u = [ 5 10]
Both results are still in \(\mathbb{R}^2\).
- Zero vector and additive inverses
Every vector space must contain a zero vector, and every vector must have an additive inverse.
= np.array([0, 0])
zero = -u
inverse_u print("Zero vector:", zero)
print("u + (-u) =", u + inverse_u)
Zero vector: [0 0]
u + (-u) = [0 0]
- Distributive and associative properties
Check:
- \(a(u+v) = au + av\)
- \((a+b)u = au + bu\)
= 2, 3
a, b
= a * (u + v)
lhs1 = a*u + a*v
rhs1 print("a(u+v) =", lhs1, ", au+av =", rhs1)
= (a+b) * u
lhs2 = a*u + b*u
rhs2 print("(a+b)u =", lhs2, ", au+bu =", rhs2)
a(u+v) = [8 2] , au+av = [8 2]
(a+b)u = [ 5 10] , au+bu = [ 5 10]
Both equalities hold → distributive laws confirmed.
- A set that fails to be a vector space
Consider only positive numbers with normal addition and scalar multiplication.
= [1, 2, 3]
positive_numbers try:
print("Closure under negatives?", -1 * np.array(positive_numbers))
except Exception as e:
print("Error:", e)
Closure under negatives? [-1 -2 -3]
Negative results leave the set → not a vector space.
- Python helper to check axioms
We can quickly check if a set of vectors is closed under addition and scalar multiplication.
def check_closure(vectors, scalars):
    for v in vectors:
        for u in vectors:
            if not any(np.array_equal(v + u, w) for w in vectors):
                return False
        for k in scalars:
            if not any(np.array_equal(k * v, w) for w in vectors):
                return False
    return True

vectors = [np.array([0, 0]), np.array([1, 0]), np.array([0, 1]), np.array([1, 1])]
scalars = [0, 1, -1]
print("Closed under addition and scalar multiplication?", check_closure(vectors, scalars))
Closed under addition and scalar multiplication? False
The check returns False: the set is not closed (for example, \([1,0] + [1,1] = [2,1]\) is missing, and \(-1 \cdot [1,0] = [-1,0]\) falls outside the set), so these four vectors by themselves do not form a subspace of \(\mathbb{R}^2\).
Try It Yourself
- Verify that \(\mathbb{R}^3\) satisfies the vector space axioms using random vectors.
- Test whether the set of all 2×2 matrices forms a vector space under normal addition and scalar multiplication.
- Find an example of a set that fails closure (e.g., integers under division).
The Takeaway
- A vector space is any set where addition and scalar multiplication satisfy 10 standard axioms.
- These rules ensure consistent algebraic behavior.
- Many objects beyond arrows in \(\mathbb{R}^n\) (like polynomials or matrices) are vector spaces too.
32. Subspaces, Column Space, and Null Space (Where Solutions Live)
A subspace is a smaller vector space sitting inside a bigger one. For matrices, two subspaces show up all the time:
- Column space: all combinations of the matrix’s columns (possible outputs of \(Ax\)).
- Null space: all vectors \(x\) such that \(Ax = 0\) (inputs that vanish).
This lab explores both in Python.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Column space basics
Take:
\[ A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \\ 3 & 6 \end{bmatrix} \]
A = Matrix([
    [1, 2],
    [2, 4],
    [3, 6]
])
print("Matrix A:\n", A)
print("Column space basis:\n", A.columnspace())
print("Rank (dimension of column space):", A.rank())
Matrix A:
Matrix([[1, 2], [2, 4], [3, 6]])
Column space basis:
[Matrix([
[1],
[2],
[3]])]
Rank (dimension of column space): 1
- The second column is a multiple of the first → column space has dimension 1.
- All outputs of \(Ax\) lie on a line in \(\mathbb{R}^3\).
- Null space basics
print("Null space basis:\n", A.nullspace())
Null space basis:
[Matrix([
[-2],
[ 1]])]
The null space contains all \(x\) where \(Ax=0\). Here, the null space is 1-dimensional (vectors like \([-2,1]\)).
- A full-rank example
= Matrix([
B 1,0,0],
[0,1,0],
[0,0,1]
[
])
print("Column space basis:\n", B.columnspace())
print("Null space basis:\n", B.nullspace())
Column space basis:
[Matrix([
[1],
[0],
[0]]), Matrix([
[0],
[1],
[0]]), Matrix([
[0],
[0],
[1]])]
Null space basis:
[]
- Column space = all of \(\mathbb{R}^3\).
- Null space = only the zero vector.
- Geometry link
For \(A\) (rank 1, 2 columns):
- Column space: line in \(\mathbb{R}^3\).
- Null space: line in \(\mathbb{R}^2\).
Together they explain the system \(Ax = b\):
- If \(b\) is outside the column space, no solution exists.
- If \(b\) is inside, solutions differ by a vector in the null space.
- Quick NumPy version
NumPy doesn’t directly give null space, but we can compute it with SVD.
from numpy.linalg import svd
A = np.array([[1, 2], [2, 4], [3, 6]], dtype=float)
U, S, Vt = svd(A)

tol = 1e-10
null_mask = (S <= tol)
null_space = Vt.T[:, null_mask]
print("Null space (via SVD):\n", null_space)
Null space (via SVD):
[[-0.89442719]
[ 0.4472136 ]]
Try It Yourself
Find the column space and null space of
\[ \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \]
How many dimensions does each have?
Generate a random 3×3 matrix. Compute its rank, column space, and null space.
Solve \(Ax = b\) with
\[ A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \\ 3 & 6 \end{bmatrix}, \quad b = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \]
and describe why it has infinitely many solutions.
The Takeaway
- The column space = all possible outputs of a matrix.
- The null space = all inputs that map to zero.
- These subspaces give the complete picture of what a matrix does.
33. Span and Generating Sets (Coverage of a Space)
The span of a set of vectors is all the linear combinations you can make from them. If a set of vectors can “cover” a whole space, we call it a generating set. This lab shows how to compute and visualize spans.
Set Up Your Lab
import numpy as np
from sympy import Matrix
import matplotlib.pyplot as plt
Step-by-Step Code Walkthrough
- Span in \(\mathbb{R}^2\)
Two vectors that aren’t multiples span the whole plane.
= np.array([1, 0])
u = np.array([0, 1])
v
= Matrix.hstack(Matrix(u), Matrix(v))
M print("Rank:", M.rank())
Rank: 2
Rank = 2 → the span of \(\{u,v\}\) is all of \(\mathbb{R}^2\).
- Dependent vectors (smaller span)
= np.array([1, 2])
u = np.array([2, 4])
v
= Matrix.hstack(Matrix(u), Matrix(v))
M print("Rank:", M.rank())
Rank: 1
Rank = 1 → these vectors only span a line.
- Visualizing a span
Let’s see what the span of two vectors looks like.
= np.array([1, 2])
u = np.array([2, 1])
v
= np.linspace(-2, 2, 11)
coeffs = []
points for a in coeffs:
for b in coeffs:
*u + b*v)
points.append(a= np.array(points)
points
0], points[:,1], s=10)
plt.scatter(points[:,0,color='black',linewidth=0.5)
plt.axhline(0,color='black',linewidth=0.5)
plt.axvline("Span of {u,v}")
plt.title(
plt.grid() plt.show()
You’ll see a filled grid - the entire plane, because the two vectors are independent.
- Generating set of a space
For \(\mathbb{R}^3\):
= [Matrix([1,0,0]), Matrix([0,1,0]), Matrix([0,0,1])]
basis = Matrix.hstack(*basis)
M print("Rank:", M.rank())
Rank: 3
Rank = 3 → this set spans the whole space.
- Testing if a vector is in the span
Example: Is \([3,5]\) in the span of \([1,2]\) and \([2,1]\)?
u = Matrix([1, 2])
v = Matrix([2, 1])
target = Matrix([3, 5])

M = Matrix.hstack(u, v)
solution = M.gauss_jordan_solve(target)
print("Coefficients (a,b):", solution)
Coefficients (a,b): (Matrix([
[7/3],
[1/3]]), Matrix(0, 1, []))
If a solution exists, the target is in the span.
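We can wrap this test in a small helper (our own sketch): gauss_jordan_solve raises a ValueError when no combination exists, which we catch and report as "not in the span".
from sympy import Matrix

def in_span(vectors, target):
    """True if target is a linear combination of the given vectors."""
    M = Matrix.hstack(*[Matrix(v) for v in vectors])
    try:
        M.gauss_jordan_solve(Matrix(target))
        return True
    except ValueError:
        return False

print(in_span([[1, 2], [2, 1]], [3, 5]))   # True
print(in_span([[1, 2]], [3, 5]))           # False: [3,5] is not a multiple of [1,2]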
Try It Yourself
- Test if \([4,6]\) is in the span of \([1,2]\).
- Visualize the span of \([1,0,0]\) and \([0,1,0]\) in \(\mathbb{R}^3\). What does it look like?
- Create a random 3×3 matrix. Use
rank()
to check if its columns span \(\mathbb{R}^3\).
The Takeaway
- Span = all linear combinations of a set of vectors.
- Independent vectors span bigger spaces; dependent ones collapse to smaller spaces.
- Generating sets are the foundation of bases and coordinate systems.
34. Linear Independence and Dependence (No Redundancy vs. Redundancy)
A set of vectors is linearly independent if none of them can be written as a combination of the others. If at least one can, the set is dependent. This distinction tells us whether a set of vectors has redundancy.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Independent vectors example
= Matrix([1, 0, 0])
v1 = Matrix([0, 1, 0])
v2 = Matrix([0, 0, 1])
v3
= Matrix.hstack(v1, v2, v3)
M print("Rank:", M.rank(), " Number of vectors:", M.shape[1])
Rank: 3 Number of vectors: 3
Rank = 3, number of vectors = 3 → all independent.
- Dependent vectors example
= Matrix([1, 2, 3])
v1 = Matrix([2, 4, 6])
v2 = Matrix([3, 6, 9])
v3
= Matrix.hstack(v1, v2, v3)
M print("Rank:", M.rank(), " Number of vectors:", M.shape[1])
Rank: 1 Number of vectors: 3
Rank = 1, number of vectors = 3 → they’re dependent (multiples of each other).
- Checking dependence automatically
A quick test: if rank < number of vectors → dependent.
def check_independence(vectors):
    M = Matrix.hstack(*vectors)
    return M.rank() == M.shape[1]
print("Independent?", check_independence([Matrix([1,0]), Matrix([0,1])]))
print("Independent?", check_independence([Matrix([1,2]), Matrix([2,4])]))
Independent? True
Independent? False
- Solving for dependence relation
If vectors are dependent, we can find coefficients \(c_1, c_2, …\) such that
\[ c_1 v_1 + c_2 v_2 + … + c_k v_k = 0 \]
with some \(c_i \neq 0\).
= Matrix.hstack(Matrix([1,2]), Matrix([2,4]))
M = M.nullspace()
null_space print("Dependence relation (coefficients):", null_space)
Dependence relation (coefficients): [Matrix([
[-2],
[ 1]])]
This shows the exact linear relation.
- Random example
0)
np.random.seed(= Matrix(np.random.randint(-3, 4, (3,3)))
R print("Random matrix:\n", R)
print("Rank:", R.rank())
Random matrix:
Matrix([[1, 2, -3], [0, 0, 0], [-2, 0, 2]])
Rank: 2
Depending on the rank, the columns may be independent (rank = 3) or dependent (rank < 3).
Try It Yourself
- Test if \([1,1,0], [0,1,1], [1,2,1]\) are independent.
- Generate 4 random vectors in \(\mathbb{R}^3\). Can they ever be independent? Why or why not?
- Find the dependence relation for \([2,4], [3,6]\).
The Takeaway
- Independent set: no redundancy, each vector adds a new direction.
- Dependent set: at least one vector is unnecessary (it lies in the span of others).
- Independence is the key to defining basis and dimension.
35. Basis and Coordinates (Naming Every Vector Uniquely)
A basis is a set of independent vectors that span a space. It’s like choosing a coordinate system: every vector in the space can be expressed uniquely as a combination of basis vectors. In this lab, we’ll see how to find bases and compute coordinates relative to them.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Standard basis in \(\mathbb{R}^3\)
= Matrix([1,0,0])
e1 = Matrix([0,1,0])
e2 = Matrix([0,0,1])
e3
= Matrix.hstack(e1, e2, e3)
M print("Rank:", M.rank())
Rank: 3
These three independent vectors form the standard basis of \(\mathbb{R}^3\). Any vector like \([2,5,-1]\) can be expressed as
\[ 2e_1 + 5e_2 - 1e_3 \]
- Finding a basis from dependent vectors
= Matrix([1,2,3])
v1 = Matrix([2,4,6])
v2 = Matrix([1,0,1])
v3
= Matrix.hstack(v1,v2,v3)
M print("Column space basis:", M.columnspace())
Column space basis: [Matrix([
[1],
[2],
[3]]), Matrix([
[1],
[0],
[1]])]
SymPy extracts independent columns automatically. This gives a basis for the column space.
- Coordinates relative to a basis
Suppose basis = \(\{ [1,0], [1,1] \}\). Express vector \([3,5]\) in this basis.
B = Matrix.hstack(Matrix([1, 0]), Matrix([1, 1]))
target = Matrix([3, 5])

coords = B.solve_least_squares(target)
print("Coordinates in basis B:", coords)
Coordinates in basis B: Matrix([[-2], [5]])
So \([3,5] = -2·[1,0] + 5·[1,1]\).
- Basis change
If we switch to a different basis, coordinates change but the vector stays the same.
= Matrix.hstack(Matrix([2,1]), Matrix([1,2]))
new_basis = new_basis.solve_least_squares(target)
coords_new print("Coordinates in new basis:", coords_new)
Coordinates in new basis: Matrix([[1/3], [7/3]])
- Random example
Generate 3 random vectors in \(\mathbb{R}^3\). Check if they form a basis.
1)
np.random.seed(= Matrix(np.random.randint(-3,4,(3,3)))
R print("Random matrix:\n", R)
print("Rank:", R.rank())
Random matrix:
Matrix([[2, 0, 1], [-3, -2, 0], [2, -3, -3]])
Rank: 3
If rank = 3 → basis for \(\mathbb{R}^3\). Otherwise, only span a subspace.
Try It Yourself
- Check if \([1,2], [3,4]\) form a basis of \(\mathbb{R}^2\).
- Express vector \([7,5]\) in that basis.
- Create 4 random vectors in \(\mathbb{R}^3\). Find a basis for their span.
The Takeaway
- A basis = minimal set of vectors that span a space.
- Every vector has a unique coordinate representation in a given basis.
- Changing bases changes the coordinates, not the vector itself.
36. Dimension (How Many Directions)
The dimension of a vector space is the number of independent directions it has. Formally, it’s the number of vectors in any basis of the space. Dimension tells us the “size” of a space in terms of degrees of freedom.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Dimension of \(\mathbb{R}^n\)
The dimension of \(\mathbb{R}^n\) is \(n\).
= 4
n = [Matrix.eye(n)[:,i] for i in range(n)]
basis print("Basis for R^4:", basis)
print("Dimension of R^4:", len(basis))
Basis for R^4: [Matrix([
[1],
[0],
[0],
[0]]), Matrix([
[0],
[1],
[0],
[0]]), Matrix([
[0],
[0],
[1],
[0]]), Matrix([
[0],
[0],
[0],
[1]])]
Dimension of R^4: 4
Each standard unit vector adds one independent direction → dimension = 4.
- Dimension via rank
The rank of a matrix equals the dimension of its column space.
= Matrix([
A 1,2,3],
[2,4,6],
[1,0,1]
[
])
print("Rank (dimension of column space):", A.rank())
Rank (dimension of column space): 2
Here, rank = 2 → the column space is a 2D plane inside \(\mathbb{R}^3\).
- Null space dimension
The null space dimension is given by:
\[ \text{dim(Null(A))} = \#\text{variables} - \text{rank(A)} \]
print("Null space basis:", A.nullspace())
print("Dimension of null space:", len(A.nullspace()))
Null space basis: [Matrix([
[-1],
[-1],
[ 1]])]
Dimension of null space: 1
This is the number of free variables in a solution.
- Dimension in practice
- A line through the origin in \(\mathbb{R}^3\) has dimension 1.
- A plane through the origin has dimension 2.
- The whole \(\mathbb{R}^3\) has dimension 3.
Example:
= Matrix([1,2,3])
v1 = Matrix([2,4,6])
v2 = Matrix.hstack(v1,v2)
span print("Dimension of span:", span.rank())
Dimension of span: 1
Result = 1 → they only generate a line.
- Random example
2)
np.random.seed(= Matrix(np.random.randint(-3,4,(4,4)))
R print("Random 4x4 matrix:\n", R)
print("Column space dimension:", R.rank())
Random 4x4 matrix:
Matrix([[-3, 2, -3, 3], [0, -1, 0, -3], [-1, -2, 0, 2], [-1, 1, 1, 1]])
Column space dimension: 4
Rank may be 4 (full space) or smaller (collapsed).
Try It Yourself
Find the dimension of the column space of
\[ \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix} \]
Compute the dimension of the null space of a 3×3 singular matrix.
Generate a 5×3 random matrix and compute its column space dimension.
The Takeaway
- Dimension = number of independent directions.
- Found by counting basis vectors (or rank).
- Dimensions describe lines (1D), planes (2D), and higher subspaces inside larger spaces.
37. Rank–Nullity Theorem (Dimensions That Add Up)
The rank–nullity theorem ties together the dimension of the column space and the null space of a matrix. It says:
\[ \text{rank}(A) + \text{nullity}(A) = \text{number of columns of } A \]
This is a powerful consistency check in linear algebra.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Simple 3×3 example
= Matrix([
A 1, 2, 3],
[2, 4, 6],
[1, 0, 1]
[
])
= A.rank()
rank = len(A.nullspace())
nullity print("Rank:", rank)
print("Nullity:", nullity)
print("Rank + Nullity =", rank + nullity)
print("Number of columns =", A.shape[1])
Rank: 2
Nullity: 1
Rank + Nullity = 3
Number of columns = 3
You should see that rank + nullity = 3, the number of columns.
- Full-rank case
= Matrix([
B 1,0,0],
[0,1,0],
[0,0,1]
[
])
print("Rank:", B.rank())
print("Nullity:", len(B.nullspace()))
Rank: 3
Nullity: 0
- Rank = 3 (all independent).
- Nullity = 0 (only zero solution to \(Bx=0\)).
- Rank + Nullity = 3 columns.
- Wide matrix (more columns than rows)
= Matrix([
C 1,2,3,4],
[0,1,1,2],
[0,0,0,0]
[
])
= C.rank()
rank = len(C.nullspace())
nullity print("Rank:", rank, " Nullity:", nullity, " Columns:", C.shape[1])
Rank: 2 Nullity: 2 Columns: 4
Here, nullity > 0 because there are more variables than independent equations.
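The same bookkeeping works numerically in NumPy, where the nullity can be read off from the number of negligible singular values (a sketch we add here, assuming a tolerance of 1e-10 is appropriate):
import numpy as np

C = np.array([[1, 2, 3, 4],
              [0, 1, 1, 2],
              [0, 0, 0, 0]], dtype=float)

rank = np.linalg.matrix_rank(C)
singular_values = np.linalg.svd(C, compute_uv=False)
nullity = C.shape[1] - np.sum(singular_values > 1e-10)

print("rank:", rank, " nullity:", nullity, " columns:", C.shape[1])
# rank: 2  nullity: 2  columns: 4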
- Verifying with random matrices
3)
np.random.seed(= Matrix(np.random.randint(-3,4,(4,5)))
R print("Random 4x5 matrix:\n", R)
print("Rank + Nullity =", R.rank() + len(R.nullspace()))
print("Number of columns =", R.shape[1])
Random 4x5 matrix:
Matrix([[-1, -3, -2, 0, -3], [-3, -3, 2, 2, 0], [-1, 0, -2, -2, -1], [2, 3, -3, 1, 1]])
Rank + Nullity = 5
Number of columns = 5
Always consistent: rank + nullity = number of columns.
- Geometric interpretation
For an \(m \times n\) matrix:
- Rank(A) = dimension of outputs (column space).
- Nullity(A) = dimension of hidden directions that collapse to 0.
- Together, they use up all the “input dimensions” (n).
Try It Yourself
Compute rank and nullity of
\[ \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \end{bmatrix} \]
Check the theorem.
Create a 2×4 random integer matrix. Confirm that rank + nullity = 4.
Explain why a tall full-rank \(5 \times 3\) matrix must have nullity = 0.
The Takeaway
- Rank + Nullity = number of columns (always true).
- Rank measures independent outputs; nullity measures hidden freedom.
- This theorem connects solutions of \(Ax=0\) with the structure of \(A\).
38. Coordinates Relative to a Basis (Changing the “Ruler”)
Once we choose a basis, every vector can be described with coordinates relative to that basis. This is like changing the “ruler” we use to measure vectors. In this lab, we’ll practice computing coordinates in different bases.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Standard basis coordinates
Vector \(v = [4,5]\) in \(\mathbb{R}^2\):
= Matrix([4,5])
v = Matrix([1,0])
e1 = Matrix([0,1])
e2
= Matrix.hstack(e1,e2)
B = B.solve_least_squares(v)
coords print("Coordinates in standard basis:", coords)
Coordinates in standard basis: Matrix([[4], [5]])
Result is just \([4,5]\). Easy - the standard basis matches the components directly.
- Non-standard basis
Suppose basis = \(\{ [1,1], [1,-1] \}\). Express \(v = [4,5]\) in this basis.
= Matrix.hstack(Matrix([1,1]), Matrix([1,-1]))
B2 = B2.solve_least_squares(v)
coords2 print("Coordinates in new basis:", coords2)
Coordinates in new basis: Matrix([[9/2], [-1/2]])
Now \(v\) has different coordinates.
- Changing coordinates back
To reconstruct the vector from coordinates:
= B2 * coords2
reconstructed print("Reconstructed vector:", reconstructed)
Reconstructed vector: Matrix([[4], [5]])
It matches the original \([4,5]\).
- Another basis in \(\mathbb{R}^3\)
= Matrix.hstack(
basis 1,0,1]),
Matrix([0,1,1]),
Matrix([1,1,0])
Matrix([
)= Matrix([2,3,4])
v
= basis.solve_least_squares(v)
coords print("Coordinates of v in random basis:", coords)
Coordinates of v in random basis: Matrix([[3/2], [5/2], [1/2]])
Any independent set of 3 vectors in \(\mathbb{R}^3\) works as a basis.
- Visualization in 2D
Let’s compare coordinates in two bases.
import matplotlib.pyplot as plt
= np.array([4,5])
v = np.array([1,1])
b1 = np.array([1,-1])
b2
0,0,v[0],v[1],angles='xy',scale_units='xy',scale=1,color='blue',label='v')
plt.quiver(0,0,b1[0],b1[1],angles='xy',scale_units='xy',scale=1,color='red',label='basis1')
plt.quiver(0,0,b2[0],b2[1],angles='xy',scale_units='xy',scale=1,color='green',label='basis2')
plt.quiver(
-1,6)
plt.xlim(-6,6)
plt.ylim(
plt.legend()
plt.grid() plt.show()
Even though the basis vectors look different, they span the same space, and \(v\) can be expressed in terms of them.
Try It Yourself
- Express \([7,3]\) in the basis \(\{[2,0], [0,3]\}\).
- Pick three independent random vectors in \(\mathbb{R}^3\). Write down the coordinates of \([1,2,3]\) in that basis.
- Verify that reconstructing always gives the original vector.
The Takeaway
- A basis provides a coordinate system for vectors.
- Coordinates depend on the basis, but the underlying vector doesn’t change.
- Changing the basis is like changing the “ruler” you measure vectors with.
39. Change-of-Basis Matrices (Moving Between Coordinate Systems)
When we switch from one basis to another, we need a change-of-basis matrix. This matrix acts like a translator: it converts coordinates in one system to coordinates in another.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Two bases in \(\mathbb{R}^2\)
Let’s define:
- Basis \(B = \{ [1,0], [0,1] \}\) (standard basis).
- Basis \(C = \{ [1,1], [1,-1] \}\).
B = Matrix.hstack(Matrix([1, 0]), Matrix([0, 1]))
C = Matrix.hstack(Matrix([1, 1]), Matrix([1, -1]))
- Change-of-basis matrix
The matrix that converts C-coordinates → standard coordinates is just \(C\).
print("C (basis matrix):\n", C)
C (basis matrix):
Matrix([[1, 1], [1, -1]])
To go the other way (standard → C), we compute the inverse of \(C\).
C_inv = C.inv()
print("C inverse:\n", C_inv)
C inverse:
Matrix([[1/2, 1/2], [1/2, -1/2]])
- Converting coordinates
Vector \(v = [4,5]\).
- In standard basis:
v = Matrix([4, 5])
coords_in_standard = v
print("Coordinates in standard basis:", coords_in_standard)
Coordinates in standard basis: Matrix([[4], [5]])
- In basis \(C\):
coords_in_C = C_inv * v
print("Coordinates in C basis:", coords_in_C)
Coordinates in C basis: Matrix([[9/2], [-1/2]])
- Convert back:
reconstructed = C * coords_in_C
print("Reconstructed vector:", reconstructed)
Reconstructed vector: Matrix([[4], [5]])
The reconstruction matches the original vector.
- General formula
If \(P\) is the change-of-basis matrix from basis \(B\) to basis \(C\):
\[ [v]_C = P^{-1}[v]_B \]
\[ [v]_B = P[v]_C \]
Here, \(P\) is the matrix of new basis vectors written in terms of the old basis.
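The same recipe covers two non-standard bases: if the columns of B and C hold the basis vectors in standard coordinates, then \([v]_C = C^{-1}B\,[v]_B\). A short sketch (the particular bases below are our own choice):
from sympy import Matrix

B = Matrix.hstack(Matrix([1, 1]), Matrix([1, -1]))   # basis B
C = Matrix.hstack(Matrix([2, 0]), Matrix([0, 3]))    # basis C

coords_B = Matrix([2, 1])           # coordinates of some vector relative to B
v_standard = B * coords_B           # the vector itself: [3, 1]
coords_C = C.inv() * B * coords_B   # translate B-coordinates into C-coordinates

print("Standard coordinates:", v_standard.T)
print("Coordinates in C:", coords_C.T)       # [3/2, 1/3]
print("Back to standard:", (C * coords_C).T)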
- Random 3D example
= Matrix.eye(3) # standard basis
B = Matrix.hstack(
C 1,0,1]),
Matrix([0,1,1]),
Matrix([1,1,0])
Matrix([
)
= Matrix([2,3,4])
v
= C.inv()
C_inv = C_inv * v
coords_in_C print("Coordinates in new basis C:", coords_in_C)
print("Back to standard:", C * coords_in_C)
Coordinates in new basis C: Matrix([[3/2], [5/2], [1/2]])
Back to standard: Matrix([[2], [3], [4]])
Try It Yourself
- Convert \([7,3]\) from the standard basis to the basis \(\{[2,0], [0,3]\}\).
- Pick a random invertible 3×3 matrix as a basis. Write a vector in that basis, then convert it back to the standard basis.
- Prove that converting back and forth always returns the same vector.
The Takeaway
- A change-of-basis matrix converts coordinates between bases.
- Going from new basis → old basis uses the basis matrix.
- Going from old basis → new basis requires its inverse.
- The vector itself never changes - only the description of it does.
40. Affine Subspaces (Lines and Planes Not Through the Origin)
So far, subspaces always passed through the origin. But many familiar objects - like lines offset from the origin or planes floating in space - are affine subspaces. They look like subspaces, just shifted away from zero.
Set Up Your Lab
import numpy as np
import matplotlib.pyplot as plt
from sympy import Matrix
Step-by-Step Code Walkthrough
- Line through the origin (a subspace)
\[ L = \{ t \cdot [1,2] : t \in \mathbb{R} \} \]
= np.linspace(-3,3,20)
t = np.array([t, 2*t]).T
line_origin 0], line_origin[:,1], label="Through origin") plt.plot(line_origin[:,
- Line not through the origin (affine subspace)
\[ L' = \{ [3,1] + t \cdot [1,2] : t \in \mathbb{R} \} \]
= np.array([3,1])
point = np.array([1,2])
direction = np.array([point + k*direction for k in t])
line_shifted 0], line_shifted[:,1], label="Shifted line") plt.plot(line_shifted[:,
- Visualizing together
*point, color="red", label="Shift point")
plt.scatter(0,color='black',linewidth=0.5)
plt.axhline(0,color='black',linewidth=0.5)
plt.axvline(
plt.legend()
plt.grid() plt.show()
One line passes through the origin, the other is parallel but shifted.
- Plane example
A plane in \(\mathbb{R}^3\):
\[ P = \{ [1,2,3] + s[1,0,0] + t[0,1,0] : s,t \in \mathbb{R} \} \]
This is an affine plane parallel to the \(xy\)-plane, but shifted.
= np.linspace(-2,2,10)
s_vals = np.linspace(-2,2,10)
t_vals
= []
points for s in s_vals:
for t in t_vals:
1,2,3] + s*np.array([1,0,0]) + t*np.array([0,1,0]))
points.append([
= np.array(points)
points
from mpl_toolkits.mplot3d import Axes3D
= plt.figure()
fig = fig.add_subplot(111, projection='3d')
ax 0], points[:,1], points[:,2])
ax.scatter(points[:,"Affine plane in R^3")
ax.set_title( plt.show()
- Algebraic difference
- A subspace must satisfy closure under addition and scalar multiplication, and must include 0.
- An affine subspace is just a subspace plus a fixed shift vector, as the short check below illustrates.
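A tiny numerical check (our own addition): take points on the shifted line, subtract the shift point, and confirm that the differences all lie along the direction vector - exactly the closure a subspace requires.
import numpy as np

point = np.array([3.0, 1.0])
direction = np.array([1.0, 2.0])

# two points on the affine line L' = point + t*direction
p1 = point + 2.0*direction
p2 = point - 1.5*direction

# after subtracting the shift, every difference is a multiple of the direction
for diff in (p1 - point, p2 - point, (p1 - point) + (p2 - point)):
    ratio = diff / direction
    print(diff, "multiple of direction?", np.allclose(ratio, ratio[0]))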
Try It Yourself
Define a line in \(\mathbb{R}^2\):
\[ (x,y) = (2,3) + t(1,-1) \]
Plot it and compare with the subspace spanned by \((1,-1)\).
Construct an affine plane in \(\mathbb{R}^3\) shifted by vector \((5,5,5)\).
Show algebraically that subtracting the shift point turns an affine subspace back into a regular subspace.
The Takeaway
- Subspaces go through the origin.
- Affine subspaces are shifted copies of subspaces.
- They’re essential in geometry, computer graphics, and optimization (e.g., feasible regions in linear programming).
Chapter 5. Linear Transformation and Structure
41. Linear Transformations (Preserving Lines and Sums)
A linear transformation is a function between vector spaces that preserves two key properties:
- Additivity: \(T(u+v) = T(u) + T(v)\)
- Homogeneity: \(T(cu) = cT(u)\)
In practice, every linear transformation can be represented by a matrix. This lab will help you understand and experiment with linear transformations in Python.
Set Up Your Lab
import numpy as np
import matplotlib.pyplot as plt
Step-by-Step Code Walkthrough
- Simple linear transformation (scaling)
Let’s scale vectors by 2 in the x-direction and by 0.5 in the y-direction.
= np.array([
A 2, 0],
[0, 0.5]
[
])
= np.array([1, 2])
v = A @ v
Tv print("Original v:", v)
print("Transformed Tv:", Tv)
Original v: [1 2]
Transformed Tv: [2. 1.]
- Visualizing multiple vectors
= [np.array([1,1]), np.array([2,0]), np.array([-1,2])]
vectors
for v in vectors:
= A @ v
Tv 0,0,v[0],v[1],head_width=0.1,color='blue',length_includes_head=True)
plt.arrow(0,0,Tv[0],Tv[1],head_width=0.1,color='red',length_includes_head=True)
plt.arrow(
0,color='black',linewidth=0.5)
plt.axhline(0,color='black',linewidth=0.5)
plt.axvline(-3,5)
plt.xlim(-1,5)
plt.ylim(
plt.grid()"Blue = original, Red = transformed")
plt.title( plt.show()
Blue arrows are the original vectors; red arrows are the transformed ones. Notice how the transformation stretches and compresses consistently.
- Rotation as a linear transformation
Rotating vectors by \(\theta = 90^\circ\):
theta = np.pi/2
R = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta),  np.cos(theta)]
])
v = np.array([1, 0])
print("Rotate [1,0] by 90°:", R @ v)
Rotate [1,0] by 90°: [6.123234e-17 1.000000e+00]
The result is \([0,1]\), a perfect rotation.
- Checking linearity
= np.array([1,2])
u = np.array([3,4])
v = 5
c
= A @ (u+v)
lhs = A@u + A@v
rhs print("Additivity holds?", np.allclose(lhs,rhs))
= A @ (c*u)
lhs = c*(A@u)
rhs print("Homogeneity holds?", np.allclose(lhs,rhs))
Additivity holds? True
Homogeneity holds? True
Both checks return True
, proving \(T\) is linear.
- Non-linear example (for contrast)
A transformation like \(T(x,y) = (x^2, y)\) is not linear.
def nonlinear(v):
    return np.array([v[0]**2, v[1]])
print("T([2,3]) =", nonlinear(np.array([2,3])))
print("Check additivity:", nonlinear(np.array([1,2])+np.array([3,4])) == (nonlinear([1,2])+nonlinear([3,4])))
T([2,3]) = [4 3]
Check additivity: [False True]
This fails the additivity test, so it’s not linear.
Try It Yourself
Define a shear matrix
\[ S = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \]
Apply it to vectors and plot before/after.
Verify linearity for rotation by 45°.
Test whether \(T(x,y) = (x+y, y)\) is linear.
The Takeaway
- A linear transformation preserves vector addition and scalar multiplication.
- Every linear transformation can be represented by a matrix.
- Visualizing with arrows helps build geometric intuition: stretching, rotating, and shearing are all linear.
42. Matrix Representation of a Linear Map (Choosing a Basis)
Every linear transformation can be written as a matrix, but the exact matrix depends on the basis you choose. This lab shows how to build and interpret matrix representations.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- From transformation to matrix
Suppose \(T: \mathbb{R}^2 \to \mathbb{R}^2\) is defined by:
\[ T(x,y) = (2x + y, \; x - y) \]
To find its matrix in the standard basis, apply \(T\) to each basis vector:
e1 = Matrix([1, 0])
e2 = Matrix([0, 1])

def T(v):
    x, y = v
    return Matrix([2*x + y, x - y])
print("T(e1):", T(e1))
print("T(e2):", T(e2))
T(e1): Matrix([[2], [1]])
T(e2): Matrix([[1], [-1]])
Stacking results as columns gives the matrix:
A = Matrix.hstack(T(e1), T(e2))
print("Matrix representation in standard basis:\n", A)
Matrix representation in standard basis:
Matrix([[2, 1], [1, -1]])
- Using the matrix for computations
= Matrix([3,4])
v print("T(v) via definition:", T(v))
print("T(v) via matrix:", A*v)
T(v) via definition: Matrix([[10], [-1]])
T(v) via matrix: Matrix([[10], [-1]])
Both methods match.
- Matrix in a different basis
Now suppose we use basis
\[ B = \{ [1,1], [1,-1] \} \]
To represent \(T\) in this basis:
- Build the change-of-basis matrix \(P\).
- Compute \(A_B = P^{-1}AP\).
B = Matrix.hstack(Matrix([1, 1]), Matrix([1, -1]))
P = B
A_B = P.inv() * A * P
print("Matrix representation in new basis:\n", A_B)
Matrix representation in new basis:
Matrix([[3/2, 3/2], [3/2, -1/2]])
- Interpretation
- In standard basis, \(A\) tells us how \(T\) acts on unit vectors.
- In basis \(B\), \(A_B\) shows how \(T\) looks when described using different coordinates (see the verification sketch below).
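To verify that \(A_B\) really describes the same map, take a vector, convert it to B-coordinates, apply \(A_B\), and convert back; the result should match applying \(A\) directly. A self-contained sketch (repeating the matrices from above):
from sympy import Matrix

A = Matrix([[2, 1], [1, -1]])                       # T in the standard basis
P = Matrix.hstack(Matrix([1, 1]), Matrix([1, -1]))  # basis B as columns
A_B = P.inv() * A * P                               # T expressed in basis B

v = Matrix([3, 4])
v_B = P.inv() * v        # coordinates of v relative to B
Tv_B = A_B * v_B         # apply T in B-coordinates
print(P * Tv_B)          # back to standard coordinates
print(A * v)             # same result computed directly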
- Random linear map in \(\mathbb{R}^3\)
1)
np.random.seed(= Matrix(np.random.randint(-3,4,(3,3)))
A3 print("Random transformation matrix:\n", A3)
= Matrix.hstack(Matrix([1,0,1]), Matrix([0,1,1]), Matrix([1,1,0]))
B3 = B3.inv() * A3 * B3
A3_B print("Representation in new basis:\n", A3_B)
Random transformation matrix:
Matrix([[2, 0, 1], [-3, -2, 0], [2, -3, -3]])
Representation in new basis:
Matrix([[5/2, -3/2, 3], [-7/2, -9/2, -4], [1/2, 5/2, -1]])
Try It Yourself
- Define \(T(x,y) = (x+2y, 3x+y)\). Find its matrix in the standard basis.
- Use a new basis \(\{[2,0],[0,3]\}\). Compute the representation \(A_B\).
- Verify that applying \(T\) directly to a vector matches computing via \(A_B\) and change-of-basis.
The Takeaway
- A linear transformation becomes a matrix representation once a basis is chosen.
- Columns of the matrix = images of basis vectors.
- Changing the basis changes the matrix, but the transformation itself stays the same.
43. Kernel and Image (Inputs That Vanish; Outputs We Can Reach)
Two fundamental subspaces describe any linear transformation \(T(x) = Ax\):
- Kernel (null space): all vectors \(x\) such that \(Ax = 0\).
- Image (column space): all possible outputs \(Ax\).
The kernel tells us what inputs collapse to zero, while the image tells us what outputs are achievable.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Kernel of a matrix
Consider
\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix} \]
A = Matrix([
    [1, 2, 3],
    [2, 4, 6]
])
print("Null space (kernel):", A.nullspace())
Null space (kernel): [Matrix([
[-2],
[ 1],
[ 0]]), Matrix([
[-3],
[ 0],
[ 1]])]
The null space basis shows dependencies among columns. Here, the kernel is 2-dimensional because columns are dependent.
- Image (column space)
print("Column space (image):", A.columnspace())
print("Rank (dimension of image):", A.rank())
Column space (image): [Matrix([
[1],
[2]])]
Rank (dimension of image): 1
The image is spanned by \([1,2]^T\). So all outputs of \(A\) are multiples of this vector.
- Interpretation
- Kernel vectors → directions that map to zero.
- Image vectors → directions we can actually reach in the output space.
If \(x \in \ker(A)\), then \(Ax = 0\). If \(b\) is not in the image, the system \(Ax = b\) has no solution.
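The sketch below (our own addition) makes both statements concrete for the matrix A above: a kernel vector really maps to zero, a right-hand side inside the image is solvable, and one outside it is not.
from sympy import Matrix, linsolve, symbols

A = Matrix([[1, 2, 3],
            [2, 4, 6]])
x1, x2, x3 = symbols('x1 x2 x3')

k = A.nullspace()[0]             # a kernel vector, e.g. [-2, 1, 0]
print("A * k =", (A * k).T)      # the zero vector

b_inside = Matrix([1, 2])        # a multiple of the image direction [1, 2]
b_outside = Matrix([1, 0])       # not a multiple of [1, 2]

print(linsolve((A, b_inside), x1, x2, x3))    # a non-empty (parametric) solution set
print(linsolve((A, b_outside), x1, x2, x3))   # EmptySet: no solution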
- Example with full rank
= Matrix([
B 1,0,0],
[0,1,0],
[0,0,1]
[
])
print("Kernel of B:", B.nullspace())
print("Image of B:", B.columnspace())
Kernel of B: []
Image of B: [Matrix([
[1],
[0],
[0]]), Matrix([
[0],
[1],
[0]]), Matrix([
[0],
[0],
[1]])]
- Kernel = only zero vector.
- Image = all of \(\mathbb{R}^3\).
- NumPy version (image via column space)
= np.array([[1,2,3],[2,4,6]], dtype=float)
A = np.linalg.matrix_rank(A)
rank print("Rank with NumPy:", rank)
Rank with NumPy: 1
NumPy doesn’t compute null spaces directly, but we can use SVD for that if needed.
Try It Yourself
Compute kernel and image for
\[ \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \]
What do they look like?
Take a random 3×4 matrix and find its kernel and image dimensions.
Solve \(Ax = b\) for a matrix \(A\). Try two different \(b\): one inside the image, one outside. Observe the difference.
The Takeaway
- Kernel = inputs that vanish under \(A\).
- Image = outputs that can be reached by \(A\).
- Together, they fully describe what a linear map does: what it “kills” and what it “produces.”
44. Invertibility and Isomorphisms (Perfectly Reversible Maps)
A matrix (or linear map) is invertible if it has an inverse \(A^{-1}\) such that
\[ A^{-1}A = I \quad \text{and} \quad AA^{-1} = I \]
An invertible map is also called an isomorphism, because it preserves all information - every input has exactly one output, and every output comes from exactly one input.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Checking invertibility
A = Matrix([
    [2,1],
    [5,3]
])
print("Determinant:", A.det())
print("Is invertible?", A.det() != 0)
Determinant: 1
Is invertible? True
If determinant ≠ 0 → invertible.
- Computing the inverse
A_inv = A.inv()
print("Inverse matrix:\n", A_inv)
print("Check A*A_inv = I:\n", A * A_inv)
Inverse matrix:
Matrix([[3, -1], [-5, 2]])
Check A*A_inv = I:
Matrix([[1, 0], [0, 1]])
- Solving systems with inverses
For \(Ax = b\), if \(A\) is invertible:
b = Matrix([1,2])
x = A_inv * b
print("Solution x:", x)
Solution x: Matrix([[1], [-1]])
This is equivalent to A.solve(b) in SymPy or np.linalg.solve in NumPy.
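A minimal NumPy sketch of that equivalence (the names A_np and b_np are introduced just for this example):
A_np = np.array([[2,1],[5,3]], dtype=float)
b_np = np.array([1,2], dtype=float)
print("np.linalg.solve:", np.linalg.solve(A_np, b_np))
print("via the inverse:", np.linalg.inv(A_np) @ b_np)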
- Non-invertible (singular) example
B = Matrix([
    [1,2],
    [2,4]
])
print("Determinant:", B.det())
print("Is invertible?", B.det() != 0)
Determinant: 0
Is invertible? False
Determinant = 0 → no inverse. The matrix collapses space onto a line, losing information.
- NumPy version
A = np.array([[2,1],[5,3]], dtype=float)
print("Determinant:", np.linalg.det(A))
print("Inverse:\n", np.linalg.inv(A))
Determinant: 1.0000000000000002
Inverse:
[[ 3. -1.]
[-5. 2.]]
- Geometric intuition
- Invertible transformation = reversible (like rotating, scaling by nonzero).
- Non-invertible transformation = squashing space into a lower dimension (like flattening a plane onto a line).
Try It Yourself
Test whether
\[ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]
is invertible and find its inverse.
Compute the determinant of a 3×3 random integer matrix. If it’s nonzero, find its inverse.
Create a singular 3×3 matrix (make one row a multiple of another). Confirm it has no inverse.
The Takeaway
- Invertible matrix ↔︎ isomorphism: perfectly reversible, no information lost.
- Determinant ≠ 0 → invertible; determinant = 0 → singular.
- Inverses are useful conceptually, but in computation we usually solve systems directly instead of calculating \(A^{-1}\).
45. Composition, Powers, and Iteration (Doing It Again and Again)
Linear transformations can be chained together. Applying one after another is called composition, and in matrix form this becomes multiplication. Repeated application of the same transformation leads to powers of a matrix.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Composition of transformations
Suppose we have two linear maps:
- \(T_1\): rotate by 90°
- \(T_2\): scale x by 2
theta = np.pi/2
R = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta), np.cos(theta)]
])
S = np.array([
    [2,0],
    [0,1]
])
# Compose: apply R then S
C = S @ R
print("Composite matrix:\n", C)
Composite matrix:
[[ 1.2246468e-16 -2.0000000e+00]
[ 1.0000000e+00 6.1232340e-17]]
Applying the composite matrix is equivalent to applying both maps in sequence.
- Verifying with a vector
v = np.array([1,1])
step1 = R @ v
step2 = S @ step1
composite = C @ v
print("Step-by-step:", step2)
print("Composite:", composite)
Step-by-step: [-2. 1.]
Composite: [-2. 1.]
Both results are the same → composition = matrix multiplication.
- Powers of a matrix
Repeatedly applying a transformation corresponds to matrix powers.
Example: scaling by 2.
A = np.array([[2,0],[0,2]])
v = np.array([1,1])
print("A @ v =", A @ v)
print("A^2 @ v =", np.linalg.matrix_power(A,2) @ v)
print("A^5 @ v =", np.linalg.matrix_power(A,5) @ v)
A @ v = [2 2]
A^2 @ v = [4 4]
A^5 @ v = [32 32]
Each step doubles the scaling effect.
- Iteration dynamics
Let’s iterate a transformation many times and see what happens.
Example:
\[ A = \begin{bmatrix} 0.5 & 0 \\ 0 & 0.5 \end{bmatrix} \]
A = np.array([[0.5,0],[0,0.5]])
v = np.array([4,4])

for i in range(5):
    v = A @ v
    print(f"Step {i+1}:", v)
Step 1: [2. 2.]
Step 2: [1. 1.]
Step 3: [0.5 0.5]
Step 4: [0.25 0.25]
Step 5: [0.125 0.125]
Each step shrinks the vector → iteration can reveal stability.
- Random example
np.random.seed(0)
M = np.random.randint(-2,3,(2,2))
print("Random matrix:\n", M)
print("M^2:\n", np.linalg.matrix_power(M,2))
print("M^3:\n", np.linalg.matrix_power(M,3))
Random matrix:
[[ 2 -2]
[ 1 1]]
M^2:
[[ 2 -6]
[ 3 -1]]
M^3:
[[ -2 -10]
[ 5 -7]]
Try It Yourself
- Create two transformations: reflection across x-axis and scaling by 3. Compose them.
- Take a shear matrix and compute \(A^5\). What happens to a vector after repeated application?
- Experiment with a rotation matrix raised to higher powers. What cycle do you see?
The Takeaway
- Composition of linear maps = matrix multiplication.
- Powers of a matrix represent repeated application.
- Iteration reveals long-term dynamics: shrinking, growing, or oscillating behavior.
46. Similarity and Conjugation (Same Action, Different Basis)
Two matrices \(A\) and \(B\) are called similar if there exists an invertible matrix \(P\) such that
\[ B = P^{-1} A P \]
This means \(A\) and \(B\) represent the same linear transformation, but in different bases. This lab explores similarity and why it matters.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Example with a change of basis
A = Matrix([
    [2,1],
    [0,2]
])

P = Matrix([
    [1,1],
    [0,1]
])

B = P.inv() * A * P
print("Original A:\n", A)
print("Similar matrix B:\n", B)
Original A:
Matrix([[2, 1], [0, 2]])
Similar matrix B:
Matrix([[2, 1], [0, 2]])
Here, \(A\) and \(B\) are similar: they describe the same transformation in different coordinates.
- Eigenvalues stay the same
Similarity preserves eigenvalues.
print("Eigenvalues of A:", A.eigenvals())
print("Eigenvalues of B:", B.eigenvals())
Eigenvalues of A: {2: 2}
Eigenvalues of B: {2: 2}
Both matrices have the same eigenvalues. With this particular \(P\) the entries even come out identical, but the agreement of eigenvalues holds for any invertible change of basis, as the sketch below shows.
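To see the entries actually change, here is a minimal sketch with a different, arbitrarily chosen invertible P2; eigenvalues, trace, and determinant still agree:
P2 = Matrix([[1,2],[1,3]])      # an arbitrary invertible change of basis
B2 = P2.inv() * A * P2
print("B2:", B2)
print("Eigenvalues:", A.eigenvals(), B2.eigenvals())
print("Trace:", A.trace(), B2.trace())
print("Determinant:", A.det(), B2.det())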
- Similarity and diagonalization
If a matrix is diagonalizable, there exists \(P\) such that
\[ D = P^{-1} A P \]
where \(D\) is diagonal.
C = Matrix([
    [4,1],
    [0,2]
])

P, D = C.diagonalize()
print("Diagonal form D:\n", D)
print("Check similarity (P^-1 C P = D):\n", P.inv()*C*P)
Diagonal form D:
Matrix([[2, 0], [0, 4]])
Check similarity (P^-1 C P = D):
Matrix([[2, 0], [0, 4]])
Diagonalization is a special case of similarity, where the new matrix is as simple as possible.
- NumPy version
A = np.array([[2,1],[0,2]], dtype=float)
eigvals, eigvecs = np.linalg.eig(A)
print("Eigenvalues:", eigvals)
print("Eigenvectors (basis P):\n", eigvecs)
Eigenvalues: [2. 2.]
Eigenvectors (basis P):
[[ 1.0000000e+00 -1.0000000e+00]
[ 0.0000000e+00 4.4408921e-16]]
When a matrix has a full set of independent eigenvectors, their columns form the change-of-basis matrix \(P\). Here the repeated eigenvalue comes with nearly parallel eigenvector columns, so this particular \(A\) does not supply such a basis - a situation we revisit with Jordan form in lab 67.
- Geometric interpretation
- Similar matrices = same transformation, different “ruler” (basis).
- Diagonalization = finding a ruler that makes the transformation look like pure stretching along axes.
Try It Yourself
Take
\[ A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \]
and find a matrix \(P\) that gives a similar \(B\).
Show that two similar matrices have the same determinant and trace.
For a random 3×3 matrix, check if it is diagonalizable using SymPy’s .diagonalize() method.
The Takeaway
- Similarity = same linear map, different basis.
- Similar matrices share eigenvalues, determinant, and trace.
- Diagonalization is the simplest similarity form, making repeated computations (like powers) much easier.
47. Projections and Reflections (Idempotent and Involutive Maps)
Two very common geometric linear maps are projections and reflections. They show up in graphics, physics, and optimization.
- A projection squashes vectors onto a subspace (like dropping a shadow).
- A reflection flips vectors across a line or plane (like a mirror).
Set Up Your Lab
import numpy as np
from sympy import Matrix
import matplotlib.pyplot as plt
Step-by-Step Code Walkthrough
- Projection onto a line
If we want to project onto the line spanned by \(u\), the projection matrix is:
\[ P = \frac{uu^T}{u^T u} \]
u = np.array([2,1], dtype=float)
u = u / np.linalg.norm(u)   # normalize
P = np.outer(u,u)

print("Projection matrix:\n", P)
Projection matrix:
[[0.8 0.4]
[0.4 0.2]]
Apply projection:
v = np.array([3,4], dtype=float)
proj_v = P @ v
print("Original v:", v)
print("Projection of v onto u:", proj_v)
Original v: [3. 4.]
Projection of v onto u: [4. 2.]
- Visualization of projection
plt.arrow(0,0,v[0],v[1],head_width=0.1,color="blue",length_includes_head=True)
plt.arrow(0,0,proj_v[0],proj_v[1],head_width=0.1,color="red",length_includes_head=True)
plt.arrow(proj_v[0],proj_v[1],v[0]-proj_v[0],v[1]-proj_v[1],head_width=0.1,color="gray",linestyle="dashed")

plt.axhline(0,color='black',linewidth=0.5)
plt.axvline(0,color='black',linewidth=0.5)
plt.grid()
plt.title("Blue = original, Red = projection, Gray = error vector")
plt.show()
The projection is the closest point on the line to the original vector.
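One concrete way to see "closest", reusing v, proj_v, and the normalized u from above: the error vector is orthogonal to the line (a quick check, not part of the original lab code).
error = v - proj_v
print("Error vector:", error)
print("error · u =", error @ u)   # ≈ 0 → the error is perpendicular to the line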
- Reflection across a line
The reflection matrix across the line spanned by \(u\) is:
\[ R = 2P - I \]
I = np.eye(2)
R = 2*P - I

reflect_v = R @ v
print("Reflection of v across line u:", reflect_v)
Reflection of v across line u: [ 5.0000000e+00 -4.4408921e-16]
- Checking algebraic properties
- Projection: \(P^2 = P\) (idempotent).
- Reflection: \(R^2 = I\) (involutive).
print("P^2 =\n", P @ P)
print("R^2 =\n", R @ R)
P^2 =
[[0.8 0.4]
[0.4 0.2]]
R^2 =
[[ 1.00000000e+00 -1.59872116e-16]
[-1.59872116e-16 1.00000000e+00]]
- Projection in higher dimensions
Project onto the plane spanned by two vectors in \(\mathbb{R}^3\).
u1 = np.array([1,0,0], dtype=float)
u2 = np.array([0,1,0], dtype=float)

U = np.column_stack((u1,u2))   # basis for plane
P_plane = U @ np.linalg.inv(U.T @ U) @ U.T

v = np.array([1,2,3], dtype=float)
proj_plane = P_plane @ v
print("Projection onto xy-plane:", proj_plane)
Projection onto xy-plane: [1. 2. 0.]
Try It Yourself
- Project \([4,5]\) onto the x-axis and verify the result.
- Reflect \([1,2]\) across the line \(y=x\).
- Create a random 3D vector and project it onto the plane spanned by \([1,1,0]\) and \([0,1,1]\).
The Takeaway
- Projection: idempotent (\(P^2 = P\)), finds the closest vector in a subspace.
- Reflection: involutive (\(R^2 = I\)), flips across a line/plane but preserves lengths.
- Both are simple but powerful examples of linear transformations with clear geometry.
48. Rotations and Shear (Geometric Intuition)
Two transformations often used in geometry, graphics, and physics are rotations and shears. Both are linear maps, but they behave differently:
- Rotation preserves lengths and angles.
- Shear preserves area (in 2D) but distorts shapes, turning squares into parallelograms.
Set Up Your Lab
import numpy as np
import matplotlib.pyplot as plt
Step-by-Step Code Walkthrough
- Rotation in 2D
The rotation matrix by angle \(\theta\) is:
\[ R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \]
def rotation_matrix(theta):
    return np.array([
        [np.cos(theta), -np.sin(theta)],
        [np.sin(theta), np.cos(theta)]
    ])

theta = np.pi/4   # 45 degrees
R = rotation_matrix(theta)

v = np.array([2,1])
rotated_v = R @ v
print("Original v:", v)
print("Rotated v (45°):", rotated_v)
Original v: [2 1]
Rotated v (45°): [0.70710678 2.12132034]
- Visualizing rotation
plt.arrow(0,0,v[0],v[1],head_width=0.1,color="blue",length_includes_head=True)
plt.arrow(0,0,rotated_v[0],rotated_v[1],head_width=0.1,color="red",length_includes_head=True)

plt.axhline(0,color='black',linewidth=0.5)
plt.axvline(0,color='black',linewidth=0.5)
plt.grid()
plt.title("Blue = original, Red = rotated (45°)")
plt.axis("equal")
plt.show()
The vector rotates counterclockwise by 45°.
- Shear in 2D
A shear along the x-axis by factor \(k\):
\[ S = \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix} \]
k = 1.0
S = np.array([
    [1,k],
    [0,1]
])

sheared_v = S @ v
print("Sheared v:", sheared_v)
Sheared v: [3. 1.]
- Visualizing shear
plt.arrow(0,0,v[0],v[1],head_width=0.1,color="blue",length_includes_head=True)
plt.arrow(0,0,sheared_v[0],sheared_v[1],head_width=0.1,color="green",length_includes_head=True)

plt.axhline(0,color='black',linewidth=0.5)
plt.axvline(0,color='black',linewidth=0.5)
plt.grid()
plt.title("Blue = original, Green = sheared")
plt.axis("equal")
plt.show()
The shear moves the vector sideways, distorting its angle.
- Properties check
- Rotation preserves length:
print("||v|| =", np.linalg.norm(v))
print("||R v|| =", np.linalg.norm(rotated_v))
||v|| = 2.23606797749979
||R v|| = 2.2360679774997894
- Shear preserves area (determinant = 1):
print("det(S) =", np.linalg.det(S))
det(S) = 1.0
Try It Yourself
- Rotate \([1,0]\) by 90° and check it becomes \([0,1]\).
- Apply shear with \(k=2\) to a square (points \((0,0),(1,0),(1,1),(0,1)\)) and plot before/after.
- Combine rotation and shear: apply shear first, then rotation. What happens?
The Takeaway
- Rotation: length- and angle-preserving, determinant = 1.
- Shear: shape-distorting but area-preserving, determinant = 1.
- Both are linear maps that provide geometric intuition and real-world modeling tools.
49. Rank and Operator Viewpoint (Rank Beyond Elimination)
The rank of a matrix tells us how much “information” a linear map carries. Algebraically, it is the dimension of the image (column space). Geometrically, it measures how many independent directions survive the transformation.
From the operator viewpoint:
- A matrix \(A\) is not just a table of numbers - it is a linear operator that maps vectors to other vectors.
- The rank is the dimension of the output space that \(A\) actually reaches.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Rank via elimination (SymPy)
A = Matrix([
    [1,2,3],
    [2,4,6],
    [1,1,1]
])
print("Matrix A:\n", A)
print("Rank of A:", A.rank())
Matrix A:
Matrix([[1, 2, 3], [2, 4, 6], [1, 1, 1]])
Rank of A: 2
Here, the second row is a multiple of the first → less independence → rank < 3.
- Rank via NumPy
A_np = np.array([[1,2,3],[2,4,6],[1,1,1]], dtype=float)
print("Rank (NumPy):", np.linalg.matrix_rank(A_np))
Rank (NumPy): 2
- Operator viewpoint
Let’s apply \(A\) to the standard basis vectors:
for v in [np.array([1,0,0]), np.array([0,1,0]), np.array([0,0,1])]:
    print("A @", v, "=", A_np @ v)
A @ [1 0 0] = [1. 2. 1.]
A @ [0 1 0] = [2. 4. 1.]
A @ [0 0 1] = [3. 6. 1.]
Even though we started in 3D, all outputs lie in a plane in \(\mathbb{R}^3\). That’s why rank = 2.
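A quick numerical check of that claim, reusing A_np: stacking the three outputs as columns and asking for their rank shows that only two independent directions survive.
outputs = np.column_stack([A_np @ e for e in np.eye(3)])
print("Rank of stacked outputs:", np.linalg.matrix_rank(outputs))   # 2 → outputs fill a plane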
- Full rank vs reduced rank
- Full rank: the transformation preserves dimension (no collapse).
- Reduced rank: the transformation collapses onto a lower-dimensional subspace.
Example full-rank:
B = Matrix([
    [1,0,0],
    [0,1,0],
    [0,0,1]
])
print("Rank of B:", B.rank())
Rank of B: 3
- Connection to nullity
The rank-nullity theorem:
\[ \text{rank}(A) + \text{nullity}(A) = \text{number of columns of } A \]
Check with SymPy:
print("Null space (basis):", A.nullspace())
print("Nullity:", len(A.nullspace()))
print("Rank + Nullity =", A.rank() + len(A.nullspace()))
Null space (basis): [Matrix([
[ 1],
[-2],
[ 1]])]
Nullity: 1
Rank + Nullity = 3
Try It Yourself
Take
\[ \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \]
and compute its rank. Why is it 1?
For a random 4×4 matrix, use np.linalg.matrix_rank to check if it’s invertible.
Verify the rank-nullity theorem for a 3×5 random integer matrix.
The Takeaway
- Rank = dimension of the image (how many independent outputs a transformation has).
- Operator viewpoint: rank shows how much of the input space survives after transformation.
- Rank-nullity links the image and kernel - together they fully describe a linear operator.
50. Block Matrices and Block Maps (Divide and Conquer Structure)
Sometimes matrices can be arranged in blocks (submatrices). Treating a big matrix as smaller pieces helps simplify calculations, especially in systems with structure (networks, coupled equations, or partitioned variables).
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Constructing block matrices
We can build a block matrix from smaller pieces:
A11 = Matrix([[1,2],[3,4]])
A12 = Matrix([[5,6],[7,8]])
A21 = Matrix([[9,10]])
A22 = Matrix([[11,12]])

# Combine into a block matrix
A = Matrix.vstack(
    Matrix.hstack(A11, A12),
    Matrix.hstack(A21, A22)
)
print("Block matrix A:\n", A)
Block matrix A:
Matrix([[1, 2, 5, 6], [3, 4, 7, 8], [9, 10, 11, 12]])
- Block multiplication
If a matrix is partitioned into blocks, multiplication follows block rules:
\[ \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} Ax + By \\ Cx + Dy \end{bmatrix} \]
Example:
A = Matrix([
    [1,2,5,6],
    [3,4,7,8],
    [9,10,11,12]
])

x = Matrix([1,1,2,2])
print("A * x =", A*x)
A * x = Matrix([[25], [37], [65]])
Here the vector is split into blocks \([x,y]\).
- Block diagonal matrices
Block diagonal = independent subproblems:
B1 = Matrix([[2,0],[0,2]])
B2 = Matrix([[3,1],[0,3]])

BlockDiag = Matrix([
    [2,0,0,0],
    [0,2,0,0],
    [0,0,3,1],
    [0,0,0,3]
])

print("Block diagonal matrix:\n", BlockDiag)
Block diagonal matrix:
Matrix([[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 3, 1], [0, 0, 0, 3]])
Applying this matrix acts separately on each block - like running two smaller transformations in parallel.
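A minimal sketch of that "parallel" behavior, reusing B1, B2, and BlockDiag (the input vectors here are arbitrary choices for the example):
x_top = Matrix([1,2])    # input for the first block
x_bot = Matrix([3,4])    # input for the second block
x_full = Matrix.vstack(x_top, x_bot)

print("BlockDiag * x_full:", (BlockDiag * x_full).T)
print("B1 * x_top:", (B1 * x_top).T)
print("B2 * x_bot:", (B2 * x_bot).T)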
- Inverse of block diagonal
The inverse of a block diagonal is just the block diagonal of inverses:
B1_inv = B1.inv()
B2_inv = B2.inv()
BlockDiagInv = Matrix([
    [B1_inv[0,0], B1_inv[0,1], 0, 0],
    [B1_inv[1,0], B1_inv[1,1], 0, 0],
    [0, 0, B2_inv[0,0], B2_inv[0,1]],
    [0, 0, B2_inv[1,0], B2_inv[1,1]]
])
print("Inverse block diag:\n", BlockDiagInv)
Inverse block diag:
Matrix([[1/2, 0, 0, 0], [0, 1/2, 0, 0], [0, 0, 1/3, -1/9], [0, 0, 0, 1/3]])
- Practical example - coupled equations
Suppose we have two independent systems:
- System 1: \(Ax = b\)
- System 2: \(Cy = d\)
We can represent both together:
\[ \begin{bmatrix} A & 0 \\ 0 & C \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} b \\ d \end{bmatrix} \]
This shows how block matrices organize multiple systems in one big equation.
Try It Yourself
- Build a block diagonal matrix with three 2×2 blocks. Apply it to a vector.
- Verify block multiplication rule by manually computing \(Ax + By\) and \(Cx + Dy\).
- Write two small systems of equations and combine them into one block system.
The Takeaway
- Block matrices let us break down big systems into smaller parts.
- Block diagonal matrices = independent subsystems.
- Thinking in blocks simplifies algebra, programming, and numerical computation.
Chapter 6. Determinants and volume
51. Areas, Volumes, and Signed Scale Factors (Geometric Entry Point)
The determinant of a matrix has a deep geometric meaning: it tells us how a linear transformation scales area (in 2D), volume (in 3D), or higher-dimensional content. It can also flip orientation (sign).
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Determinant in 2D (area scaling)
Let’s take a matrix that stretches and shears:
A = Matrix([
    [2,1],
    [1,1]
])
print("Determinant:", A.det())
Determinant: 1
The determinant = 1 → areas are preserved, even though the shape is distorted.
- Unit square under transformation
Transform the square with corners \((0,0),(1,0),(1,1),(0,1)\):
square = Matrix([
    [0,0],
    [1,0],
    [1,1],
    [0,1]
])

transformed = (A * square.T).T
print("Original square:\n", square)
print("Transformed square:\n", transformed)
Original square:
Matrix([[0, 0], [1, 0], [1, 1], [0, 1]])
Transformed square:
Matrix([[0, 0], [2, 1], [3, 2], [1, 1]])
The area of the transformed shape equals \(|\det(A)|\).
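As a quick sanity check (not part of the original lab), the shoelace formula applied to the transformed vertices gives exactly \(|\det(A)| = 1\):
pts = np.array(transformed, dtype=float)   # vertices of the transformed square, in order
x, y = pts[:,0], pts[:,1]
area = 0.5 * abs(np.dot(x, np.roll(y,-1)) - np.dot(y, np.roll(x,-1)))   # shoelace formula
print("Area of transformed square:", area)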
- Determinant in 3D (volume scaling)
B = Matrix([
    [1,2,0],
    [0,1,0],
    [0,0,3]
])
print("Determinant:", B.det())
Determinant: 3
\(\det(B)=3\) means that volumes are scaled by 3.
- Negative determinant = orientation flip
C = Matrix([
    [0,1],
    [1,0]
])
print("Determinant:", C.det())
Determinant: -1
The determinant = -1 → area preserved but orientation flipped (like a mirror reflection).
- NumPy version
A = np.array([[2,1],[1,1]], dtype=float)
print("Det (NumPy):", np.linalg.det(A))
Det (NumPy): 1.0
Try It Yourself
Take
\[ \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix} \]
and compute the determinant. Verify it scales areas by 6.
Build a 3×3 shear matrix and check how it affects volume.
Test a reflection matrix and confirm that the determinant is negative.
The Takeaway
- Determinant measures how a linear map scales area, volume, or hypervolume.
- Positive determinant = preserves orientation; negative = flips it.
- Magnitude of determinant = scaling factor of geometric content.
52. Determinant via Linear Rules (Multilinearity, Sign, Normalization)
The determinant isn’t just a formula; it’s defined by three elegant rules that make it unique. These rules capture its geometric meaning as a volume-scaling factor.
- Multilinearity: Linear in each row (or column).
- Sign Change: Swapping two rows flips the sign.
- Normalization: The determinant of the identity matrix is 1.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Multilinearity
If one row is scaled, the determinant scales the same way.
A = Matrix([[1,2],[3,4]])
print("det(A):", A.det())

B = Matrix([[2,4],[3,4]])   # first row doubled
print("det(B):", B.det())
det(A): -2
det(B): -4
You’ll see det(B) = 2 * det(A).
- Sign change by row swap
C = Matrix([[1,2],[3,4]])
C_swapped = Matrix([[3,4],[1,2]])
print("det(C):", C.det())
print("det(C_swapped):", C_swapped.det())
det(C): -2
det(C_swapped): 2
Swapping rows flips the sign of the determinant.
- Normalization rule
I = Matrix.eye(3)
print("det(I):", I.det())
det(I): 1
The determinant of the identity is always 1 - this fixes the scaling baseline.
- Combining rules (example in 3×3)
M = Matrix([[1,2,3],[4,5,6],[7,8,9]])
print("det(M):", M.det())
det(M): 0
Here, rows are linearly dependent, so the determinant is 0 - consistent with multilinearity (since one row can be written as a combo of others).
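A minimal sketch of multilinearity in the first row (the rows r1, r2, rest below are arbitrary choices for this example): the determinant of a matrix whose first row is r1 + r2 splits into a sum of two determinants.
r1 = Matrix([[1,2]])
r2 = Matrix([[3,4]])
rest = Matrix([[5,7]])

lhs = Matrix.vstack(r1 + r2, rest).det()
rhs = Matrix.vstack(r1, rest).det() + Matrix.vstack(r2, rest).det()
print("det([r1+r2; rest]) =", lhs)
print("det([r1; rest]) + det([r2; rest]) =", rhs)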
- NumPy check
A = np.array([[1,2],[3,4]], dtype=float)
print("det(A) NumPy:", np.linalg.det(A))
det(A) NumPy: -2.0000000000000004
Both SymPy and NumPy confirm the same result.
Try It Yourself
- Scale a row of a 3×3 matrix by 3. Confirm the determinant scales by 3.
- Swap two rows twice in a row - does the determinant return to its original value?
- Compute determinant of a triangular matrix. What pattern do you see?
The Takeaway
- Determinant is defined by multilinearity, sign change, and normalization.
- These rules uniquely pin down the determinant’s behavior.
- Every formula (cofactor expansion, row-reduction method, etc.) comes from these core principles.
53. Determinant and Row Operations (How Each Move Changes det)
Row operations are at the heart of Gaussian elimination, and the determinant has simple, predictable reactions to them. Understanding these reactions gives both computational shortcuts and geometric intuition.
The Three Key Rules
- Row swap: Swapping two rows flips the sign of the determinant.
- Row scaling: Multiplying a row by a scalar \(c\) multiplies the determinant by \(c\).
- Row replacement: Adding a multiple of one row to another leaves the determinant unchanged.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Row swap
A = Matrix([[1,2],[3,4]])
print("det(A):", A.det())

A_swapped = Matrix([[3,4],[1,2]])
print("det(after swap):", A_swapped.det())
det(A): -2
det(after swap): 2
The result flips sign.
- Row scaling
B = Matrix([[1,2],[3,4]])
B_scaled = Matrix([[2,4],[3,4]])   # first row × 2
print("det(B):", B.det())
print("det(after scaling row 1 by 2):", B_scaled.det())
det(B): -2
det(after scaling row 1 by 2): -4
Determinant is multiplied by 2.
- Row replacement (no change)
C = Matrix([[1,2],[3,4]])
C_replaced = Matrix([[1,2],[3-2*1, 4-2*2]])   # row2 → row2 - 2*row1
print("det(C):", C.det())
print("det(after row replacement):", C_replaced.det())
det(C): -2
det(after row replacement): -2
Determinant stays the same.
- Triangular form shortcut
Since elimination only uses row replacement (which doesn’t change the determinant) and row swaps/scales (which we can track), the determinant of a triangular matrix is just the product of its diagonal entries.
D = Matrix([[2,1,3],[0,4,5],[0,0,6]])
print("det(D):", D.det())
print("Product of diagonals:", 2*4*6)
det(D): 48
Product of diagonals: 48
- NumPy confirmation
A = np.array([[1,2,3],[0,4,5],[1,0,6]], dtype=float)
print("det(A):", np.linalg.det(A))
det(A): 22.000000000000004
Try It Yourself
Take
\[ \begin{bmatrix} 2 & 3 \\ 4 & 6 \end{bmatrix} \]
and scale the second row by \(\tfrac{1}{2}\). Compare determinants before and after.
Do Gaussian elimination on a 3×3 matrix, and track how each row operation changes the determinant.
Compute determinant by reducing to triangular form and compare with SymPy’s
.det()
.
The Takeaway
- Determinant reacts predictably to row operations.
- Row replacement is “safe” (no change), scaling multiplies by the factor, and swapping flips the sign.
- This makes elimination not just a solving tool, but also a method to compute determinants efficiently.
54. Triangular Matrices and Product of Diagonals (Fast Wins)
For triangular matrices (upper or lower), the determinant is simply the product of the diagonal entries. This rule is one of the biggest shortcuts in linear algebra - no expansion or elimination needed.
Why It Works
- Triangular matrices already look like the end result of Gaussian elimination.
- Since row replacement operations don’t change the determinant, what’s left is just the product of the diagonal.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Upper triangular example
A = Matrix([
    [2,1,3],
    [0,4,5],
    [0,0,6]
])
print("det(A):", A.det())
print("Product of diagonals:", 2*4*6)
det(A): 48
Product of diagonals: 48
Both values match exactly.
- Lower triangular example
B = Matrix([
    [7,0,0],
    [2,5,0],
    [3,4,9]
])
print("det(B):", B.det())
print("Product of diagonals:", 7*5*9)
det(B): 315
Product of diagonals: 315
- Diagonal matrix (special case)
For diagonal matrices, determinant = product of diagonal entries directly.
C = Matrix.diag(3,5,7)
print("det(C):", C.det())
print("Product of diagonals:", 3*5*7)
det(C): 105
Product of diagonals: 105
- NumPy version
A = np.array([[2,1,3],[0,4,5],[0,0,6]], dtype=float)
print("det(A):", np.linalg.det(A))
print("Product of diagonals:", np.prod(np.diag(A)))
det(A): 47.999999999999986
Product of diagonals: 48.0
- Quick elimination to triangular form
Even for non-triangular matrices, elimination reduces them to triangular form, where this rule applies.
D = Matrix([[1,2,3],[4,5,6],[7,8,10]])
print("det(D) via SymPy:", D.det())
print("det(D) via LU decomposition:", D.LUdecomposition()[0].det() * D.LUdecomposition()[1].det())
det(D) via SymPy: -3
det(D) via LU decomposition: -3
Try It Yourself
- Compute the determinant of a 4×4 diagonal matrix quickly.
- Verify that triangular matrices with a zero on the diagonal always have determinant 0.
- Use SymPy to check that elimination to triangular form preserves determinant (except for swaps/scales).
The Takeaway
For triangular (and diagonal) matrices:
\[ \det(A) = \prod_{i} a_{ii} \]
This shortcut makes determinant computation trivial.
Gaussian elimination leverages this fact: once reduced to triangular form, the determinant is just the product of pivots (with sign adjustments for swaps).
55. det(AB) = det(A)det(B) (Multiplicative Magic)
One of the most elegant properties of determinants is multiplicativity:
\[ \det(AB) = \det(A)\,\det(B) \]
This rule is powerful because it connects algebra (matrix multiplication) with geometry (volume scaling).
Geometric Intuition
- If \(A\) scales volumes by factor \(\det(A)\), and \(B\) scales them by \(\det(B)\), then applying \(B\) followed by \(A\) scales volumes by \(\det(A)\det(B)\).
- This property works in all dimensions.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- 2×2 example
A = Matrix([[2,1],[0,3]])
B = Matrix([[1,4],[2,5]])

detA = A.det()
detB = B.det()
detAB = (A*B).det()
print("det(A):", detA)
print("det(B):", detB)
print("det(AB):", detAB)
print("det(A)*det(B):", detA*detB)
det(A): 6
det(B): -3
det(AB): -18
det(A)*det(B): -18
The two results match.
- 3×3 random matrix check
np.random.seed(1)
A = Matrix(np.random.randint(-3,4,(3,3)))
B = Matrix(np.random.randint(-3,4,(3,3)))
print("det(A):", A.det())
print("det(B):", B.det())
print("det(AB):", (A*B).det())
print("det(A)*det(B):", A.det()*B.det())
det(A): 25
det(B): -15
det(AB): -375
det(A)*det(B): -375
- Special cases
- If \(\det(A)=0\), then \(\det(AB)=0\).
- If \(\det(A)=\pm1\), it acts like a “volume-preserving” transformation (rotation/reflection).
A = Matrix([[1,0],[0,0]])   # singular
B = Matrix([[2,3],[4,5]])
print("det(A):", A.det())
print("det(AB):", (A*B).det())
det(A): 0
det(AB): 0
Both are 0.
- NumPy version
A = np.array([[2,1],[0,3]], dtype=float)
B = np.array([[1,4],[2,5]], dtype=float)

lhs = np.linalg.det(A @ B)
rhs = np.linalg.det(A) * np.linalg.det(B)
print("det(AB) =", lhs)
print("det(A)*det(B) =", rhs)
det(AB) = -17.999999999999996
det(A)*det(B) = -17.999999999999996
Try It Yourself
- Construct two triangular matrices and verify multiplicativity (diagonal products multiply too).
- Test the property with an orthogonal matrix \(Q\) (\(\det(Q)=\pm 1\)). What happens?
- Try with one matrix singular - confirm the product is always singular.
The Takeaway
- Determinant is multiplicative, not additive.
- \(\det(AB) = \det(A)\det(B)\) is a cornerstone identity in linear algebra.
- This property connects geometry (volume scaling) with algebra (matrix multiplication).
56. Invertibility and Zero Determinant (Flat vs. Full Volume)
The determinant gives a quick test for invertibility:
- If \(\det(A) \neq 0\), the matrix is invertible.
- If \(\det(A) = 0\), the matrix is singular (non-invertible).
Geometrically:
- Nonzero determinant → transformation keeps full dimension (no collapse).
- Zero determinant → transformation flattens space into a lower dimension (volume = 0).
Set Up Your Lab
import numpy as np
from sympy import Matrix
from sympy.matrices.common import NonInvertibleMatrixError
Step-by-Step Code Walkthrough
- Invertible example
A = Matrix([[2,1],[5,3]])
print("det(A):", A.det())
print("Inverse exists?", A.det() != 0)
print("A inverse:\n", A.inv())
det(A): 1
Inverse exists? True
A inverse:
Matrix([[3, -1], [-5, 2]])
The determinant is nonzero → invertible.
- Singular example (zero determinant)
B = Matrix([[1,2],[2,4]])
print("det(B):", B.det())
print("Inverse exists?", B.det() != 0)
det(B): 0
Inverse exists? False
Since the second row is a multiple of the first, determinant = 0 → no inverse.
- Solving systems with determinant check
If \(\det(A)=0\), the system \(Ax=b\) may have no solutions or infinitely many.
# 3. Solving systems with determinant check
b = Matrix([1,2])
try:
    print("Solve Ax=b with singular B:", B.solve(b))
except NonInvertibleMatrixError as e:
    print("Error when solving Ax=b:", e)
Error when solving Ax=b: Matrix det == 0; not invertible.
SymPy refuses to solve with a singular matrix: such a system is either inconsistent or has infinitely many solutions.
- Higher-dimensional example
C = Matrix([
    [1,0,0],
    [0,2,0],
    [0,0,3]
])
print("det(C):", C.det())
print("Invertible?", C.det() != 0)
det(C): 6
Invertible? True
Diagonal entries all nonzero → invertible.
- NumPy version
A = np.array([[2,1],[5,3]], dtype=float)
print("det(A):", np.linalg.det(A))
print("Inverse:\n", np.linalg.inv(A))

B = np.array([[1,2],[2,4]], dtype=float)
print("det(B):", np.linalg.det(B))
# np.linalg.inv(B) would fail because det=0
det(A): 1.0000000000000002
Inverse:
[[ 3. -1.]
[-5. 2.]]
det(B): 0.0
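To make that failure explicit, a minimal sketch reusing the NumPy B above: NumPy raises LinAlgError instead of returning an inverse.
try:
    np.linalg.inv(B)
except np.linalg.LinAlgError as e:
    print("NumPy refuses to invert:", e)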
Try It Yourself
- Build a 3×3 matrix with determinant 0 by making one row a multiple of another. Confirm singularity.
- Generate a random 4×4 matrix and check whether it’s invertible using
.det()
. - Test if two different 2×2 matrices are invertible, then multiply them together - is the product invertible too?
The Takeaway
- \(\det(A) \neq 0 \implies\) invertible (full volume).
- \(\det(A) = 0 \implies\) singular (space collapsed).
- Determinant gives both algebraic and geometric insight into when a matrix is reversible.
57. Cofactor Expansion (Laplace’s Method)
The cofactor expansion is a systematic way to compute determinants using minors. It’s not efficient for large matrices, but it reveals the recursive structure of determinants.
Definition
For an \(n \times n\) matrix \(A\),
\[ \det(A) = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det(M_{ij}) \]
where:
- \(i\) = chosen row (or column),
- \(M_{ij}\) = minor matrix after removing row \(i\), column \(j\).
Set Up Your Lab
import numpy as np
from sympy import Matrix, symbols
Step-by-Step Code Walkthrough
- 2×2 case (base rule)
# declare symbols
a, b, c, d = symbols('a b c d')

# build the matrix
A = Matrix([[a, b],[c, d]])

# compute determinant
detA = A.det()
print("Determinant 2x2:", detA)
Determinant 2x2: a*d - b*c
Formula: \(\det(A) = ad - bc\).
- 3×3 example using cofactor expansion
A = Matrix([
    [1,2,3],
    [4,5,6],
    [7,8,9]
])

detA = A.det()
print("Determinant via SymPy:", detA)
Determinant via SymPy: 0
Let’s compute manually along the first row:
cofactor_expansion = (
    1 * Matrix([[5,6],[8,9]]).det()
    - 2 * Matrix([[4,6],[7,9]]).det()
    + 3 * Matrix([[4,5],[7,8]]).det()
)
print("Cofactor expansion result:", cofactor_expansion)
Cofactor expansion result: 0
Both match (here = 0 because rows are dependent).
- Expansion along different rows/columns
The result is the same no matter which row/column you expand along.
cofactor_col1 = (
    1 * Matrix([[5,6],[8,9]]).det()
    - 4 * Matrix([[2,3],[8,9]]).det()
    + 7 * Matrix([[2,3],[5,6]]).det()
)
print("Expansion along col1:", cofactor_col1)
Expansion along col1: 0
- Larger example (4×4)
B = Matrix([
    [2,0,1,3],
    [1,2,0,4],
    [0,1,1,0],
    [3,0,2,1]
])
print("Determinant 4x4:", B.det())
Determinant 4x4: -15
SymPy handles it directly, but conceptually it’s still the same recursive expansion.
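For intuition only, here is a minimal sketch of that recursion on plain Python lists (a throwaway helper, not the lab's method, and far too slow for anything large):
def det_cofactor(M):
    # Laplace expansion along the first row
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j+1:] for row in M[1:]]
        total += (-1)**j * M[0][j] * det_cofactor(minor)
    return total

print(det_cofactor([[2,0,1,3],[1,2,0,4],[0,1,1,0],[3,0,2,1]]))   # -15, same as above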
- NumPy vs SymPy
B_np = np.array([[2,0,1,3],[1,2,0,4],[0,1,1,0],[3,0,2,1]], dtype=float)
print("NumPy determinant:", np.linalg.det(B_np))
NumPy determinant: -15.0
Try It Yourself
- Compute a 3×3 determinant manually using cofactor expansion and confirm with
.det()
. - Expand along a different row and check that the result is unchanged.
- Build a 4×4 diagonal matrix and expand it - what simplification do you notice?
The Takeaway
- Cofactor expansion defines determinant recursively.
- Works on any row or column, with consistent results.
- Important for proofs and theory, though not practical for computation on large matrices.
58. Permutations and Sign (The Combinatorial Core)
The determinant can also be defined using permutations of indices. This looks abstract, but it’s the most fundamental definition:
\[ \det(A) = \sum_{\sigma \in S_n} \text{sgn}(\sigma) \prod_{i=1}^n a_{i,\sigma(i)} \]
- \(S_n\) = set of all permutations of \(\{1,\dots,n\}\)
- \(\text{sgn}(\sigma)\) = +1 if the permutation is even, -1 if odd
- Each term = one product of entries, one from each row and column
This formula explains why determinants mix signs, why row swaps flip the determinant, and why dependence kills it.
Set Up Your Lab
import itertools
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Determinant by permutation expansion (3×3)
def determinant_permutation(A):
    n = A.shape[0]
    total = 0
    for perm in itertools.permutations(range(n)):
        # parity of the permutation: count inversions
        sign = (-1)**(sum(1 for i in range(n) for j in range(i) if perm[j] > perm[i]))
        product = 1
        for i in range(n):
            product *= A[i, perm[i]]
        total += sign * product
    return total

A = np.array([[1,2,3],
              [4,5,6],
              [7,8,9]])
print("Permutation formula det:", determinant_permutation(A))
print("NumPy det:", np.linalg.det(A))
Permutation formula det: 0
NumPy det: -9.51619735392994e-16
Both results ≈ 0, since rows are dependent.
- Count permutations
For \(n=3\), there are \(3! = 6\) terms:
\[ \det(A) = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} \]
You can see the alternating signs explicitly.
- Verification with SymPy
M = Matrix([[2,1,0],
            [1,3,4],
            [0,2,5]])
print("SymPy det:", M.det())
SymPy det: 9
Matches the permutation expansion.
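To confirm, a quick sketch feeding the same matrix (as a NumPy array, named M_np here) into the determinant_permutation function defined earlier:
M_np = np.array([[2,1,0],[1,3,4],[0,2,5]])
print("Permutation formula det:", determinant_permutation(M_np))   # 9, matching SymPy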
- Growth of terms
- 2×2 → 2 terms
- 3×3 → 6 terms
- 4×4 → 24 terms
- \(n\) → \(n!\) terms (factorial growth!)
This is why cofactor or LU is preferred computationally.
Try It Yourself
- Write out the 2×2 permutation formula explicitly and check it equals \(ad - bc\).
- Expand a 3×3 determinant by hand using the six terms.
- Modify the code to count how many multiplications are required for a 5×5 matrix using the permutation definition.
The Takeaway
- Determinant = signed sum over all permutations.
- Signs come from permutation parity (even/odd swaps).
- This definition is the combinatorial foundation that unifies all determinant properties.
59. Cramer’s Rule (Solving with Determinants, and When Not to Use It)
Cramer’s Rule gives an explicit formula for solving a system of linear equations \(Ax = b\) using determinants. It is elegant but inefficient for large systems.
For \(A \in \mathbb{R}^{n \times n}\) with \(\det(A) \neq 0\):
\[ x_i = \frac{\det(A_i)}{\det(A)} \]
where \(A_i\) is \(A\) with its \(i\)-th column replaced by \(b\).
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Simple 2×2 example
Solve:
\[ \begin{cases} 2x + y = 5 \\ x - y = 1 \end{cases} \]
A = Matrix([[2,1],[1,-1]])
b = Matrix([5,1])

detA = A.det()
print("det(A):", detA)

# Replace columns
A1 = A.copy()
A1[:,0] = b
A2 = A.copy()
A2[:,1] = b

x1 = A1.det() / detA
x2 = A2.det() / detA
print("Solution via Cramer's Rule:", [x1, x2])
# Check with built-in solver
print("SymPy solve:", A.LUsolve(b))
det(A): -3
Solution via Cramer's Rule: [2, 1]
SymPy solve: Matrix([[2], [1]])
Both give the same solution.
- 3×3 example
A = Matrix([
    [1,2,3],
    [0,1,4],
    [5,6,0]
])
b = Matrix([7,8,9])

detA = A.det()
print("det(A):", detA)

solutions = []
for i in range(A.shape[1]):
    Ai = A.copy()
    Ai[:,i] = b
    solutions.append(Ai.det()/detA)
print("Solution via Cramer's Rule:", solutions)
print("SymPy solve:", A.LUsolve(b))
det(A): 1
Solution via Cramer's Rule: [21, -16, 6]
SymPy solve: Matrix([[21], [-16], [6]])
- NumPy version (inefficient but illustrative)
A = np.array([[2,1],[1,-1]], dtype=float)
b = np.array([5,1], dtype=float)

detA = np.linalg.det(A)

solutions = []
for i in range(A.shape[1]):
    Ai = A.copy()
    Ai[:,i] = b
    solutions.append(np.linalg.det(Ai)/detA)
print("Solution:", solutions)
Solution: [np.float64(2.0000000000000004), np.float64(1.0)]
- Why not use it in practice?
- Requires computing \(n+1\) determinants.
- Determinant computation via cofactor expansion is factorial-time.
- Gaussian elimination or LU is far more efficient.
Try It Yourself
- Solve a 3×3 system using Cramer’s Rule and confirm with
A.solve(b)
. - Try Cramer’s Rule when \(\det(A)=0\). What happens?
- Compare runtime of Cramer’s Rule vs LU for a random 5×5 matrix.
The Takeaway
- Cramer’s Rule gives explicit formulas for solutions using determinants.
- Beautiful for theory, useful for small cases, but not computationally practical.
- It highlights the deep connection between determinants and solving linear systems.
60. Computing Determinants in Practice (Use LU, Mind Stability)
While definitions like cofactor expansion and permutations are beautiful, they are too slow for large matrices. In practice, determinants are computed using row reduction or LU decomposition, with careful attention to numerical stability.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Cofactor expansion is too slow
A = Matrix([
    [1,2,3],
    [4,5,6],
    [7,8,10]
])
print("det via cofactor expansion:", A.det())
det via cofactor expansion: -3
This works for 3×3, but complexity grows factorially.
- Determinant via triangular form (LU decomposition)
LU decomposition factorizes \(A = LU\), where \(L\) is lower triangular and \(U\) is upper triangular. Determinant = product of diagonals of \(U\), up to sign corrections for row swaps.
L, U, perm = A.LUdecomposition()
detA = U[0,0] * U[1,1] * U[2,2]   # product of pivots; no row swaps here, so no sign correction
print("L:\n", L)
print("U:\n", U)
print("Permutation matrix:\n", perm)
print("det via LU product:", detA)
L:
Matrix([[1, 0, 0], [4, 1, 0], [7, 2, 1]])
U:
Matrix([[1, 2, 3], [0, -3, -6], [0, 0, 1]])
Permutation matrix:
[]
det via LU product: -3
- NumPy efficient method
A_np = np.array([[1,2,3],[4,5,6],[7,8,10]], dtype=float)
print("NumPy det:", np.linalg.det(A_np))
NumPy det: -3.000000000000001
NumPy uses optimized routines (LAPACK under the hood).
- Large random matrix
np.random.seed(0)
B = np.random.rand(5,5)
print("NumPy det (5x5):", np.linalg.det(B))
NumPy det (5x5): 0.009658225505885114
Computes quickly even for larger matrices.
- Stability issues
Determinants of large or ill-conditioned matrices can suffer from floating-point errors. For example, if rows are nearly dependent:
C = np.array([[1,2,3],[2,4.0000001,6],[3,6,9]], dtype=float)
print("det(C):", np.linalg.det(C))
det(C): -4.996003624823549e-23
The result may not be exactly 0 due to floating-point approximations.
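When only the sign and the order of magnitude matter, np.linalg.slogdet is a steadier alternative: it returns the sign and the logarithm of the absolute determinant, which avoids underflow for very small values. A minimal sketch on the same C:
sign, logabsdet = np.linalg.slogdet(C)
print("sign:", sign, "log|det|:", logabsdet)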
Try It Yourself
- Compute the determinant of a random 10×10 matrix with
np.linalg.det
. - Compare results between SymPy (exact rational arithmetic) and NumPy (floating-point).
- Test determinant of a nearly singular matrix - notice numerical instability.
The Takeaway
- Determinants in practice are computed with LU decomposition or equivalent.
- Always be mindful of numerical stability - small errors matter when determinant ≈ 0.
- For exact answers (small cases), use symbolic tools like SymPy; for speed, use NumPy.
Chapter 7. Eigenvalues, Eigenvectors, and Dynamics
61. Eigenvalues and Eigenvectors (Directions That Stay Put)
An eigenvector of a matrix \(A\) is a special vector that doesn’t change direction when multiplied by \(A\). Instead, it only gets stretched or shrunk by a scalar called the eigenvalue.
Formally:
\[ A v = \lambda v \]
where \(v\) is an eigenvector and \(\lambda\) is the eigenvalue.
Geometrically: eigenvectors are “preferred directions” of a linear transformation.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- A simple 2×2 example
A = Matrix([
    [2,1],
    [1,2]
])

eigs = A.eigenvects()
print("Eigenvalues and eigenvectors:", eigs)
Eigenvalues and eigenvectors: [(1, 1, [Matrix([
[-1],
[ 1]])]), (3, 1, [Matrix([
[1],
[1]])])]
This outputs eigenvalues and their associated eigenvectors.
- Verify the eigen equation
Pick one eigenpair \((\lambda, v)\):
lam = eigs[0][0]
v = eigs[0][2][0]
print("Check Av = λv:", A*v, lam*v)
Check Av = λv: Matrix([[-1], [1]]) Matrix([[-1], [1]])
Both sides match → confirming the eigenpair.
- NumPy version
A_np = np.array([[2,1],[1,2]], dtype=float)
eigvals, eigvecs = np.linalg.eig(A_np)
print("Eigenvalues:", eigvals)
print("Eigenvectors:\n", eigvecs)
Eigenvalues: [3. 1.]
Eigenvectors:
[[ 0.70710678 -0.70710678]
[ 0.70710678 0.70710678]]
Columns of eigvecs are eigenvectors.
- Geometric interpretation (plot)
import matplotlib.pyplot as plt
v1 = np.array(eigvecs[:,0])
v2 = np.array(eigvecs[:,1])

plt.arrow(0,0,v1[0],v1[1],head_width=0.1,color="blue",length_includes_head=True)
plt.arrow(0,0,v2[0],v2[1],head_width=0.1,color="red",length_includes_head=True)

plt.axhline(0,color="black",linewidth=0.5)
plt.axvline(0,color="black",linewidth=0.5)
plt.axis("equal")
plt.grid()
plt.title("Eigenvectors: directions that stay put")
plt.show()
Both eigenvectors define directions where the transformation acts by scaling only.
- Random 3×3 matrix example
np.random.seed(1)
B = Matrix(np.random.randint(-2,3,(3,3)))
print("Matrix B:\n", B)
print("Eigenvalues/vectors:", B.eigenvects())
Matrix B:
Matrix([[1, 2, -2], [-1, 1, -2], [-2, -1, 2]])
Eigenvalues/vectors: [(4/3 + (-1/2 - sqrt(3)*I/2)*(2*sqrt(43)/3 + 127/27)**(1/3) + 13/(9*(-1/2 - sqrt(3)*I/2)*(2*sqrt(43)/3 + 127/27)**(1/3)), 1, [Matrix([
[ -16/27 - 91/(81*(-1/2 - sqrt(3)*I/2)*(2*sqrt(43)/3 + 127/27)**(1/3)) + (4/3 + (-1/2 - sqrt(3)*I/2)*(2*sqrt(43)/3 + 127/27)**(1/3) + 13/(9*(-1/2 - sqrt(3)*I/2)*(2*sqrt(43)/3 + 127/27)**(1/3)))**2/9 - 7*(-1/2 - sqrt(3)*I/2)*(2*sqrt(43)/3 + 127/27)**(1/3)/9],
[50/27 + 5*(-1/2 - sqrt(3)*I/2)*(2*sqrt(43)/3 + 127/27)**(1/3)/9 - 2*(4/3 + (-1/2 - sqrt(3)*I/2)*(2*sqrt(43)/3 + 127/27)**(1/3) + 13/(9*(-1/2 - sqrt(3)*I/2)*(2*sqrt(43)/3 + 127/27)**(1/3)))**2/9 + 65/(81*(-1/2 - sqrt(3)*I/2)*(2*sqrt(43)/3 + 127/27)**(1/3))],
[ 1]])]), (4/3 + 13/(9*(-1/2 + sqrt(3)*I/2)*(2*sqrt(43)/3 + 127/27)**(1/3)) + (-1/2 + sqrt(3)*I/2)*(2*sqrt(43)/3 + 127/27)**(1/3), 1, [Matrix([
[ -16/27 - 7*(-1/2 + sqrt(3)*I/2)*(2*sqrt(43)/3 + 127/27)**(1/3)/9 + (4/3 + 13/(9*(-1/2 + sqrt(3)*I/2)*(2*sqrt(43)/3 + 127/27)**(1/3)) + (-1/2 + sqrt(3)*I/2)*(2*sqrt(43)/3 + 127/27)**(1/3))**2/9 - 91/(81*(-1/2 + sqrt(3)*I/2)*(2*sqrt(43)/3 + 127/27)**(1/3))],
[50/27 + 65/(81*(-1/2 + sqrt(3)*I/2)*(2*sqrt(43)/3 + 127/27)**(1/3)) - 2*(4/3 + 13/(9*(-1/2 + sqrt(3)*I/2)*(2*sqrt(43)/3 + 127/27)**(1/3)) + (-1/2 + sqrt(3)*I/2)*(2*sqrt(43)/3 + 127/27)**(1/3))**2/9 + 5*(-1/2 + sqrt(3)*I/2)*(2*sqrt(43)/3 + 127/27)**(1/3)/9],
[ 1]])]), (13/(9*(2*sqrt(43)/3 + 127/27)**(1/3)) + 4/3 + (2*sqrt(43)/3 + 127/27)**(1/3), 1, [Matrix([
[ -7*(2*sqrt(43)/3 + 127/27)**(1/3)/9 - 16/27 - 91/(81*(2*sqrt(43)/3 + 127/27)**(1/3)) + (13/(9*(2*sqrt(43)/3 + 127/27)**(1/3)) + 4/3 + (2*sqrt(43)/3 + 127/27)**(1/3))**2/9],
[-2*(13/(9*(2*sqrt(43)/3 + 127/27)**(1/3)) + 4/3 + (2*sqrt(43)/3 + 127/27)**(1/3))**2/9 + 65/(81*(2*sqrt(43)/3 + 127/27)**(1/3)) + 5*(2*sqrt(43)/3 + 127/27)**(1/3)/9 + 50/27],
[ 1]])])]
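The exact symbolic roots above are correct but unwieldy. For a readable numeric view, a minimal sketch handing the same matrix to NumPy (converting the SymPy Matrix to floats first):
B_float = np.array(B, dtype=float)
print("Numeric eigenvalues:", np.linalg.eigvals(B_float))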
Try It Yourself
Compute eigenvalues and eigenvectors of
\[ \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix} \]
and verify that they match the diagonal entries.
Use NumPy to find eigenvectors of a rotation matrix by 90°. What do you notice?
For a singular matrix, check if 0 is an eigenvalue.
The Takeaway
- Eigenvalues = scale factors; eigenvectors = directions that stay put.
- The eigen equation \(Av=\lambda v\) captures the essence of a matrix’s action.
- They form the foundation for deeper topics like diagonalization, stability, and dynamics.
62. Characteristic Polynomial (Where Eigenvalues Come From)
Eigenvalues don’t appear out of thin air - they come from the characteristic polynomial of a matrix. For a square matrix \(A\),
\[ p(\lambda) = \det(A - \lambda I) \]
The roots of this polynomial are the eigenvalues of \(A\).
Set Up Your Lab
import numpy as np
from sympy import Matrix, symbols
Step-by-Step Code Walkthrough
- 2×2 example
\[ A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \]
λ = symbols('λ')
A = Matrix([[2,1],[1,2]])
char_poly = A.charpoly(λ)
print("Characteristic polynomial:", char_poly.as_expr())
print("Eigenvalues (roots):", char_poly.all_roots())
Characteristic polynomial: λ**2 - 4*λ + 3
Eigenvalues (roots): [1, 3]
Polynomial: \(\lambda^2 - 4\lambda + 3\). Roots: \(\lambda = 3, 1\).
- Verify with eigen computation
print("Eigenvalues directly:", A.eigenvals())
Eigenvalues directly: {3: 1, 1: 1}
Matches the roots of the polynomial.
- 3×3 example
B = Matrix([
    [1,2,3],
    [0,1,4],
    [5,6,0]
])

char_poly_B = B.charpoly(λ)
print("Characteristic polynomial of B:", char_poly_B.as_expr())
print("Eigenvalues of B:", char_poly_B.all_roots())
Characteristic polynomial of B: λ**3 - 2*λ**2 - 38*λ - 1
Eigenvalues of B: [CRootOf(x**3 - 2*x**2 - 38*x - 1, 0), CRootOf(x**3 - 2*x**2 - 38*x - 1, 1), CRootOf(x**3 - 2*x**2 - 38*x - 1, 2)]
- NumPy version
NumPy doesn’t return a symbolic polynomial, but the eigenvalues themselves are easy to check:
B_np = np.array([[1,2,3],[0,1,4],[5,6,0]], dtype=float)
eigvals = np.linalg.eigvals(B_np)
print("NumPy eigenvalues:", eigvals)
NumPy eigenvalues: [-5.2296696 -0.02635282 7.25602242]
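NumPy can, however, recover the coefficients of the characteristic polynomial numerically with np.poly (a minimal sketch; coefficients are listed from the highest power down):
coeffs = np.poly(B_np)
print("Characteristic polynomial coefficients:", coeffs)   # ≈ [1, -2, -38, -1], matching SymPy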
- Relation to trace and determinant
For a 2×2 matrix
\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \]
the characteristic polynomial is
\[ \lambda^2 - (a+d)\lambda + (ad - bc). \]
- Coefficient of \(\lambda\): \(-\text{trace}(A)\).
- Constant term: \(\det(A)\).
print("Trace:", A.trace())
print("Determinant:", A.det())
Trace: 4
Determinant: 3
Try It Yourself
Compute the characteristic polynomial of
\[ \begin{bmatrix} 4 & 0 \\ 0 & 5 \end{bmatrix} \]
and confirm eigenvalues are 4 and 5.
Check the relationship between polynomial coefficients, trace, and determinant for a 3×3 case.
Verify with NumPy that the roots of the polynomial equal the eigenvalues.
The Takeaway
- The characteristic polynomial encodes eigenvalues as its roots.
- Coefficients are tied to invariants: trace and determinant.
- This polynomial viewpoint is the bridge from algebraic formulas to geometric eigen-behavior.
63. Algebraic vs. Geometric Multiplicity (How Many and How Independent)
Eigenvalues can repeat, and when they do, two notions of multiplicity arise:
- Algebraic multiplicity: how many times the eigenvalue appears as a root of the characteristic polynomial.
- Geometric multiplicity: the dimension of the eigenspace (number of independent eigenvectors).
Always:
\[ 1 \leq \text{geometric multiplicity} \leq \text{algebraic multiplicity} \]
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Matrix with repeated eigenvalue
A = Matrix([
    [2,1],
    [0,2]
])
print("Eigenvalues and algebraic multiplicity:", A.eigenvals())
print("Eigenvectors:", A.eigenvects())
Eigenvalues and algebraic multiplicity: {2: 2}
Eigenvectors: [(2, 2, [Matrix([
[1],
[0]])])]
- Eigenvalue 2 has algebraic multiplicity = 2.
- But only 1 independent eigenvector → geometric multiplicity = 1.
- Diagonal matrix with repetition
B = Matrix([
    [3,0,0],
    [0,3,0],
    [0,0,3]
])
print("Eigenvalues:", B.eigenvals())
print("Eigenvectors:", B.eigenvects())
Eigenvalues: {3: 3}
Eigenvectors: [(3, 3, [Matrix([
[1],
[0],
[0]]), Matrix([
[0],
[1],
[0]]), Matrix([
[0],
[0],
[1]])])]
Here, eigenvalue 3 has algebraic multiplicity = 3, and geometric multiplicity = 3.
- NumPy check
A_np = np.array([[2,1],[0,2]], dtype=float)
eigvals, eigvecs = np.linalg.eig(A_np)
print("Eigenvalues:", eigvals)
print("Eigenvectors:\n", eigvecs)
Eigenvalues: [2. 2.]
Eigenvectors:
[[ 1.0000000e+00 -1.0000000e+00]
[ 0.0000000e+00 4.4408921e-16]]
NumPy won’t show multiplicities directly, but you can see repeated eigenvalues.
- Comparing two cases
- Defective matrix: Algebraic > geometric (like the upper triangular \(A\)).
- Diagonalizable matrix: Algebraic = geometric (like \(B\)).
This distinction determines whether a matrix can be diagonalized.
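Geometric multiplicity can also be computed directly as the dimension of the eigenspace, i.e. of the null space of \(A - \lambda I\). A minimal sketch for the two matrices above:
from sympy import eye

print("Geometric multiplicity of λ=2 for A:", len((A - 2*eye(2)).nullspace()))   # 1
print("Geometric multiplicity of λ=3 for B:", len((B - 3*eye(3)).nullspace()))   # 3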
Try It Yourself
Compute algebraic and geometric multiplicities of
\[ \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \]
(hint: only one eigenvector).
Take a diagonal matrix with repeated entries - what happens to multiplicities?
Test a random 3×3 singular matrix. Does 0 have algebraic multiplicity > 1?
The Takeaway
- Algebraic multiplicity = count of root in characteristic polynomial.
- Geometric multiplicity = dimension of eigenspace.
- If they match for all eigenvalues → matrix is diagonalizable.
64. Diagonalization (When a Matrix Becomes Simple)
A matrix \(A\) is diagonalizable if it can be written as
\[ A = P D P^{-1} \]
- \(D\) is diagonal (containing eigenvalues).
- Columns of \(P\) are the eigenvectors.
This means \(A\) acts like simple scaling in a “better” coordinate system.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- A diagonalizable 2×2 matrix
A = Matrix([
    [4,1],
    [2,3]
])

P, D = A.diagonalize()
print("P (eigenvectors):")
print(P)
print("D (eigenvalues on diagonal):")
print(D)
# Verify A = P D P^-1
print("Check:", P*D*P.inv())
P (eigenvectors):
Matrix([[-1, 1], [2, 1]])
D (eigenvalues on diagonal):
Matrix([[2, 0], [0, 5]])
Check: Matrix([[4, 1], [2, 3]])
- A non-diagonalizable matrix
B = Matrix([
    [2,1],
    [0,2]
])

try:
    P, D = B.diagonalize()
    print("Diagonalization successful")
except Exception as e:
    print("Not diagonalizable:", e)
Not diagonalizable: Matrix is not diagonalizable
This fails because eigenvalue 2 has algebraic multiplicity 2 but geometric multiplicity 1.
- Diagonalization with NumPy
NumPy doesn’t diagonalize explicitly, but we can build \(P\) and \(D\) ourselves:
A_np = np.array([[4,1],[2,3]], dtype=float)
eigvals, eigvecs = np.linalg.eig(A_np)

P = eigvecs
D = np.diag(eigvals)
Pinv = np.linalg.inv(P)
print("Check A = PDP^-1:\n", P @ D @ Pinv)
Check A = PDP^-1:
[[4. 1.]
[2. 3.]]
- Powers of a diagonalizable matrix
One reason diagonalization is powerful:
\[ A^k = P D^k P^{-1} \]
Since \(D^k\) is trivial (just raise each diagonal entry to power \(k\)).
k = 5

A_power = np.linalg.matrix_power(A_np, k)
D_power = np.linalg.matrix_power(D, k)
A_via_diag = P @ D_power @ np.linalg.inv(P)
print("A^5 via diagonalization:\n", A_via_diag)
print("Direct A^5:\n", A_power)
A^5 via diagonalization:
[[2094. 1031.]
[2062. 1063.]]
Direct A^5:
[[2094. 1031.]
[2062. 1063.]]
Both match.
Try It Yourself
Check whether
\[ \begin{bmatrix} 5 & 0 \\ 0 & 5 \end{bmatrix} \]
is diagonalizable.
Try diagonalizing a rotation matrix by 90°. Do you get complex eigenvalues?
Verify the formula \(A^k = P D^k P^{-1}\) for a 3×3 diagonalizable matrix.
The Takeaway
- Diagonalization rewrites a matrix in its simplest form.
- Works if there are enough independent eigenvectors.
- It makes powers of \(A\) easy, and is the gateway to analyzing dynamics.
65. Powers of a Matrix (Long-Term Behavior via Eigenvalues)
One of the most useful applications of eigenvalues and diagonalization is computing powers of a matrix:
\[ A^k = P D^k P^{-1} \]
where \(D\) is diagonal with eigenvalues of \(A\). Each eigenvalue \(\lambda\) raised to \(k\) dictates how its eigenvector direction grows, decays, or oscillates over time.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Simple diagonal matrix
If \(D = \text{diag}(2,3)\):
D = Matrix([[2,0],[0,3]])
print("D^5 =")
print(D**5)
D^5 =
Matrix([[32, 0], [0, 243]])
Eigenvalues are 2 and 3. Raising to the 5th power just raises each eigenvalue to the 5th: \(2^5, 3^5\).
- Non-diagonal matrix
A = Matrix([
    [4,1],
    [2,3]
])

P, D = A.diagonalize()
print("D (eigenvalues):")
print(D)

# Compute A^10 via diagonalization
A10 = P * (D**10) * P.inv()
print("A^10 =")
print(A10)
D (eigenvalues):
Matrix([[2, 0], [0, 5]])
A^10 =
Matrix([[6510758, 3254867], [6509734, 3255891]])
Much easier than multiplying \(A\) ten times!
- NumPy version
A_np = np.array([[4,1],[2,3]], dtype=float)
eigvals, eigvecs = np.linalg.eig(A_np)

k = 10
D_power = np.diag(eigvals**k)
A10_np = eigvecs @ D_power @ np.linalg.inv(eigvecs)
print("A^10 via eigen-decomposition:\n", A10_np)
A^10 via eigen-decomposition:
[[6510758. 3254867.]
[6509734. 3255891.]]
- Long-term behavior
Eigenvalues tell us what happens as \(k \to \infty\):
- If \(|\lambda| < 1\) → decay to 0.
- If \(|\lambda| > 1\) → grows unbounded.
- If \(|\lambda| = 1\) → oscillates or stabilizes.
B = Matrix([
    [0.5,0],
    [0,1.2]
])

P, D = B.diagonalize()
print("Eigenvalues:", D)
print("B^20:", P*(D**20)*P.inv())
Eigenvalues: Matrix([[0.500000000000000, 0], [0, 1.20000000000000]])
B^20: Matrix([[9.53674316406250e-7, 0], [0, 38.3375999244747]])
Here, the component along eigenvalue 0.5 decays, while eigenvalue 1.2 grows.
Try It Yourself
- Compute \(A^{50}\) for a diagonal matrix with eigenvalues 0.9 and 1.1. Which component dominates?
- Take a stochastic (Markov) matrix and compute powers. Do the rows stabilize?
- Experiment with complex eigenvalues (like a rotation) and check if the powers oscillate.
The Takeaway
- Matrix powers are simple when using eigenvalues.
- Long-term dynamics are controlled by eigenvalue magnitudes.
- This insight is critical in Markov chains, stability analysis, and dynamical systems.
66. Real vs. Complex Spectra (Rotations and Oscillations)
Not all eigenvalues are real. Some matrices, especially those involving rotations, have complex eigenvalues. Complex eigenvalues often describe oscillations or rotations in systems.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Rotation matrix in 2D
A 90° rotation matrix:
\[ R = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \]
R = Matrix([[0, -1],
            [1, 0]])
print("Characteristic polynomial:", R.charpoly())
print("Eigenvalues:", R.eigenvals())
Characteristic polynomial: PurePoly(lambda**2 + 1, lambda, domain='ZZ')
Eigenvalues: {-I: 1, I: 1}
Result: eigenvalues are \(i\) and \(-i\) (purely imaginary).
- Verify eigen-equation with complex numbers
eigs = R.eigenvects()
for eig in eigs:
    lam = eig[0]
    v = eig[2][0]
    print(f"λ = {lam}, Av = {R*v}, λv = {lam*v}")
λ = -I, Av = Matrix([[-1], [-I]]), λv = Matrix([[-1], [-I]])
λ = I, Av = Matrix([[-1], [I]]), λv = Matrix([[-1], [I]])
- NumPy version
R_np = np.array([[0,-1],[1,0]], dtype=float)
eigvals, eigvecs = np.linalg.eig(R_np)
print("Eigenvalues:", eigvals)
print("Eigenvectors:\n", eigvecs)
Eigenvalues: [0.+1.j 0.-1.j]
Eigenvectors:
[[0.70710678+0.j 0.70710678-0.j ]
[0. -0.70710678j 0. +0.70710678j]]
NumPy shows complex eigenvalues with j (Python’s imaginary unit).
- Rotation by arbitrary angle
General 2D rotation:
\[ R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \]
Eigenvalues:
\[ \lambda = e^{\pm i\theta} = \cos\theta \pm i\sin\theta \]
theta = np.pi/4   # 45 degrees
R_theta = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])

eigvals, eigvecs = np.linalg.eig(R_theta)
print("Eigenvalues (rotation 45°):", eigvals)
Eigenvalues (rotation 45°): [0.70710678+0.70710678j 0.70710678-0.70710678j]
- Oscillation insight
- Complex eigenvalues with \(|\lambda|=1\) → pure oscillation (no growth).
- If \(|\lambda|<1\) → decaying spiral.
- If \(|\lambda|>1\) → growing spiral.
Example:
A = np.array([[0.8, -0.6],
              [0.6,  0.8]])

eigvals, _ = np.linalg.eig(A)
print("Eigenvalues:", eigvals)
Eigenvalues: [0.8+0.6j 0.8-0.6j]
These eigenvalues have modulus \(\sqrt{0.8^2 + 0.6^2} = 1\), so they lie exactly on the unit circle → a pure rotation. Shrinking the matrix slightly (multiplying it by a factor below 1) pulls them inside the unit circle and produces a spiral decay, as the sketch below shows.
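Here is a minimal sketch (an addition, not part of the lab's recorded output): scaling the matrix by 0.9 gives eigenvalues \(0.9(0.8 \pm 0.6i)\) with modulus 0.9, and repeatedly applying it visibly shrinks a vector.
import numpy as np

# Scale the rotation-like matrix by 0.9 so |eigenvalues| = 0.9 < 1 → spiral decay
A_decay = 0.9 * np.array([[0.8, -0.6],
                          [0.6,  0.8]])

x = np.array([1.0, 0.0])
for k in range(6):
    print(f"k={k}, |x| = {np.linalg.norm(x):.4f}")
    x = A_decay @ x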
Try It Yourself
- Compute eigenvalues of a 180° rotation. What happens?
- Modify the rotation matrix to include scaling (e.g., multiply by 1.1). Do the eigenvalues lie outside the unit circle?
- Plot the trajectory of repeatedly applying a rotation matrix to a vector.
The Takeaway
- Complex eigenvalues naturally appear in rotations and oscillatory systems.
- Their magnitude controls growth or decay; their angle controls oscillation.
- This is a key link between linear algebra and dynamics in physics and engineering.
67. Defective Matrices and a Peek at Jordan Form (When Diagonalization Fails)
Not every matrix has enough independent eigenvectors to be diagonalized. Such matrices are called defective. To handle them, mathematicians use the Jordan normal form, which extends diagonalization with extra structure.
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- A defective example
\[ A = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix} \]
A = Matrix([[2, 1],
            [0, 2]])

print("Eigenvalues:", A.eigenvals())
print("Eigenvectors:", A.eigenvects())
Eigenvalues: {2: 2}
Eigenvectors: [(2, 2, [Matrix([
[1],
[0]])])]
- Eigenvalue 2 has algebraic multiplicity = 2.
- Only 1 eigenvector exists → geometric multiplicity = 1.
Thus \(A\) is defective, not diagonalizable.
- Attempt diagonalization
try:
    P, D = A.diagonalize()
    print("Diagonal form:", D)
except Exception as e:
    print("Diagonalization failed:", e)
Diagonalization failed: Matrix is not diagonalizable
You’ll see an error - confirming \(A\) is not diagonalizable.
- Jordan form in SymPy
P, J = A.jordan_form()   # SymPy returns (P, J) with A = P J P^{-1}
print("Jordan form J:")
print(J)
print("P (generalized eigenvectors):")
print(P)
Jordan form J:
Matrix([[2, 1], [0, 2]])
P (generalized eigenvectors):
Matrix([[1, 0], [0, 1]])
The Jordan form shows a Jordan block:
\[ J = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix} \]
This block structure represents the failure of diagonalization.
- NumPy perspective
NumPy doesn’t compute Jordan form, but you can see repeated eigenvalues and lack of eigenvectors:
A_np = np.array([[2,1],[0,2]], dtype=float)
eigvals, eigvecs = np.linalg.eig(A_np)
print("Eigenvalues:", eigvals)
print("Eigenvectors:\n", eigvecs)
Eigenvalues: [2. 2.]
Eigenvectors:
[[ 1.0000000e+00 -1.0000000e+00]
[ 0.0000000e+00 4.4408921e-16]]
The two eigenvector columns are numerically parallel: there is only one independent eigendirection, fewer than expected.
- Generalized eigenvectors
Jordan form introduces generalized eigenvectors, which satisfy:
\[ (A - \lambda I)^k v = 0 \quad \text{for some } k>1 \]
They “fill the gap” when ordinary eigenvectors are insufficient.
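As a quick check (a sketch added here, reusing the defective \(A\) from above), the vector \([0, 1]\) is not an ordinary eigenvector, but it is a generalized eigenvector: \((A - 2I)\) maps it onto the true eigenvector, and \((A - 2I)^2\) sends it to zero.
from sympy import Matrix, eye

A = Matrix([[2, 1],
            [0, 2]])
v = Matrix([0, 1])    # candidate generalized eigenvector
N = A - 2*eye(2)      # (A - λI) with λ = 2

print("(A - 2I) v   =", (N*v).T)       # lands on the ordinary eigenvector [1, 0]
print("(A - 2I)^2 v =", (N**2 * v).T)  # the zero vector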
Try It Yourself
- Test diagonalizability of \(\begin{bmatrix} 3 & 1 \\ 0 & 3 \end{bmatrix}\) and compare with its Jordan form.
- Try a 3×3 defective matrix with one Jordan block of size 3.
- Verify that Jordan blocks still capture the correct eigenvalues.
The Takeaway
- Defective matrices lack enough eigenvectors for diagonalization.
- Jordan form replaces diagonalization with blocks, keeping eigenvalues on the diagonal.
- Understanding Jordan blocks is essential for advanced linear algebra and differential equations.
68. Stability and Spectral Radius (Grow, Decay, or Oscillate)
The spectral radius of a matrix \(A\) is defined as
\[ \rho(A) = \max_i |\lambda_i| \]
where \(\lambda_i\) are the eigenvalues. It tells us the long-term behavior of repeated applications of \(A\):
- If \(\rho(A) < 1\) → powers of \(A\) tend to 0 (stable/decay).
- If \(\rho(A) = 1\) → powers neither blow up nor vanish (neutral, may oscillate).
- If \(\rho(A) > 1\) → powers diverge (unstable/growth).
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Stable matrix (\(\rho < 1\))
A = np.array([[0.5, 0],
              [0, 0.3]])

eigvals = np.linalg.eigvals(A)
spectral_radius = max(abs(eigvals))

print("Eigenvalues:", eigvals)
print("Spectral radius:", spectral_radius)
print("A^10:\n", np.linalg.matrix_power(A, 10))
Eigenvalues: [0.5 0.3]
Spectral radius: 0.5
A^10:
[[9.765625e-04 0.000000e+00]
[0.000000e+00 5.904900e-06]]
All entries shrink toward zero.
- Unstable matrix (\(\rho > 1\))
B = np.array([[1.2, 0],
              [0, 0.9]])

eigvals = np.linalg.eigvals(B)
print("Eigenvalues:", eigvals, "Spectral radius:", max(abs(eigvals)))
print("B^10:\n", np.linalg.matrix_power(B, 10))
Eigenvalues: [1.2 0.9] Spectral radius: 1.2
B^10:
[[6.19173642 0. ]
[0. 0.34867844]]
The component along eigenvalue 1.2 grows quickly.
- Neutral/oscillatory case (\(\rho = 1\))
90° rotation matrix:
R = np.array([[0, -1],
              [1,  0]])

eigvals = np.linalg.eigvals(R)
print("Eigenvalues:", eigvals)
print("Spectral radius:", max(abs(eigvals)))
print("R^4:\n", np.linalg.matrix_power(R, 4))
Eigenvalues: [0.+1.j 0.-1.j]
Spectral radius: 1.0
R^4:
[[1 0]
[0 1]]
Eigenvalues are ±i, with modulus 1 → pure oscillation.
- Spectral radius with SymPy
M = Matrix([[2,1],[1,2]])
eigs = M.eigenvals()
print("Eigenvalues:", eigs)
print("Spectral radius:", max(abs(ev) for ev in eigs))
Eigenvalues: {3: 1, 1: 1}
Spectral radius: 3
Try It Yourself
- Build a diagonal matrix with entries 0.8, 1.0, and 1.1. Predict which direction dominates as powers grow.
- Apply a random matrix repeatedly to a vector. Does it shrink, grow, or oscillate?
- Check if a Markov chain transition matrix always has spectral radius 1.
The Takeaway
- The spectral radius is the key number that predicts growth, decay, or oscillation.
- Long-term stability in dynamical systems is governed entirely by eigenvalue magnitudes.
- This connects linear algebra directly to control theory, Markov chains, and differential equations.
69. Markov Chains and Steady States (Probabilities as Linear Algebra)
A Markov chain is a process that moves between states according to probabilities. The transitions are encoded in a stochastic matrix \(P\):
- Each entry \(p_{ij} \geq 0\)
- Each row sums to 1
If we start with a probability vector \(v_0\), then after \(k\) steps:
\[ v_k = v_0 P^k \]
A steady state is a probability vector \(v\) such that \(vP = v\). It corresponds to eigenvalue \(\lambda = 1\).
Set Up Your Lab
import numpy as np
from sympy import Matrix
Step-by-Step Code Walkthrough
- Simple two-state chain
P = np.array([
    [0.9, 0.1],
    [0.5, 0.5]
])

v0 = np.array([1.0, 0.0])  # start in state 1
for k in [1, 2, 5, 10, 50]:
    vk = v0 @ np.linalg.matrix_power(P, k)
    print(f"Step {k}: {vk}")
Step 1: [0.9 0.1]
Step 2: [0.86 0.14]
Step 5: [0.83504 0.16496]
Step 10: [0.83335081 0.16664919]
Step 50: [0.83333333 0.16666667]
The distribution stabilizes as \(k\) increases.
- Steady state via eigenvector
Find eigenvector for eigenvalue 1:
eigvals, eigvecs = np.linalg.eig(P.T)
steady_state = eigvecs[:, np.isclose(eigvals, 1)]
steady_state = steady_state / steady_state.sum()
print("Steady state:", steady_state.real.flatten())
Steady state: [0.83333333 0.16666667]
- SymPy check of eigenvalue 1. Note that these are right eigenvectors of \(P\), so the vector paired with eigenvalue 1 is the all-ones direction; the steady state is the corresponding left eigenvector, which we computed above from P.T.
P_sym = Matrix([[0.9,0.1],[0.5,0.5]])
steady = P_sym.eigenvects()
print("Eigen info:", steady)
Eigen info: [(1.00000000000000, 1, [Matrix([
[0.707106781186548],
[0.707106781186547]])]), (0.400000000000000, 1, [Matrix([
[-0.235702260395516],
[ 1.17851130197758]])])]
- A 3-state example
Q = np.array([
    [0.3, 0.7, 0.0],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7]
])

eigvals, eigvecs = np.linalg.eig(Q.T)
steady = eigvecs[:, np.isclose(eigvals, 1)]
steady = steady / steady.sum()
print("Steady state for Q:", steady.real.flatten())
Steady state for Q: [0.17647059 0.41176471 0.41176471]
Try It Yourself
- Create a transition matrix where one state is absorbing (e.g., row = [0,0,1]). What happens to the steady state?
- Simulate a random walk on 3 states. Does the steady state distribute evenly?
- Compare long-run simulation with eigenvector computation.
The Takeaway
- Markov chains evolve by repeated multiplication with a stochastic matrix.
- Steady states are eigenvectors with eigenvalue 1.
- This framework powers real applications like PageRank, weather models, and queuing systems.
70. Linear Differential Systems (Solutions via Eigen-Decomposition)
Linear differential equations often reduce to systems of the form:
\[ \frac{d}{dt}x(t) = A x(t) \]
where \(A\) is a matrix and \(x(t)\) is a vector of functions. The solution is given by the matrix exponential:
\[ x(t) = e^{At} x(0) \]
If \(A\) is diagonalizable, this becomes simple using eigenvalues and eigenvectors.
Set Up Your Lab
import numpy as np
from sympy import Matrix, exp, symbols
from scipy.linalg import expm
Step-by-Step Code Walkthrough
- Simple system with diagonal matrix
\[ A = \begin{bmatrix} -1 & 0 \\ 0 & 2 \end{bmatrix} \]
A = Matrix([[-1, 0],
            [ 0, 2]])
t = symbols('t')
expAt = (A*t).exp()
print("e^{At} =")
print(expAt)
e^{At} =
Matrix([[exp(-t), 0], [0, exp(2*t)]])
Solution:
\[ x(t) = \begin{bmatrix} e^{-t} & 0 \\ 0 & e^{2t} \end{bmatrix} x(0) \]
One component decays, the other grows.
- Non-diagonal example
B = Matrix([[ 0,  1],
            [-2, -3]])
expBt = (B*t).exp()
print("e^{Bt} =")
print(expBt)
e^{Bt} =
Matrix([[2*exp(-t) - exp(-2*t), exp(-t) - exp(-2*t)], [-2*exp(-t) + 2*exp(-2*t), -exp(-t) + 2*exp(-2*t)]])
Here the eigenvalues of \(B\) are \(-1\) and \(-2\), so the solution is a combination of decaying exponentials; complex eigenvalues would instead contribute sines and cosines (oscillatory behavior).
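To make the "diagonalize, then exponentiate" idea explicit, here is a small numeric sketch (an addition, assuming NumPy and SciPy are available) computing \(e^{Bt} = V e^{\Lambda t} V^{-1}\) from the eigen-decomposition of the same \(B\), and checking it against scipy.linalg.expm.
import numpy as np
from scipy.linalg import expm

B = np.array([[ 0,  1],
              [-2, -3]], dtype=float)
t = 1.0

# Eigen-decomposition B = V Λ V^{-1}  →  e^{Bt} = V e^{Λt} V^{-1}
lam, V = np.linalg.eig(B)
expBt_eig = V @ np.diag(np.exp(lam * t)) @ np.linalg.inv(V)

print("Via eigen-decomposition:\n", expBt_eig)
print("Via expm:\n", expm(B * t))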
- Numeric computation with SciPy
import numpy as np
from scipy.linalg import expm

A = np.array([[-1,0],[0,2]], dtype=float)
t = 1.0
print("Matrix exponential e^{At} at t=1:\n", expm(A*t))
Matrix exponential e^{At} at t=1:
[[0.36787944 0. ]
[0. 7.3890561 ]]
This computes \(e^{At}\) numerically.
- Simulation of a trajectory
x0 = np.array([1.0, 1.0])
for t in [0, 0.5, 1, 2]:
    xt = expm(A*t) @ x0
    print(f"x({t}) = {xt}")
x(0) = [1. 1.]
x(0.5) = [0.60653066 2.71828183]
x(1) = [0.36787944 7.3890561 ]
x(2) = [ 0.13533528 54.59815003]
One coordinate decays, the other explodes with time.
Try It Yourself
- Solve the system \(\dot{x} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}x\). What kind of motion do you see?
- Use SciPy to simulate a system with eigenvalues less than 0. Does it always decay?
- Try an unstable system with eigenvalues > 0 and watch how trajectories diverge.
The Takeaway
- Linear systems \(\dot{x} = Ax\) are solved via the matrix exponential.
- Eigenvalues determine stability: negative real parts = stable, positive = unstable, imaginary = oscillations.
- This ties linear algebra directly to differential equations and dynamical systems.
Chapter 8. Orthogonality, least squares, and QR
71. Inner Products Beyond Dot Product (Custom Notions of Angle)
The dot product is the standard inner product in \(\mathbb{R}^n\), but linear algebra allows us to define more general inner products that measure length and angle in different ways.
An inner product on a vector space is a function \(\langle u, v \rangle\) that satisfies:
- Linearity in the first argument.
- Symmetry: \(\langle u, v \rangle = \langle v, u \rangle\).
- Positive definiteness: \(\langle v, v \rangle \geq 0\) and equals 0 only if \(v=0\).
Set Up Your Lab
import numpy as np
Step-by-Step Code Walkthrough
- Standard dot product
u = np.array([1,2,3])
v = np.array([4,5,6])

print("Dot product:", np.dot(u,v))
Dot product: 32
This is the familiar formula: \(1·4 + 2·5 + 3·6 = 32\).
- Weighted inner product
We can define:
\[ \langle u, v \rangle_W = u^T W v \]
where \(W\) is a positive definite matrix.
W = np.array([[2,0,0],
              [0,1,0],
              [0,0,3]])

def weighted_inner(u, v, W):
    return u.T @ W @ v

print("Weighted inner product:", weighted_inner(u,v,W))
Weighted inner product: 72
Here, some coordinates “count more” than others.
- Check symmetry and positivity
print("⟨u,v⟩ == ⟨v,u⟩ ?", weighted_inner(u,v,W) == weighted_inner(v,u,W))
print("⟨u,u⟩ (should be >0):", weighted_inner(u,u,W))
⟨u,v⟩ == ⟨v,u⟩ ? True
⟨u,u⟩ (should be >0): 33
- Angle with weighted inner product
\[ \cos\theta = \frac{\langle u,v \rangle_W}{\|u\|_W \, \|v\|_W} \]
def weighted_norm(u, W):
    return np.sqrt(weighted_inner(u,u,W))

cos_theta = weighted_inner(u,v,W) / (weighted_norm(u,W) * weighted_norm(v,W))
print("Cosine of angle (weighted):", cos_theta)
Cosine of angle (weighted): 0.97573875381809
- Custom example: correlation inner product
For statistics, an inner product can be defined as covariance or correlation. Example with mean-centered vectors:
x = np.array([2,4,6])
y = np.array([1,3,5])

x_centered = x - x.mean()
y_centered = y - y.mean()

corr_inner = np.dot(x_centered, y_centered)
print("Correlation-style inner product:", corr_inner)
Correlation-style inner product: 8.0
Try It Yourself
- Define a custom inner product with \(W = \text{diag}(1,10,100)\). How does it change angles between vectors?
- Verify positivity: compute \(\langle v, v \rangle_W\) for a random vector \(v\).
- Compare dot product vs weighted inner product on the same pair of vectors.
The Takeaway
- Inner products generalize the dot product to new “geometries.”
- By changing the weight matrix \(W\), you change how lengths and angles are measured.
- This flexibility is essential in statistics, optimization, and machine learning.
72. Orthogonality and Orthonormal Bases (Perpendicular Power)
Two vectors are orthogonal if their inner product is zero:
\[ \langle u, v \rangle = 0 \]
If, in addition, each vector has length 1, the set is orthonormal. Orthonormal bases are extremely useful because they simplify computations: projections, decompositions, and coordinate changes all become clean.
Set Up Your Lab
import numpy as np
Step-by-Step Code Walkthrough
- Check orthogonality
u = np.array([1, -1])
v = np.array([1, 1])

print("Dot product:", np.dot(u,v))
Dot product: 0
Since the dot product is 0, they’re orthogonal.
- Normalizing vectors
\[ \hat{u} = \frac{u}{\|u\|} \]
def normalize(vec):
    return vec / np.linalg.norm(vec)

u_norm = normalize(u)
v_norm = normalize(v)

print("Normalized u:", u_norm)
print("Normalized v:", v_norm)
Normalized u: [ 0.70710678 -0.70710678]
Normalized v: [0.70710678 0.70710678]
Now both have length 1.
- Form an orthonormal basis
basis = np.column_stack((u_norm, v_norm))
print("Orthonormal basis:\n", basis)
print("Check inner products:\n", basis.T @ basis)
Orthonormal basis:
[[ 0.70710678 0.70710678]
[-0.70710678 0.70710678]]
Check inner products:
[[ 1.00000000e+00 -2.23711432e-17]
[-2.23711432e-17 1.00000000e+00]]
The result is the identity matrix → perfectly orthonormal.
- Apply to coordinates
If \(x = [2,3]\), coordinates in the orthonormal basis are:
x = np.array([2,3])
coords = basis.T @ x
print("Coordinates in new basis:", coords)
print("Reconstruction:", basis @ coords)
Coordinates in new basis: [-0.70710678 3.53553391]
Reconstruction: [2. 3.]
It reconstructs exactly.
- Random example with QR
Any set of linearly independent vectors can be orthonormalized (Gram–Schmidt, or QR decomposition):
M = np.random.rand(3,3)
Q, R = np.linalg.qr(M)
print("Q (orthonormal basis):\n", Q)
print("Check Q^T Q = I:\n", Q.T @ Q)
Q (orthonormal basis):
[[-0.37617518 0.91975919 -0.111961 ]
[-0.82070726 -0.38684608 -0.42046368]
[-0.430037 -0.06628079 0.90037494]]
Check Q^T Q = I:
[[1.00000000e+00 5.55111512e-17 5.55111512e-17]
[5.55111512e-17 1.00000000e+00 3.47849792e-17]
[5.55111512e-17 3.47849792e-17 1.00000000e+00]]
Try It Yourself
- Create two 3D vectors and check if they’re orthogonal.
- Normalize them to form an orthonormal set.
- Use np.linalg.qr on a 4×3 random matrix and verify that the columns of \(Q\) are orthonormal.
The Takeaway
- Orthogonality means perpendicularity; orthonormality adds unit length.
- Orthonormal bases simplify coordinate systems, making inner products and projections easy.
- QR decomposition is the practical tool to generate orthonormal bases in higher dimensions.
73. Gram–Schmidt Process (Constructing Orthonormal Bases)
The Gram–Schmidt process takes a set of linearly independent vectors and turns them into an orthonormal basis. This is crucial for working with subspaces, projections, and numerical stability.
Given vectors \(v_1, v_2, \dots, v_n\):
Set \(u_1 = v_1\).
Subtract projections to make each new vector orthogonal to the earlier ones:
\[ u_k = v_k - \sum_{j=1}^{k-1} \frac{\langle v_k, u_j \rangle}{\langle u_j, u_j \rangle} u_j \]
Normalize:
\[ e_k = \frac{u_k}{\|u_k\|} \]
The set \(\{e_1, e_2, \dots, e_n\}\) is orthonormal.
Set Up Your Lab
import numpy as np
Step-by-Step Code Walkthrough
- Define vectors
v1 = np.array([1.0, 1.0, 0.0])
v2 = np.array([1.0, 0.0, 1.0])
v3 = np.array([0.0, 1.0, 1.0])
V = [v1, v2, v3]
- Implement Gram–Schmidt
def gram_schmidt(V):
    U = []
    for v in V:
        u = v.copy()
        for uj in U:
            u -= np.dot(v, uj) / np.dot(uj, uj) * uj
        U.append(u)
    # Normalize
    E = [u / np.linalg.norm(u) for u in U]
    return np.array(E)

E = gram_schmidt(V)
print("Orthonormal basis:\n", E)
print("Check orthonormality:\n", np.round(E @ E.T, 6))
Orthonormal basis:
[[ 0.70710678 0.70710678 0. ]
[ 0.40824829 -0.40824829 0.81649658]
[-0.57735027 0.57735027 0.57735027]]
Check orthonormality:
[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 1.]]
- Compare with NumPy QR
Q, R = np.linalg.qr(np.column_stack(V))
print("QR-based orthonormal basis:\n", Q)
print("Check Q^T Q = I:\n", np.round(Q.T @ Q, 6))
QR-based orthonormal basis:
[[-0.70710678 0.40824829 -0.57735027]
[-0.70710678 -0.40824829 0.57735027]
[-0. 0.81649658 0.57735027]]
Check Q^T Q = I:
[[ 1. 0. -0.]
[ 0. 1. -0.]
[-0. -0. 1.]]
Both methods give orthonormal bases.
- Application: projection
To project a vector \(x\) onto the span of \(V\):
x = np.array([2.0, 2.0, 2.0])
proj = sum((x @ e) * e for e in E)
print("Projection of x onto span(V):", proj)
Projection of x onto span(V): [2. 2. 2.]
Try It Yourself
- Run Gram–Schmidt on two vectors in 2D. Compare with just normalizing and checking orthogonality.
- Replace one vector with a linear combination of others. What happens?
- Use QR decomposition on a 4×3 random matrix and compare with Gram–Schmidt.
The Takeaway
- Gram–Schmidt converts arbitrary independent vectors into an orthonormal basis.
- Orthonormal bases simplify projections, decompositions, and computations.
- In practice, QR decomposition is often used as a numerically stable implementation.
74. Orthogonal Projections onto Subspaces (Closest Point Principle)
Given a subspace spanned by vectors, the orthogonal projection of a vector \(x\) onto the subspace is the point in the subspace that is closest to \(x\). This is a cornerstone idea in least squares, data fitting, and signal processing.
Formula Recap
If \(Q\) is a matrix with orthonormal columns spanning the subspace, the projection of \(x\) is:
\[ \text{proj}(x) = Q Q^T x \]
Set Up Your Lab
import numpy as np
Step-by-Step Code Walkthrough
- Projection onto a line (1D subspace)
Suppose the subspace is spanned by \(u = [1,2]\).
u = np.array([1.0, 2.0])
x = np.array([3.0, 1.0])

u_norm = u / np.linalg.norm(u)
proj = np.dot(x, u_norm) * u_norm
print("Projection of x onto span(u):", proj)
Projection of x onto span(u): [1. 2.]
This gives the closest point to \(x\) along the line spanned by \(u\).
- Projection onto a plane (2D subspace in 3D)
u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.0, 1.0, 0.0])
Q = np.column_stack([u1, u2])  # Orthonormal basis for xy-plane

x = np.array([2.0, 3.0, 5.0])
proj = Q @ Q.T @ x
print("Projection of x onto xy-plane:", proj)
Projection of x onto xy-plane: [2. 3. 0.]
Result drops the z-component → projection onto the plane.
- General projection using QR
A = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 1]], dtype=float)

Q, R = np.linalg.qr(A)
Q = Q[:, :2]  # orthonormal basis for the span of the first two columns of A
x = np.array([2, 2, 2], dtype=float)

proj = Q @ Q.T @ x
print("Projection of x onto the 2D subspace:", proj)
Projection of x onto the 2D subspace: [2.66666667 1.33333333 1.33333333]
- Visualization (2D case)
import matplotlib.pyplot as plt

# Uses u, x, proj from step 1 (re-run it so these are the 2D vectors)
plt.quiver(0, 0, x[0], x[1], angles='xy', scale_units='xy', scale=1, color='red', label="x")
plt.quiver(0, 0, proj[0], proj[1], angles='xy', scale_units='xy', scale=1, color='blue', label="Projection")
plt.quiver(0, 0, u[0], u[1], angles='xy', scale_units='xy', scale=1, color='green', label="Subspace")
plt.axis('equal'); plt.grid(); plt.legend(); plt.show()
Try It Yourself
- Project a vector onto the line spanned by \([2,1]\).
- Project \([1,2,3]\) onto the plane spanned by \([1,0,1]\) and \([0,1,1]\).
- Compare projection via formula \(Q Q^T x\) with manually solving least squares.
The Takeaway
- Orthogonal projection finds the closest point in a subspace.
- Formula \(Q Q^T x\) works perfectly when \(Q\) has orthonormal columns.
- Projections are the foundation of least squares, PCA, and many geometric algorithms.
75. Least-Squares Problems (Fit When Exact Solve Is Impossible)
Sometimes a system of equations \(Ax = b\) has no exact solution - usually because it’s overdetermined (more equations than unknowns). In this case, we look for an approximate solution \(x^*\) that minimizes the error:
\[ x^* = \arg\min_x \|Ax - b\|^2 \]
This is the least-squares solution, which geometrically is the projection of \(b\) onto the column space of \(A\).
Set Up Your Lab
import numpy as np
Step-by-Step Code Walkthrough
- Overdetermined system
3 equations, 2 unknowns:
A = np.array([[1, 1],
              [1, 2],
              [1, 3]], dtype=float)
b = np.array([6, 0, 0], dtype=float)
- Solve least squares with NumPy
x_star, residuals, rank, s = np.linalg.lstsq(A, b, rcond=None)
print("Least squares solution:", x_star)
print("Residual norm squared:", residuals)
Least squares solution: [ 8. -3.]
Residual norm squared: [6.]
- Compare with normal equations
\[ A^T A x = A^T b \]
x_normal = np.linalg.inv(A.T @ A) @ (A.T @ b)
print("Solution via normal equations:", x_normal)
Solution via normal equations: [ 8. -3.]
- Geometric picture
The least-squares solution projects \(b\) onto the column space of \(A\):
proj = A @ x_star
print("Projection of b onto Col(A):", proj)
print("Original b:", b)
print("Error vector (b - proj):", b - proj)
Projection of b onto Col(A): [ 5. 2. -1.]
Original b: [6. 0. 0.]
Error vector (b - proj): [ 1. -2. 1.]
The error vector is orthogonal to the column space.
- Verify orthogonality condition
\[ A^T (b - Ax^*) = 0 \]
print("Check orthogonality:", A.T @ (b - A @ x_star))
Check orthogonality: [0. 0.]
The result should be (close to) zero.
Try It Yourself
- Create a taller \(A\) (say 5×2) with random numbers and solve least squares for a random \(b\).
- Compare the residual from np.linalg.lstsq with geometric intuition (projection).
- Modify \(b\) so that the system has an exact solution. Check if least squares gives it exactly.
The Takeaway
- Least-squares finds the best-fit solution when no exact solution exists.
- It works by projecting \(b\) onto the column space of \(A\).
- This principle underlies regression, curve fitting, and countless applications in data science.
76. Normal Equations and Geometry of Residuals (Why It Works)
The least-squares solution can be found by solving the normal equations:
\[ A^T A x = A^T b \]
This comes from the condition that the residual vector
\[ r = b - Ax \]
is orthogonal to the column space of \(A\).
Set Up Your Lab
import numpy as np
Step-by-Step Code Walkthrough
- Build an overdetermined system
A = np.array([[1, 1],
              [1, 2],
              [1, 3]], dtype=float)
b = np.array([6, 0, 0], dtype=float)
- Solve least squares via normal equations
ATA = A.T @ A
ATb = A.T @ b
x_star = np.linalg.solve(ATA, ATb)

print("Least-squares solution x*:", x_star)
Least-squares solution x*: [ 8. -3.]
- Compute residual and check orthogonality
residual = b - A @ x_star
print("Residual vector:", residual)
print("Check A^T r ≈ 0:", A.T @ residual)
Residual vector: [ 1. -2. 1.]
Check A^T r ≈ 0: [0. 0.]
This verifies the residual is perpendicular to the column space of \(A\).
- Compare with NumPy’s least squares solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print("NumPy lstsq solution:", x_lstsq)
NumPy lstsq solution: [ 8. -3.]
The solutions should match (within numerical precision).
- Geometric picture
- \(b\) is a point in \(\mathbb{R}^3\).
- \(Ax\) is restricted to lie in the 2D column space of \(A\).
- The least-squares solution picks the \(Ax\) closest to \(b\).
- The error vector \(r = b - Ax^*\) is orthogonal to the subspace.
proj = A @ x_star
print("Projection of b onto Col(A):", proj)
Projection of b onto Col(A): [ 5. 2. -1.]
Try It Yourself
- Change \(b\) to \([1,1,1]\). Solve again and check the residual.
- Use a random tall \(A\) (say 6×2) and verify that the residual is always orthogonal.
- Compute \(\|r\|\) and see how it changes when you change \(b\).
The Takeaway
- Least squares works by making the residual orthogonal to the column space.
- Normal equations are the algebraic way to encode this condition.
- This orthogonality principle is the geometric heart of least-squares fitting.
77. QR Factorization (Stable Least Squares via Orthogonality)
While normal equations solve least squares, they can be numerically unstable if \(A^T A\) is ill-conditioned. A more stable method uses QR factorization:
\[ A = Q R \]
- \(Q\): matrix with orthonormal columns
- \(R\): upper triangular matrix
Then the least-squares problem reduces to solving:
\[ Rx = Q^T b \]
Set Up Your Lab
import numpy as np
Step-by-Step Code Walkthrough
- Overdetermined system
A = np.array([[1, 1],
              [1, 2],
              [1, 3]], dtype=float)
b = np.array([6, 0, 0], dtype=float)
- QR factorization
Q, R = np.linalg.qr(A)
print("Q (orthonormal basis):\n", Q)
print("R (upper triangular):\n", R)
Q (orthonormal basis):
[[-5.77350269e-01 7.07106781e-01]
[-5.77350269e-01 -1.73054947e-16]
[-5.77350269e-01 -7.07106781e-01]]
R (upper triangular):
[[-1.73205081 -3.46410162]
[ 0. -1.41421356]]
- Solve least squares using QR
y = Q.T @ b
x_star = np.linalg.solve(R[:2, :], y[:2])  # only the top rows matter
print("Least squares solution via QR:", x_star)
Least squares solution via QR: [ 8. -3.]
- Compare with NumPy’s lstsq
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print("NumPy lstsq:", x_lstsq)
NumPy lstsq: [ 8. -3.]
The answers should match closely.
- Residual check
residual = b - A @ x_star
print("Residual vector:", residual)
print("Check orthogonality (Q^T r):", Q.T @ residual)
Residual vector: [ 1. -2. 1.]
Check orthogonality (Q^T r): [0.00000000e+00 3.46109895e-16]
Residual is orthogonal to the column space, confirming correctness.
Try It Yourself
- Solve least squares for a 5×2 random matrix using both normal equations and QR. Compare results.
- Check stability by making columns of \(A\) nearly dependent - see if QR behaves better than normal equations.
- Compute projection of \(b\) using \(Q Q^T b\) and confirm it equals \(A x^*\).
The Takeaway
- QR factorization provides a numerically stable way to solve least squares.
- It avoids the instability of normal equations.
- In practice, modern solvers (like NumPy’s lstsq) rely on QR or SVD under the hood.
78. Orthogonal Matrices (Length-Preserving Transforms)
An orthogonal matrix \(Q\) is a square matrix whose columns (and rows) are orthonormal vectors. Formally:
\[ Q^T Q = Q Q^T = I \]
Key properties:
- Preserves lengths: \(\|Qx\| = \|x\|\)
- Preserves dot products: \(\langle Qx, Qy \rangle = \langle x, y \rangle\)
- Determinant is either \(+1\) (rotation) or \(-1\) (reflection)
Set Up Your Lab
import numpy as np
Step-by-Step Code Walkthrough
- Construct a simple orthogonal matrix
90° rotation in 2D:
Q = np.array([[0, -1],
              [1,  0]])

print("Q^T Q =\n", Q.T @ Q)
Q^T Q =
[[1 0]
[0 1]]
Result = identity → confirms orthogonality.
- Check length preservation
x = np.array([3,4])
print("Original length:", np.linalg.norm(x))
print("Transformed length:", np.linalg.norm(Q @ x))
Original length: 5.0
Transformed length: 5.0
Both lengths match.
- Check dot product preservation
u = np.array([1,0])
v = np.array([0,1])

print("Dot(u,v):", np.dot(u,v))
print("Dot(Q u, Q v):", np.dot(Q @ u, Q @ v))
Dot(u,v): 0
Dot(Q u, Q v): 0
Dot product is preserved.
- Reflection matrix
Reflection about the x-axis:
R = np.array([[1,  0],
              [0, -1]])

print("R^T R =\n", R.T @ R)
print("Determinant of R:", np.linalg.det(R))
R^T R =
[[1 0]
[0 1]]
Determinant of R: -1.0
Determinant = -1 → reflection.
- Random orthogonal matrix via QR
M = np.random.rand(3,3)
Q, _ = np.linalg.qr(M)
print("Q (random orthogonal):\n", Q)
print("Check Q^T Q ≈ I:\n", np.round(Q.T @ Q, 6))
Q (random orthogonal):
[[-0.59472353 0.03725157 -0.80306677]
[-0.61109913 -0.67000966 0.42147943]
[-0.52236172 0.74141714 0.42123492]]
Check Q^T Q ≈ I:
[[ 1. 0. -0.]
[ 0. 1. -0.]
[-0. -0. 1.]]
Try It Yourself
- Build a 2D rotation matrix for 45°. Verify it’s orthogonal.
- Check whether scaling matrices (e.g., \(\text{diag}(2,1)\)) are orthogonal. Why or why not?
- Generate a random orthogonal matrix with np.linalg.qr and test its determinant.
The Takeaway
- Orthogonal matrices are rigid motions: they rotate or reflect without distorting lengths or angles.
- They play a key role in numerical stability, geometry, and physics.
- Every orthonormal basis corresponds to an orthogonal matrix.
79. Fourier Viewpoint (Expanding in Orthogonal Waves)
The Fourier viewpoint treats functions or signals as combinations of orthogonal waves (sines and cosines). This is just linear algebra: sine and cosine functions form an orthogonal basis, and any signal can be expressed as a linear combination of them.
Formula Recap
For a discrete signal \(x\), the Discrete Fourier Transform (DFT) is:
\[ X_k = \sum_{n=0}^{N-1} x_n e^{-2\pi i kn / N}, \quad k=0,\dots,N-1 \]
The inverse DFT reconstructs the signal. Orthogonality of complex exponentials makes this work.
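To see the linear-algebra side directly, here is a small sketch (an addition to this lab) that builds the \(N \times N\) DFT matrix explicitly and checks the orthogonality of its columns: \(F^H F = N I\), so up to a scale factor the DFT is a change to an orthogonal basis.
import numpy as np

N = 8
n = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(n, n) / N)  # DFT matrix: F[k, m] = e^{-2πi km/N}

gram = F.conj().T @ F
print("F^H F equals N * I ?", np.allclose(gram, N * np.eye(N)))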
Set Up Your Lab
import numpy as np
import matplotlib.pyplot as plt
Step-by-Step Code Walkthrough
- Build a simple signal
t = np.linspace(0, 1, 100, endpoint=False)
signal = np.sin(2*np.pi*3*t) + 0.5*np.sin(2*np.pi*5*t)

plt.plot(t, signal)
plt.title("Signal = sin(3Hz) + 0.5 sin(5Hz)")
plt.xlabel("Time")
plt.ylabel("Amplitude")
plt.show()
- Compute Fourier transform (DFT)
X = np.fft.fft(signal)
freqs = np.fft.fftfreq(len(t), d=1/100)  # sampling rate = 100Hz

plt.stem(freqs[:50], np.abs(X[:50]), basefmt=" ")
plt.title("Fourier spectrum")
plt.xlabel("Frequency (Hz)")
plt.ylabel("Magnitude")
plt.show()
Peaks appear at 3Hz and 5Hz → the frequencies of the original signal.
- Reconstruct signal using inverse FFT
signal_reconstructed = np.fft.ifft(X).real
print("Reconstruction error:", np.linalg.norm(signal - signal_reconstructed))
Reconstruction error: 1.4664679821708477e-15
Error is near zero → perfect reconstruction.
- Orthogonality check of sinusoids
u = np.sin(2*np.pi*3*t)
v = np.sin(2*np.pi*5*t)

inner = np.dot(u, v)
print("Inner product of 3Hz and 5Hz sinusoids:", inner)
Inner product of 3Hz and 5Hz sinusoids: 1.2961853812498703e-14
The result is ≈ 0 → confirms orthogonality.
Try It Yourself
- Change the frequencies to 7Hz and 9Hz. Do the Fourier peaks move accordingly?
- Mix in some noise and check how the spectrum looks.
- Try cosine signals instead of sine. Do you still see orthogonality?
The Takeaway
- Fourier analysis = linear algebra with orthogonal sinusoidal basis functions.
- Any signal can be decomposed into orthogonal waves.
- This orthogonal viewpoint powers audio, image compression, and signal processing.
80. Polynomial and Multifeature Least Squares (Fitting More Flexibly)
Least squares isn’t limited to straight lines. By adding polynomial or multiple features, we can fit curves and capture more complex relationships. This is the foundation of regression models in data science.
Formula Recap
Given data \((x_i, y_i)\), we build a design matrix \(A\):
- For polynomial fit of degree \(d\):
\[ A = \begin{bmatrix} 1 & x_1 & x_1^2 & \dots & x_1^d \\ 1 & x_2 & x_2^2 & \dots & x_2^d \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x_n^2 & \dots & x_n^d \end{bmatrix} \]
Then solve least squares:
\[ \hat{c} = \arg\min_c \|Ac - y\|^2 \]
Set Up Your Lab
import numpy as np
import matplotlib.pyplot as plt
Step-by-Step Code Walkthrough
- Generate noisy quadratic data
np.random.seed(0)
x = np.linspace(-3, 3, 30)
y_true = 1 - 2*x + 0.5*x**2
y_noisy = y_true + np.random.normal(scale=2.0, size=x.shape)

plt.scatter(x, y_noisy, label="Noisy data")
plt.plot(x, y_true, "g--", label="True curve")
plt.legend()
plt.show()
- Build polynomial design matrix (degree 2)
A = np.column_stack([np.ones_like(x), x, x**2])
coeffs, *_ = np.linalg.lstsq(A, y_noisy, rcond=None)
print("Fitted coefficients:", coeffs)
Fitted coefficients: [ 1.15666306 -2.25753954 0.72733812]
- Plot fitted polynomial
y_fit = A @ coeffs
plt.scatter(x, y_noisy, label="Noisy data")
plt.plot(x, y_fit, "r-", label="Fitted quadratic")
plt.legend()
plt.show()
- Higher-degree fit (overfitting demonstration)
A_high = np.column_stack([x**i for i in range(6)])  # degree 5
coeffs_high, *_ = np.linalg.lstsq(A_high, y_noisy, rcond=None)

y_fit_high = A_high @ coeffs_high
plt.scatter(x, y_noisy, label="Noisy data")
plt.plot(x, y_fit_high, "r-", label="Degree 5 polynomial")
plt.plot(x, y_true, "g--", label="True curve")
plt.legend()
plt.show()
- Multifeature regression example
Suppose we predict \(y\) from features \([x, x^2, \sin(x)]\):
A_multi = np.column_stack([np.ones_like(x), x, x**2, np.sin(x)])
coeffs_multi, *_ = np.linalg.lstsq(A_multi, y_noisy, rcond=None)
print("Multi-feature coefficients:", coeffs_multi)
Multi-feature coefficients: [ 1.15666306 -2.0492999 0.72733812 -0.65902274]
Try It Yourself
- Fit degree 3, 4, 5 polynomials to the same data. Watch how the curve changes.
- Add features like \(\cos(x)\) or \(\exp(x)\) - does the fit improve?
- Compare training error (fit to noisy data) vs error on new test points.
The Takeaway
- Least squares can fit polynomials and arbitrary feature combinations.
- The design matrix encodes how input variables transform into features.
- This is the basis of regression, curve fitting, and many machine learning models.
Chapter 9. SVD, PCA, and Conditioning
81. Singular Values and SVD (Universal Factorization)
The Singular Value Decomposition (SVD) is one of the most powerful results in linear algebra. It says any \(m \times n\) matrix \(A\) can be factored as:
\[ A = U \Sigma V^T \]
- \(U\): orthogonal \(m \times m\) matrix (left singular vectors)
- \(\Sigma\): diagonal \(m \times n\) matrix with nonnegative numbers (singular values)
- \(V\): orthogonal \(n \times n\) matrix (right singular vectors)
Singular values are always nonnegative and sorted \(\sigma_1 \geq \sigma_2 \geq \dots\).
Set Up Your Lab
import numpy as np
Step-by-Step Code Walkthrough
- Compute SVD of a matrix
A = np.array([[ 3, 1, 1],
              [-1, 3, 1]])

U, S, Vt = np.linalg.svd(A, full_matrices=True)

print("U:\n", U)
print("Singular values:", S)
print("V^T:\n", Vt)
U:
[[-0.70710678 -0.70710678]
[-0.70710678 0.70710678]]
Singular values: [3.46410162 3.16227766]
V^T:
[[-4.08248290e-01 -8.16496581e-01 -4.08248290e-01]
[-8.94427191e-01 4.47213595e-01 5.27355937e-16]
[-1.82574186e-01 -3.65148372e-01 9.12870929e-01]]
- U: orthonormal basis for the output space (left singular vectors).
- S: singular values (as a 1D array).
- V^T: orthonormal basis for the input space (right singular vectors).
- Reconstruct \(A\) from decomposition
Sigma = np.zeros((U.shape[1], Vt.shape[0]))
Sigma[:len(S), :len(S)] = np.diag(S)

A_reconstructed = U @ Sigma @ Vt
print("Reconstruction error:", np.linalg.norm(A - A_reconstructed))
Reconstruction error: 1.5895974606912448e-15
The error should be near zero.
- Rank from SVD
Number of nonzero singular values = rank of \(A\).
rank = np.sum(S > 1e-10)
print("Rank of A:", rank)
Rank of A: 2
- Geometry: effect of \(A\)
SVD says:
- \(V\) rotates input space.
- \(\Sigma\) scales along orthogonal directions (by singular values).
- \(U\) rotates to output space.
This explains why SVD works for any matrix (not just square ones).
- Low-rank approximation preview
Keep only the top singular value(s) → best approximation of \(A\).
k = 1
A_approx = np.outer(U[:, 0], Vt[0]) * S[0]
print("Rank-1 approximation:\n", A_approx)
Rank-1 approximation:
[[1. 2. 1.]
[1. 2. 1.]]
Try It Yourself
- Compute SVD for a random 5×3 matrix. Check if \(U\) and \(V\) are orthogonal.
- Compare singular values of a diagonal matrix vs a rotation matrix.
- Zero out small singular values and see how much of \(A\) is preserved.
The Takeaway
- SVD factorizes any matrix into rotations and scalings.
- Singular values reveal rank and strength of directions.
- It’s the universal tool of numerical linear algebra: the backbone of PCA, compression, and stability analysis.
82. Geometry of SVD (Rotations + Stretching)
The Singular Value Decomposition (SVD) has a beautiful geometric interpretation: every matrix is just a combination of two rotations (or reflections) and a stretching.
For \(A = U \Sigma V^T\):
- \(V^T\): rotates (or reflects) the input space.
- \(\Sigma\): stretches space along orthogonal axes by singular values \(\sigma_i\).
- \(U\): rotates (or reflects) the result into the output space.
This turns any linear transformation into a rotation → stretching → rotation pipeline.
Set Up Your Lab
import numpy as np
import matplotlib.pyplot as plt
Step-by-Step Code Walkthrough
- Make a 2D matrix
A = np.array([[2, 1],
              [1, 3]])
- Apply SVD
U, S, Vt = np.linalg.svd(A)

print("U:\n", U)
print("Singular values:", S)
print("V^T:\n", Vt)
U:
[[-0.52573111 -0.85065081]
[-0.85065081 0.52573111]]
Singular values: [3.61803399 1.38196601]
V^T:
[[-0.52573111 -0.85065081]
[-0.85065081 0.52573111]]
- Visualize effect on the unit circle
The unit circle is often used to visualize linear transformations.
theta = np.linspace(0, 2*np.pi, 200)
circle = np.vstack((np.cos(theta), np.sin(theta)))

transformed = A @ circle

plt.plot(circle[0], circle[1], 'b--', label="Unit circle")
plt.plot(transformed[0], transformed[1], 'r-', label="Transformed")
plt.axis("equal")
plt.legend()
plt.title("Action of A on the unit circle")
plt.show()
The circle becomes an ellipse. Its axes align with the singular vectors, and its radii are the singular values.
- Compare with decomposition steps
# Apply V^T
step1 = Vt @ circle
# Apply Σ
Sigma = np.diag(S)
step2 = Sigma @ step1
# Apply U
step3 = U @ step2

plt.plot(circle[0], circle[1], 'b--', label="Unit circle")
plt.plot(step3[0], step3[1], 'g-', label="U Σ V^T circle")
plt.axis("equal")
plt.legend()
plt.title("SVD decomposition of transformation")
plt.show()
Both transformed shapes match → confirms SVD’s geometric picture.
Try It Yourself
- Change \(A\) to a pure shear, like [[1,2],[0,1]]. How does the ellipse look?
- Try a diagonal matrix, like [[3,0],[0,1]]. Do the singular vectors match the coordinate axes?
- Scale the input circle to a square and see if the geometry still works.
The Takeaway
- SVD = rotate → stretch → rotate.
- The unit circle becomes an ellipse: axes = singular vectors, radii = singular values.
- This geometric lens makes SVD intuitive and explains why it’s so widely used in data, graphics, and signal processing.
83. Relation to Eigen-Decompositions (ATA and AAT)
Singular values and eigenvalues are closely connected. While eigen-decomposition applies only to square matrices, SVD works for any rectangular matrix. The bridge between them is:
\[ A^T A v = \sigma^2 v \quad \text{and} \quad A A^T u = \sigma^2 u \]
- \(v\): right singular vector (from eigenvectors of \(A^T A\))
- \(u\): left singular vector (from eigenvectors of \(A A^T\))
- \(\sigma\): singular values (square roots of eigenvalues of \(A^T A\) or \(A A^T\))
Set Up Your Lab
import numpy as np
Step-by-Step Code Walkthrough
- Define a rectangular matrix
A = np.array([[2, 0],
              [1, 1],
              [0, 1]])  # shape 3x2
- Compute SVD directly
U, S, Vt = np.linalg.svd(A)
print("Singular values:", S)
Singular values: [2.30277564 1.30277564]
- Compare with eigenvalues of \(A^T A\)
ATA = A.T @ A
eigvals, eigvecs = np.linalg.eig(ATA)

print("Eigenvalues of A^T A:", eigvals)
print("Square roots (sorted):", np.sqrt(np.sort(eigvals)[::-1]))
Eigenvalues of A^T A: [5.30277564 1.69722436]
Square roots (sorted): [2.30277564 1.30277564]
Notice: singular values from SVD = square roots of eigenvalues of \(A^T A\).
- Compare with eigenvalues of \(A A^T\)
AAT = A @ A.T
eigvals2, eigvecs2 = np.linalg.eig(AAT)

print("Eigenvalues of A A^T:", eigvals2)
# Clip tiny negative round-off values before taking square roots
print("Square roots:", np.sqrt(np.clip(np.sort(eigvals2)[::-1], 0, None)))
Eigenvalues of A A^T: [ 5.30277564e+00 1.69722436e+00 -2.01266546e-17]
Square roots: [2.30277564 1.30277564 0.        ]
The nonzero values match too (the extra eigenvalue of \(A A^T\) is zero, up to round-off, because \(A\) has rank 2) → confirming the relationship.
- Verify singular vectors
- Right singular vectors (\(V\)) = eigenvectors of \(A^T A\).
- Left singular vectors (\(U\)) = eigenvectors of \(A A^T\).
print("Right singular vectors (V):\n", Vt.T)
print("Eigenvectors of A^T A:\n", eigvecs)
print("Left singular vectors (U):\n", U)
print("Eigenvectors of A A^T:\n", eigvecs2)
Right singular vectors (V):
[[-0.95709203 0.28978415]
[-0.28978415 -0.95709203]]
Eigenvectors of A^T A:
[[ 0.95709203 -0.28978415]
[ 0.28978415 0.95709203]]
Left singular vectors (U):
[[-0.83125078 0.44487192 0.33333333]
[-0.54146663 -0.51222011 -0.66666667]
[-0.12584124 -0.73465607 0.66666667]]
Eigenvectors of A A^T:
[[-0.83125078 0.44487192 0.33333333]
[-0.54146663 -0.51222011 -0.66666667]
[-0.12584124 -0.73465607 0.66666667]]
Try It Yourself
- Try a square symmetric matrix and compare SVD with eigen-decomposition. Do they match?
- For a tall vs wide rectangular matrix, check whether \(U\) and \(V\) differ.
- Compute eigenvalues manually with np.linalg.eig for a random \(A\) and confirm the singular values.
The Takeaway
- Singular values = square roots of eigenvalues of \(A^T A\) (or \(A A^T\)).
- Right singular vectors = eigenvectors of \(A^T A\).
- Left singular vectors = eigenvectors of \(A A^T\).
- SVD generalizes eigen-decomposition to all matrices, rectangular or square.
84. Low-Rank Approximation (Best Small Models)
One of the most useful applications of SVD is low-rank approximation: compressing a large matrix into a smaller one while keeping most of the important information.
The Eckart–Young theorem says: If \(A = U \Sigma V^T\), then the best rank-\(k\) approximation (in least-squares sense) is:
\[ A_k = U_k \Sigma_k V_k^T \]
where we keep only the top \(k\) singular values (and corresponding vectors).
Set Up Your Lab
import numpy as np
import matplotlib.pyplot as plt
Step-by-Step Code Walkthrough
- Create a matrix with hidden low-rank structure
np.random.seed(0)
U = np.random.randn(50, 5)   # 50 x 5
V = np.random.randn(5, 30)   # 5 x 30
A = U @ V                    # true rank ≤ 5
- Full SVD
U, S, Vt = np.linalg.svd(A, full_matrices=False)
print("Singular values:", S[:10])
Singular values: [4.90672194e+01 4.05935057e+01 3.39228766e+01 3.07883338e+01
2.29261740e+01 3.97150036e-15 3.97150036e-15 3.97150036e-15
3.97150036e-15 3.97150036e-15]
Only the first ~5 should be large; the rest close to zero.
- Build rank-1 approximation
k = 1
A1 = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]
error1 = np.linalg.norm(A - A1)
print("Rank-1 approximation error:", error1)
Rank-1 approximation error: 65.36149641872869
- Rank-5 approximation (should be almost exact)
k = 5
A5 = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]
error5 = np.linalg.norm(A - A5)
print("Rank-5 approximation error:", error5)
Rank-5 approximation error: 5.756573247253659e-14
- Visual comparison (image compression demo)
Let’s see it on an image.
from sklearn.datasets import load_digits

digits = load_digits()
img = digits.images[0]  # 8x8 grayscale digit

U, S, Vt = np.linalg.svd(img, full_matrices=False)

# Keep only top 2 singular values
k = 2
img2 = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

plt.subplot(1, 2, 1)
plt.imshow(img, cmap="gray")
plt.title("Original")

plt.subplot(1, 2, 2)
plt.imshow(img2, cmap="gray")
plt.title("Rank-2 Approximation")
plt.show()
Even with just 2 singular values, the digit shape is recognizable.
Try It Yourself
- Vary \(k\) in the image example (1, 2, 5, 10). How much detail do you keep?
- Compare the approximation error \(\|A - A_k\|\) as \(k\) increases.
- Apply low-rank approximation to random noisy data. Does it denoise?
The Takeaway
- SVD gives the best possible low-rank approximation in terms of error.
- By truncating singular values, you compress data while keeping its essential structure.
- This is the backbone of image compression, recommender systems, and dimensionality reduction.
85. Principal Component Analysis (Variance and Directions)
Principal Component Analysis (PCA) is one of the most important applications of SVD. It finds the directions (principal components) where data varies the most, and projects the data onto them to reduce dimensionality while preserving as much information as possible.
Mathematically:
- Center the data (subtract the mean).
- Compute covariance matrix \(C = \frac{1}{n} X^T X\).
- Eigenvectors of \(C\) = principal directions.
- Eigenvalues = variance explained.
- Equivalently: PCA = the SVD of the centered data matrix (see the sketch below).
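Here is a small sketch of that equivalence (an addition, using a random data matrix): the eigenvalues of the covariance matrix equal the squared singular values of the centered data divided by \(n\), and the corresponding eigenvectors span the same principal directions (up to sign).
import numpy as np

np.random.seed(1)
X = np.random.randn(100, 3)
Xc = X - X.mean(axis=0)
n = Xc.shape[0]

# Covariance route
C = (Xc.T @ Xc) / n
eigvals, eigvecs = np.linalg.eigh(C)      # ascending order

# SVD route
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

print("Covariance eigenvalues (descending):", eigvals[::-1])
print("Singular values squared / n:        ", S**2 / n)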
Set Up Your Lab
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
Step-by-Step Code Walkthrough
- Generate synthetic 2D data
np.random.seed(0)
X = np.random.randn(200, 2) @ np.array([[3, 1], [1, 0.5]])  # stretched cloud

plt.scatter(X[:, 0], X[:, 1], alpha=0.3)
plt.title("Original data")
plt.axis("equal")
plt.show()
- Center the data
X_centered = X - X.mean(axis=0)
- Compute SVD
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
print("Principal directions (V):\n", Vt)
Principal directions (V):
[[-0.94430098 -0.32908307]
[ 0.32908307 -0.94430098]]
The rows of Vt are the principal components.
- Project data onto first component
X_pca1 = X_centered @ Vt.T[:, 0]

plt.scatter(X_pca1, np.zeros_like(X_pca1), alpha=0.3)
plt.title("Data projected on first principal component")
plt.show()
This collapses data into 1D, keeping the most variance.
- Visualize principal axes
plt.scatter(X_centered[:, 0], X_centered[:, 1], alpha=0.3)
for length, vector in zip(S, Vt):
    plt.plot([0, vector[0]*length], [0, vector[1]*length], 'r-', linewidth=3)
plt.title("Principal components (directions of max variance)")
plt.axis("equal")
plt.show()
The red arrows show where the data spreads most.
- PCA on real data (digits)
digits = load_digits()
X = digits.data  # 1797 samples, 64 features
X_centered = X - X.mean(axis=0)

U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

explained_variance = (S**2) / np.sum(S**2)
print("Explained variance ratio (first 5):", explained_variance[:5])
Explained variance ratio (first 5): [0.14890594 0.13618771 0.11794594 0.08409979 0.05782415]
Try It Yourself
- Reduce digits dataset to 2D using the top 2 components and plot. Do digit clusters separate?
- Compare explained variance ratio for top 10 components.
- Add noise to data and check if PCA filters it out when projecting to fewer dimensions.
The Takeaway
- PCA finds directions of maximum variance using SVD.
- By projecting onto top components, you compress data with minimal information loss.
- PCA is the backbone of dimensionality reduction, visualization, and preprocessing in machine learning.
86. Pseudoinverse (Moore–Penrose) and Solving Ill-Posed Systems
The Moore–Penrose pseudoinverse \(A^+\) generalizes the inverse of a matrix. It allows solving systems \(Ax = b\) even when:
- \(A\) is not square, or
- \(A\) is singular (non-invertible).
The solution given by the pseudoinverse is the least-squares solution with minimum norm:
\[ x = A^+ b \]
If \(A = U \Sigma V^T\), then:
\[ A^+ = V \Sigma^+ U^T \]
where \(\Sigma^+\) is obtained by taking reciprocals of nonzero singular values.
Set Up Your Lab
import numpy as np
Step-by-Step Code Walkthrough
- Solve an overdetermined system (more equations than unknowns)
A = np.array([[1, 1],
              [1, 2],
              [1, 3]])  # 3x2 system
b = np.array([1, 2, 2])

x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print("Least-squares solution:", x_ls)
Least-squares solution: [0.66666667 0.5 ]
- Compute with pseudoinverse
A_pinv = np.linalg.pinv(A)
x_pinv = A_pinv @ b
print("Pseudoinverse solution:", x_pinv)
Pseudoinverse solution: [0.66666667 0.5 ]
Both match → pseudoinverse gives least-squares solution.
- Solve an underdetermined system (fewer equations than unknowns)
A = np.array([[1, 2, 3]])  # 1x3
b = np.array([1])

x_pinv = np.linalg.pinv(A) @ b
print("Minimum norm solution:", x_pinv)
Minimum norm solution: [0.07142857 0.14285714 0.21428571]
Here, infinitely many solutions exist. The pseudoinverse picks the one with smallest norm.
- Compare with singular matrix
A = np.array([[1, 2],
              [2, 4]])  # rank deficient
b = np.array([1, 2])

x_pinv = np.linalg.pinv(A) @ b
print("Solution with pseudoinverse:", x_pinv)
Solution with pseudoinverse: [0.2 0.4]
Even when \(A\) is singular, pseudoinverse provides a solution.
- Manual pseudoinverse via SVD
A = np.array([[1, 2],
              [3, 4]])
U, S, Vt = np.linalg.svd(A)
S_inv = np.zeros((Vt.shape[0], U.shape[0]))
for i in range(len(S)):
    if S[i] > 1e-10:
        S_inv[i, i] = 1/S[i]

A_pinv_manual = Vt.T @ S_inv @ U.T
print("Manual pseudoinverse:\n", A_pinv_manual)
print("NumPy pseudoinverse:\n", np.linalg.pinv(A))
Manual pseudoinverse:
[[-2. 1. ]
[ 1.5 -0.5]]
NumPy pseudoinverse:
[[-2. 1. ]
[ 1.5 -0.5]]
They match.
Try It Yourself
- Create an overdetermined system with noise and see how pseudoinverse smooths the solution.
- Compare the pseudoinverse with the direct inverse (np.linalg.inv) on a square nonsingular matrix.
The Takeaway
- The pseudoinverse solves any linear system, square or not.
- It provides the least-squares solution in overdetermined cases and the minimum-norm solution in underdetermined cases.
- Built on SVD, it is a cornerstone of regression, optimization, and numerical methods.
87. Conditioning and Sensitivity (How Errors Amplify)
Conditioning tells us how sensitive a system is to small changes. For a linear system \(Ax = b\):
- If \(A\) is well-conditioned, small changes in \(b\) or \(A\) → small changes in \(x\).
- If \(A\) is ill-conditioned, tiny changes can cause huge swings in \(x\).
The condition number is defined as:
\[ \kappa(A) = \|A\| \cdot \|A^{-1}\| \]
For SVD:
\[ \kappa(A) = \frac{\sigma_{\max}}{\sigma_{\min}} \]
where \(\sigma_{\max}\) and \(\sigma_{\min}\) are the largest and smallest singular values.
- Large \(\kappa(A)\) → unstable system.
- Small \(\kappa(A)\) → stable system.
Set Up Your Lab
import numpy as np
Step-by-Step Code Walkthrough
- Well-conditioned system
A = np.array([[2, 0],
              [0, 1]])
b = np.array([1, 1])

x = np.linalg.solve(A, b)
cond = np.linalg.cond(A)
print("Solution:", x)
print("Condition number:", cond)
Solution: [0.5 1. ]
Condition number: 2.0
Condition number = ratio of singular values → moderate size.
- Ill-conditioned system
A = np.array([[1, 1.0001],
              [1, 1.0000]])
b = np.array([2, 2])

x = np.linalg.lstsq(A, b, rcond=None)[0]
cond = np.linalg.cond(A)
print("Solution:", x)
print("Condition number:", cond)
Solution: [ 2.00000000e+00 -5.73526099e-13]
Condition number: 40002.000075017124
Condition number is very large → instability.
- Perturb the right-hand side
b2 = np.array([2, 2.001])  # tiny change
x2 = np.linalg.lstsq(A, b2, rcond=None)[0]
print("Solution after tiny change:", x2)
Solution after tiny change: [ 12.001 -10. ]
The solution changes drastically → shows sensitivity.
- Relation to singular values
U, S, Vt = np.linalg.svd(A)
print("Singular values:", S)
print("Condition number (SVD):", S[0]/S[-1])
Singular values: [2.000050e+00 4.999875e-05]
Condition number (SVD): 40002.00007501713
- Scaling experiment
for scale in [1, 1e-2, 1e-4, 1e-6]:
    A = np.array([[1, 0], [0, scale]])
    print(f"Scale={scale}, condition number={np.linalg.cond(A)}")
Scale=1, condition number=1.0
Scale=0.01, condition number=100.0
Scale=0.0001, condition number=10000.0
Scale=1e-06, condition number=1000000.0
As scale shrinks, condition number explodes.
Try It Yourself
- Generate random matrices and compute their condition numbers. Which are stable?
- Compare condition numbers of Hilbert matrices (notoriously ill-conditioned).
- Explore how rounding errors grow with high condition numbers.
The Takeaway
- Condition number = measure of problem sensitivity.
- \(\kappa(A) = \sigma_{\max}/\sigma_{\min}\).
- Ill-conditioned problems amplify errors and are numerically unstable → why scaling, regularization, and good formulations matter.
88. Matrix Norms and Singular Values (Measuring Size Properly)
Matrix norms measure the size or strength of a matrix. They extend the idea of vector length to matrices. Norms are crucial for analyzing stability, error growth, and performance of algorithms.
Some important norms:
- Frobenius norm:
\[ \|A\|_F = \sqrt{\sum_{i,j} |a_{ij}|^2} \]
Equivalent to treating the matrix as a big vector.
- Spectral norm (operator 2-norm):
\[ \|A\|_2 = \sigma_{\max} \]
The largest singular value - tells how much \(A\) can stretch a vector.
- 1-norm: maximum absolute column sum.
- ∞-norm: maximum absolute row sum.
Set Up Your Lab
import numpy as np
Step-by-Step Code Walkthrough
- Build a test matrix
A = np.array([[ 1, -2, 3],
              [ 0,  4, 5],
              [-1,  2, 1]])
- Compute different norms
fro = np.linalg.norm(A, 'fro')
spec = np.linalg.norm(A, 2)
one_norm = np.linalg.norm(A, 1)
inf_norm = np.linalg.norm(A, np.inf)

print("Frobenius norm:", fro)
print("Spectral norm:", spec)
print("1-norm:", one_norm)
print("Infinity norm:", inf_norm)
Frobenius norm: 7.810249675906654
Spectral norm: 6.813953458914004
1-norm: 9.0
Infinity norm: 9.0
- Compare spectral norm with largest singular value
U, S, Vt = np.linalg.svd(A)
print("Largest singular value:", S[0])
print("Spectral norm:", spec)
Largest singular value: 6.813953458914004
Spectral norm: 6.813953458914004
They match → spectral norm = largest singular value.
- Frobenius norm from singular values
\[ \|A\|_F = \sqrt{\sigma_1^2 + \sigma_2^2 + \dots} \]
fro_from_svd = np.sqrt(np.sum(S**2))
print("Frobenius norm (from SVD):", fro_from_svd)
Frobenius norm (from SVD): 7.810249675906654
- Stretching effect demonstration
Pick a random vector and see how much it grows:
x = np.random.randn(3)
stretch = np.linalg.norm(A @ x) / np.linalg.norm(x)
print("Stretch factor:", stretch)
print("Spectral norm (max possible stretch):", spec)
Stretch factor: 2.7537463268177698
Spectral norm (max possible stretch): 6.813953458914004
The stretch ≤ spectral norm, always.
Try It Yourself
- Compare norms for diagonal matrices - do they match the largest diagonal entry?
- Generate random matrices and see how norms differ.
- Compute Frobenius vs spectral norm for a rank-1 matrix.
The Takeaway
- Frobenius norm = overall energy of the matrix.
- Spectral norm = maximum stretching power (largest singular value).
- Other norms (1-norm, ∞-norm) capture row/column dominance.
- Singular values unify all these views of “matrix size.”
89. Regularization (Ridge/Tikhonov to Tame Instability)
When solving \(Ax = b\), if \(A\) is ill-conditioned (large condition number), small errors in data can cause huge errors in the solution. Regularization stabilizes the problem by adding a penalty term that discourages extreme solutions.
The most common form: ridge regression (a.k.a. Tikhonov regularization):
\[ x_\lambda = \arg\min_x \|Ax - b\|^2 + \lambda \|x\|^2 \]
Closed form:
\[ x_\lambda = (A^T A + \lambda I)^{-1} A^T b \]
Here \(\lambda > 0\) controls the amount of regularization:
- Small \(\lambda\): solution close to least-squares.
- Large \(\lambda\): smaller coefficients, more stability.
Set Up Your Lab
import numpy as np
import matplotlib.pyplot as plt
Step-by-Step Code Walkthrough
- Build an ill-conditioned system
A = np.array([[1, 1.001],
              [1, 0.999]])
b = np.array([2, 2])
- Solve without regularization
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print("Least squares solution:", x_ls)
Least squares solution: [ 2.00000000e+00 -2.84186735e-14]
The result may be unstable.
- Apply ridge regularization
lam = 0.1
x_ridge = np.linalg.inv(A.T @ A + lam*np.eye(2)) @ A.T @ b
print("Ridge solution (λ=0.1):", x_ridge)
Ridge solution (λ=0.1): [0.97561927 0.97559976]
- Compare effect of different λ
lambdas = np.logspace(-4, 2, 20)
solutions = []
for lam in lambdas:
    x_reg = np.linalg.inv(A.T @ A + lam*np.eye(2)) @ A.T @ b
    solutions.append(np.linalg.norm(x_reg))

plt.semilogx(lambdas, solutions, 'o-')
plt.xlabel("λ (regularization strength)")
plt.ylabel("Solution norm")
plt.title("Effect of ridge regularization")
plt.show()
As \(\lambda\) increases, the solution becomes smaller and more stable.
- Connection to SVD
If \(A = U \Sigma V^T\):
\[ x_\lambda = \sum_i \frac{\sigma_i}{\sigma_i^2 + \lambda} (u_i^T b) v_i \]
Small singular values (causing instability) get damped by \(\frac{\sigma_i}{\sigma_i^2 + \lambda}\).
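Here is a small sketch (an addition, restating the same \(A\), \(b\), and \(\lambda\) used above for completeness) that evaluates this SVD formula directly and checks that it reproduces the closed-form ridge solution.
import numpy as np

A = np.array([[1, 1.001],
              [1, 0.999]])
b = np.array([2, 2])
lam = 0.1

U, S, Vt = np.linalg.svd(A)
# Filter factors σ_i / (σ_i² + λ) damp the directions with small singular values
x_svd = sum((S[i] / (S[i]**2 + lam)) * (U[:, i] @ b) * Vt[i] for i in range(len(S)))

x_closed = np.linalg.inv(A.T @ A + lam * np.eye(2)) @ A.T @ b
print("Ridge via SVD filter factors:", x_svd)
print("Ridge via closed form:       ", x_closed)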
Try It Yourself
- Experiment with larger and smaller \(\lambda\). What happens to the solution?
- Add random noise to \(b\). Compare least-squares vs ridge stability.
- Plot how each coefficient changes with λ.
The Takeaway
- Regularization controls instability in ill-conditioned problems.
- Ridge regression balances fit vs. stability using λ.
- In SVD terms, regularization damps small singular values that cause wild solutions.
90. Rank-Revealing QR and Practical Diagnostics (What Rank Really Is)
In practice, we often need to determine the numerical rank of a matrix - not just the theoretical rank, but how many directions carry meaningful information beyond round-off errors or noise. A useful tool for this is the Rank-Revealing QR (RRQR) factorization.
For a matrix \(A\):
\[ A P = Q R \]
- \(Q\): orthogonal matrix
- \(R\): upper triangular matrix
- \(P\): column permutation matrix
By reordering columns smartly, the diagonal of \(R\) reveals which directions are significant.
Set Up Your Lab
import numpy as np
from scipy.linalg import qr
import matplotlib.pyplot as plt
Step-by-Step Code Walkthrough
- Build a nearly rank-deficient matrix
A = np.array([[1, 2, 3],
              [2, 4.001, 6],
              [3, 6, 9.001]])
print("Rank (theoretical):", np.linalg.matrix_rank(A))
Rank (theoretical): 3
The small perturbations make this matrix technically full rank, but numerically it is very close to being rank-deficient.
- QR with column pivoting
Q, R, P = qr(A, pivoting=True)
print("R:\n", R)
print("Column permutation:", P)
R:
[[-1.12257740e+01 -7.48384925e+00 -3.74165738e+00]
[ 0.00000000e+00 -1.20185042e-03 -1.84886859e-04]
[ 0.00000000e+00 0.00000000e+00 -7.41196374e-05]]
Column permutation: [2 1 0]
The diagonal entries of \(R\) decrease rapidly → numerical rank is determined where they become tiny.
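To turn that observation into a number, here is a minimal sketch that counts the non-negligible diagonal entries of R; the absolute tolerance is an illustrative choice, mirroring the SVD threshold used below.
# Estimate numerical rank from |diag(R)| (tolerance chosen purely for illustration)
diag_R = np.abs(np.diag(R))
tol_qr = 1e-4
rank_from_qr = int(np.sum(diag_R > tol_qr))
print("|diag(R)|:", diag_R)
print("Estimated rank from RRQR:", rank_from_qr)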
- Compare with SVD
U, S, Vt = np.linalg.svd(A)
print("Singular values:", S)
Singular values: [1.40009286e+01 1.00000000e-03 7.14238341e-05]
The singular values tell the same story: one is very small → effective rank ≈ 2.
- Thresholding for rank
tol = 1e-3
rank_est = np.sum(S > tol)
print("Estimated rank:", rank_est)
Estimated rank: 2
- Diagnostics on a noisy matrix
np.random.seed(0)
B = np.random.randn(50, 10) @ np.random.randn(10, 10)   # rank ≤ 10
B[:, -1] += 1e-6 * np.random.randn(50)                  # tiny noise

U, S, Vt = np.linalg.svd(B)
plt.semilogy(S, 'o-')
plt.title("Singular values (log scale)")
plt.xlabel("Index")
plt.ylabel("Value")
plt.show()
The drop in singular values shows effective rank.
Try It Yourself
- Change the perturbation in \(A\) from 0.001 to 0.000001. Does the numerical rank change?
- Test QR with pivoting on random rectangular matrices.
- Compare rank estimates from QR vs SVD for large noisy matrices.
The Takeaway
- Rank-revealing QR is a practical tool to detect effective rank in real-world data.
- SVD gives the most precise picture (singular values), but QR with pivoting is faster.
- Understanding numerical rank is crucial for diagnostics, stability, and model complexity control.
Chapter 10. Applications and computation
91. 2D/3D Geometry Pipelines (Cameras, Rotations, and Transforms)
Linear algebra powers the geometry pipelines in computer graphics and robotics.
- 2D transforms: rotation, scaling, translation.
- 3D transforms: same ideas, but with an extra dimension.
- Homogeneous coordinates let us unify all transforms (even translations) into matrix multiplications.
Set Up Your Lab
import numpy as np
import matplotlib.pyplot as plt
Step-by-Step Code Walkthrough
- Rotation in 2D
\[ R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \]
theta = np.pi/4   # 45 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

point = np.array([1, 0])
rotated = R @ point
print("Original:", point)
print("Rotated:", rotated)
Original: [1 0]
Rotated: [0.70710678 0.70710678]
- Translation using homogeneous coordinates
In 2D:
\[ T(dx, dy) = \begin{bmatrix} 1 & 0 & dx \\ 0 & 1 & dy \\ 0 & 0 & 1 \end{bmatrix} \]
T = np.array([[1, 0, 2],
              [0, 1, 1],
              [0, 0, 1]])

p_h = np.array([1, 1, 1])   # homogeneous (x=1, y=1)
translated = T @ p_h
print("Translated point:", translated)
Translated point: [3 2 1]
- Combine rotation + translation
Transformations compose by multiplying matrices.
M = T @ np.block([[R, np.zeros((2, 1))],
                  [np.zeros((1, 2)), 1]])
combined = M @ p_h
print("Combined transform (rotation+translation):", combined)
Combined transform (rotation+translation): [2. 2.41421356 1. ]
- 3D rotation (around z-axis)
\[ R_z(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \]
theta = np.pi/3
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0, 0, 1]])

point3d = np.array([1, 0, 0])
rotated3d = Rz @ point3d
print("3D rotated point:", rotated3d)
3D rotated point: [0.5 0.8660254 0. ]
- Camera projection (3D → 2D)
Simple pinhole model:
\[ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} f \cdot x / z \\ f \cdot y / z \end{bmatrix} \]
f = 1.0   # focal length
P = np.array([[f, 0, 0],
              [0, f, 0],
              [0, 0, 1]])   # projection matrix

point3d = np.array([2, 3, 5])
p_proj = P @ point3d
p_proj = p_proj[:2] / p_proj[2]   # divide by z
print("Projected 2D point:", p_proj)
Projected 2D point: [0.4 0.6]
Try It Yourself
- Rotate a square in 2D, then translate it. Plot before/after.
- Rotate a 3D point cloud around x, y, and z axes.
- Project a cube into 2D using the pinhole camera model.
The Takeaway
- Geometry pipelines = sequences of linear transforms.
- Homogeneous coordinates unify rotation, scaling, and translation.
- Camera projection links 3D world to 2D images - a cornerstone of graphics and vision.
92. Computer Graphics and Robotics (Homogeneous Tricks in Action)
Computer graphics and robotics both rely on homogeneous coordinates to unify rotations, translations, scalings, and projections into a single framework. With \(4 \times 4\) matrices in 3D, entire transformation pipelines can be built as matrix products.
Set Up Your Lab
import numpy as np
import matplotlib.pyplot as plt
Step-by-Step Code Walkthrough
- Homogeneous representation of a point
In 3D:
\[ (x, y, z) \mapsto (x, y, z, 1) \]
p = np.array([1, 2, 3, 1])   # homogeneous point
- Define translation, rotation, and scaling matrices
- Translation by \((dx,dy,dz)\):
T = np.array([[1, 0, 0, 2],
              [0, 1, 0, 1],
              [0, 0, 1, 3],
              [0, 0, 0, 1]])
- Scaling by factors \((sx, sy, sz)\):
S = np.diag([2, 0.5, 1.5, 1])
- Rotation about z-axis (\(\theta = 90^\circ\)):
theta = np.pi/2
Rz = np.array([[np.cos(theta), -np.sin(theta), 0, 0],
               [np.sin(theta),  np.cos(theta), 0, 0],
               [0, 0, 1, 0],
               [0, 0, 0, 1]])
- Combine transforms into a pipeline
M = T @ Rz @ S   # first scale, then rotate, then translate
p_transformed = M @ p
print("Transformed point:", p_transformed)
Transformed point: [1. 3. 7.5 1. ]
- Robotics: forward kinematics of a 2-link arm
Each joint is a rotation + translation.
def link(theta, length):
    return np.array([[np.cos(theta), -np.sin(theta), 0, length*np.cos(theta)],
                     [np.sin(theta),  np.cos(theta), 0, length*np.sin(theta)],
                     [0, 0, 1, 0],
                     [0, 0, 0, 1]])
theta1, theta2 = np.pi/4, np.pi/6
L1, L2 = 2, 1.5

M1 = link(theta1, L1)
M2 = link(theta2, L2)

end_effector = M1 @ M2 @ np.array([0, 0, 0, 1])
print("End effector position:", end_effector[:3])
End effector position: [1.80244213 2.8631023 0. ]
- Graphics: simple 3D camera projection
f = 2.0
P = np.array([[f, 0, 0, 0],
              [0, f, 0, 0],
              [0, 0, 1, 0]])

cube = np.array([[x, y, z, 1] for x in [0, 1] for y in [0, 1] for z in [0, 1]])
proj = (P @ cube.T).T
proj2d = proj[:, :2] / proj[:, 2:3]

plt.scatter(proj2d[:, 0], proj2d[:, 1])
plt.title("Projected cube")
plt.show()
(Cube vertices with \(z = 0\) trigger divide-by-zero and invalid-value RuntimeWarnings here, because the pinhole projection is undefined at \(z = 0\); shifting the cube away from the camera avoids this.)
Try It Yourself
- Change the order of transforms (e.g., Rz @ S @ T). How does the result differ?
- Add a third joint to the robotic arm and compute the new end-effector position.
- Project the cube with different focal lengths \(f\).
The Takeaway
- Homogeneous coordinates unify all transformations.
- Robotics uses this framework for forward kinematics.
- Graphics uses it for camera and projection pipelines.
- Both fields rely on the same linear algebra tricks - just applied differently.
93. Graphs, Adjacency, and Laplacians (Networks via Matrices)
Graphs can be studied with linear algebra by encoding them into matrices. Two of the most important:
Adjacency matrix \(A\):
\[ A_{ij} = \begin{cases} 1 & \text{if edge between i and j exists} \\ 0 & \text{otherwise} \end{cases} \]
Graph Laplacian \(L\):
\[ L = D - A \]
where \(D\) is the degree matrix (\(D_{ii}\) = number of neighbors of node \(i\)).
These matrices let us analyze connectivity, diffusion, and clustering.
Set Up Your Lab
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
Step-by-Step Code Walkthrough
- Build a simple graph
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])   # square with diagonal

nx.draw(G, with_labels=True, node_color="lightblue", node_size=800)
plt.show()
- Adjacency matrix
A = nx.to_numpy_array(G)
print("Adjacency matrix:\n", A)
Adjacency matrix:
[[0. 1. 1. 1.]
[1. 0. 1. 0.]
[1. 1. 0. 1.]
[1. 0. 1. 0.]]
- Degree and Laplacian matrices
D = np.diag(A.sum(axis=1))
L = D - A
print("Degree matrix:\n", D)
print("Graph Laplacian:\n", L)
Degree matrix:
[[3. 0. 0. 0.]
[0. 2. 0. 0.]
[0. 0. 3. 0.]
[0. 0. 0. 2.]]
Graph Laplacian:
[[ 3. -1. -1. -1.]
[-1. 2. -1. 0.]
[-1. -1. 3. -1.]
[-1. 0. -1. 2.]]
- Eigenvalues of Laplacian (connectivity check)
eigvals, eigvecs = np.linalg.eigh(L)
print("Laplacian eigenvalues:", eigvals)
Laplacian eigenvalues: [1.11022302e-16 2.00000000e+00 4.00000000e+00 4.00000000e+00]
- The number of zero eigenvalues = number of connected components.
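As a quick check of that rule, here is a minimal sketch that builds two disconnected triangles and counts the (near-)zero Laplacian eigenvalues; two components should give two of them.
# Two disconnected triangles → expect two (near-)zero Laplacian eigenvalues
G2 = nx.Graph()
G2.add_edges_from([(0, 1), (1, 2), (2, 0),    # first triangle
                   (3, 4), (4, 5), (5, 3)])   # second triangle, disconnected

A2 = nx.to_numpy_array(G2)
lap2 = np.diag(A2.sum(axis=1)) - A2
vals = np.linalg.eigvalsh(lap2)
print("Eigenvalues:", np.round(vals, 6))
print("Zero eigenvalues (≈ components):", int(np.sum(np.isclose(vals, 0))))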
- Spectral embedding (clustering)
Use Laplacian eigenvectors to embed nodes in low dimensions.
coords = eigvecs[:, 1:3]   # skip the trivial first eigenvector
plt.scatter(coords[:, 0], coords[:, 1], c=range(len(coords)), cmap="tab10", s=200)
for i, (x, y) in enumerate(coords):
    plt.text(x, y, str(i), fontsize=12, ha="center", va="center", color="white")
plt.title("Spectral embedding of graph")
plt.show()
Try It Yourself
- Remove one edge from the graph and see how Laplacian eigenvalues change.
- Add a disconnected node - does an extra zero eigenvalue appear?
- Try a random graph and compare adjacency vs Laplacian spectra.
The Takeaway
- Adjacency matrices describe direct graph structure.
- Laplacians capture connectivity and diffusion.
- Eigenvalues of \(L\) reveal graph properties like connectedness and clustering - bridging networks with linear algebra.
94. Data Preprocessing as Linear Ops (Centering, Whitening, Scaling)
Many machine learning and data analysis workflows begin with preprocessing, and linear algebra provides the tools.
- Centering: subtract the mean → move data to origin.
- Scaling: divide by standard deviation → normalize feature ranges.
- Whitening: decorrelate features → make covariance matrix the identity.
Each step can be written as a matrix operation.
Set Up Your Lab
import numpy as np
import matplotlib.pyplot as plt
Step-by-Step Code Walkthrough
- Generate correlated data
np.random.seed(0)
X = np.random.randn(200, 2) @ np.array([[3, 1], [1, 0.5]])
plt.scatter(X[:, 0], X[:, 1], alpha=0.4)
plt.title("Original correlated data")
plt.axis("equal")
plt.show()
- Centering (subtract mean)
X_centered = X - X.mean(axis=0)
print("Mean after centering:", X_centered.mean(axis=0))
Mean after centering: [ 8.88178420e-18 -1.22124533e-17]
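The same centering can be expressed as a single matrix operation. A minimal check (reusing X and X_centered from above): multiplying by the centering matrix \(I - \frac{1}{n}\mathbf{1}\mathbf{1}^T\) reproduces the mean subtraction.
# Centering as a matrix operation: (I - (1/n) 1 1ᵀ) X equals X - mean
n = X.shape[0]
C_center = np.eye(n) - np.ones((n, n)) / n
print("Matches mean subtraction:", np.allclose(C_center @ X, X_centered))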
- Scaling (normalize features)
X_scaled = X_centered / X_centered.std(axis=0)
print("Std after scaling:", X_scaled.std(axis=0))
Std after scaling: [1. 1.]
- Whitening via eigen-decomposition
Covariance of centered data:
C = np.cov(X_centered.T)
eigvals, eigvecs = np.linalg.eigh(C)

W = eigvecs @ np.diag(1/np.sqrt(eigvals)) @ eigvecs.T
X_white = X_centered @ W
Check covariance:
print("Whitened covariance:\n", np.cov(X_white.T))
Whitened covariance:
[[1.00000000e+00 2.54402864e-15]
[2.54402864e-15 1.00000000e+00]]
- Compare scatter plots
plt.subplot(1, 3, 1)
plt.scatter(X[:, 0], X[:, 1], alpha=0.4)
plt.title("Original")

plt.subplot(1, 3, 2)
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], alpha=0.4)
plt.title("Scaled")

plt.subplot(1, 3, 3)
plt.scatter(X_white[:, 0], X_white[:, 1], alpha=0.4)
plt.title("Whitened")

plt.tight_layout()
plt.show()
- Original: elongated ellipse.
- Scaled: axis-aligned ellipse.
- Whitened: circular cloud (uncorrelated, unit variance).
Try It Yourself
- Add a third feature and apply centering, scaling, whitening.
- Compare whitening with PCA - they use the same eigen-decomposition.
- Test what happens if you skip centering before whitening.
The Takeaway
- Centering → mean zero.
- Scaling → unit variance.
- Whitening → features uncorrelated, variance = 1.
Linear algebra provides the exact matrix operations to make preprocessing systematic and reliable.
95. Linear Regression and Classification (From Model to Matrix)
Linear regression and classification problems can be written neatly in matrix form. This unifies data, models, and solutions under the framework of least squares and linear decision boundaries.
Linear Regression Model
For data \((x_i, y_i)\):
\[ y \approx X \beta \]
- \(X\): design matrix (rows = samples, columns = features).
- \(\beta\): coefficients to solve for.
- Solution (least squares):
\[ \hat{\beta} = (X^T X)^{-1} X^T y \]
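Before the full walkthrough, here is a minimal self-contained sketch of that closed form on tiny synthetic data (the names X_demo and y_demo are illustrative only), checking that the normal-equations solution matches np.linalg.lstsq.
import numpy as np

rng = np.random.default_rng(0)
X_demo = np.column_stack([np.ones(20), rng.normal(size=20)])        # bias column + one feature
y_demo = X_demo @ np.array([5.0, 3.0]) + rng.normal(scale=0.1, size=20)

beta_normal = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ y_demo)  # (XᵀX)⁻¹ Xᵀ y, via solve
beta_lstsq, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

print("Normal equations:", beta_normal)
print("lstsq:           ", beta_lstsq)   # the two should agree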
Set Up Your Lab
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
Step-by-Step Code Walkthrough
- Linear regression example
np.random.seed(0)
X = np.linspace(0, 10, 30).reshape(-1, 1)
y = 3*X.squeeze() + 5 + np.random.randn(30)*2
Construct design matrix with bias term:
X_design = np.column_stack([np.ones_like(X), X])
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print("Fitted coefficients:", beta_hat)
Fitted coefficients: [6.65833151 2.84547628]
- Visualize regression line
y_pred = X_design @ beta_hat

plt.scatter(X, y, label="Data")
plt.plot(X, y_pred, 'r-', label="Fitted line")
plt.legend()
plt.show()
- Logistic classification with linear decision boundary
Xc, yc = make_classification(n_features=2, n_redundant=0, n_informative=2,
                             n_clusters_per_class=1, n_samples=100, random_state=0)

plt.scatter(Xc[:, 0], Xc[:, 1], c=yc, cmap="bwr", alpha=0.7)
plt.title("Classification data")
plt.show()
- Logistic regression via gradient descent
def sigmoid(z):
    return 1/(1 + np.exp(-z))

X_design = np.column_stack([np.ones(len(Xc)), Xc])
y = yc

w = np.zeros(X_design.shape[1])
lr = 0.1

for _ in range(2000):
    preds = sigmoid(X_design @ w)
    grad = X_design.T @ (preds - y) / len(y)
    w -= lr * grad

print("Learned weights:", w)
Learned weights: [-2.10451116 0.70752542 4.13295129]
- Plot decision boundary
xx, yy = np.meshgrid(np.linspace(Xc[:, 0].min()-1, Xc[:, 0].max()+1, 200),
                     np.linspace(Xc[:, 1].min()-1, Xc[:, 1].max()+1, 200))

grid = np.c_[np.ones(xx.size), xx.ravel(), yy.ravel()]
probs = sigmoid(grid @ w).reshape(xx.shape)

plt.contourf(xx, yy, probs, levels=[0, 0.5, 1], alpha=0.3, cmap="bwr")
plt.scatter(Xc[:, 0], Xc[:, 1], c=yc, cmap="bwr", edgecolor="k")
plt.title("Linear decision boundary")
plt.show()
Try It Yourself
- Add polynomial features to regression and refit. Does the line bend into a curve?
- Change learning rate in logistic regression - what happens?
- Generate data that is not linearly separable. Can a linear model still classify well?
The Takeaway
- Regression and classification fit naturally into linear algebra with matrix formulations.
- Least squares solves regression directly; logistic regression requires optimization.
- Linear models are simple, interpretable, and still form the foundation of modern machine learning.
96. PCA in Practice (Dimensionality Reduction Workflow)
Principal Component Analysis (PCA) is widely used to reduce dimensions, compress data, and visualize high-dimensional datasets. Here, we’ll walk through a full PCA workflow: centering, computing components, projecting, and visualizing.
Set Up Your Lab
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
Step-by-Step Code Walkthrough
- Load dataset (digits)
digits = load_digits()
X = digits.data      # shape (1797, 64)
y = digits.target
print("Data shape:", X.shape)
Data shape: (1797, 64)
Each sample is an 8×8 grayscale image flattened into 64 features.
- Center the data
X_centered = X - X.mean(axis=0)
- Compute PCA via SVD
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
explained_variance = (S**2) / (len(X) - 1)
explained_ratio = explained_variance / explained_variance.sum()
- Plot explained variance ratio
plt.plot(np.cumsum(explained_ratio[:30]), 'o-')
plt.xlabel("Number of components")
plt.ylabel("Cumulative explained variance")
plt.title("PCA explained variance")
plt.grid(True)
plt.show()
This shows how many components are needed to capture most variance.
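A handy follow-up, using the explained_ratio computed above: count how many components are needed to reach a chosen variance target (95% here, purely as an example).
# Number of components needed to reach 95% cumulative explained variance
cumvar = np.cumsum(explained_ratio)
n_components_95 = int(np.searchsorted(cumvar, 0.95) + 1)
print("Components needed for 95% variance:", n_components_95)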
- Project onto top 2 components for visualization
X_pca2 = X_centered @ Vt[:2].T
plt.scatter(X_pca2[:, 0], X_pca2[:, 1], c=y, cmap="tab10", alpha=0.6, s=15)
plt.colorbar()
plt.title("Digits dataset (PCA 2D projection)")
plt.show()
- Reconstruct images from reduced dimensions
k = 20
X_pca20 = X_centered @ Vt[:k].T
X_reconstructed = X_pca20 @ Vt[:k]

fig, axes = plt.subplots(2, 10, figsize=(10, 2))
for i in range(10):
    axes[0, i].imshow(X[i].reshape(8, 8), cmap="gray")
    axes[0, i].axis("off")
    axes[1, i].imshow(X_reconstructed[i].reshape(8, 8), cmap="gray")
    axes[1, i].axis("off")
plt.suptitle("Original (top) vs PCA reconstruction (bottom, 20 comps)")
plt.show()
Even with only 20/64 components, the digits remain recognizable.
Try It Yourself
- Change \(k\) to 5, 10, 30 - how do reconstructions change?
- Use top 2 PCA components to classify digits with k-NN. How does accuracy compare to full 64 features?
- Try PCA on your own dataset (images, tabular data).
The Takeaway
- PCA reduces dimensions while keeping maximum variance.
- In practice: center → decompose → select top components → project/reconstruct.
- PCA enables visualization, compression, and denoising in real-world workflows.
97. Recommender Systems and Low-Rank Models (Fill the Missing Entries)
Recommender systems often deal with incomplete matrices - rows are users, columns are items, entries are ratings. Most entries are missing, but the matrix is usually close to low-rank (because user preferences depend on only a few hidden factors). SVD and low-rank approximations are powerful tools to fill in these missing values.
Set Up Your Lab
import numpy as np
import matplotlib.pyplot as plt
Step-by-Step Code Walkthrough
- Simulate a user–item rating matrix
np.random.seed(0)
true_users = np.random.randn(10, 3)   # 10 users, 3 latent features
true_items = np.random.randn(3, 8)    # 8 items
R_full = true_users @ true_items      # true low-rank ratings
- Hide some ratings (simulate missing data)
mask = np.random.rand(*R_full.shape) > 0.3   # keep ~70% of entries
R_obs = np.where(mask, R_full, np.nan)

print("Observed ratings:\n", R_obs)
Observed ratings:
[[-1.10781465 nan -3.56526968 nan -2.1729387 1.43510077
1.46641178 0.79023284]
[ 0.84819453 nan nan nan nan nan
2.30434358 3.03008138]
[ nan 0.32479187 -0.51818422 nan 0.02013802 nan
1.29874918 1.33053637]
[-1.81407786 1.24241182 nan -1.32723907 nan nan
-0.31110699 nan]
[-0.48527696 nan -1.51957106 nan -0.86984941 0.52807989
nan 0.33771451]
[-0.26997359 -0.48498966 nan -2.73891459 -2.48167957 2.88740609
-0.24614835 nan]
[ 3.57769701 -1.608339 4.73789234 1.13583164 3.63451505 -2.60495928
2.12453635 3.76472563]
[ 0.69623809 -0.59117353 -0.28890188 -2.36431192 nan 1.50136796
0.74268078 nan]
[ 0.85768141 1.33357168 nan nan 1.65089037 -2.46456289
3.51030491 3.31220347]
[-2.463496 0.60826298 -3.81241599 -2.11839267 -3.86597359 3.52934055
-1.76203083 -2.63130953]]
- Simple mean imputation (baseline)
R_mean = np.where(np.isnan(R_obs), np.nanmean(R_obs), R_obs)
- Apply SVD for low-rank approximation
# Replace NaNs with zeros for SVD step
R_filled = np.nan_to_num(R_obs, nan=0.0)

U, S, Vt = np.linalg.svd(R_filled, full_matrices=False)

k = 3   # latent dimension
R_approx = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]
- Compare filled matrix with ground truth
error = np.nanmean((R_full - R_approx)**2)
print("Approximation error (MSE):", error)
Approximation error (MSE): 1.4862378490976202
- Visualize original vs reconstructed
fig, axes = plt.subplots(1, 2, figsize=(8, 4))
axes[0].imshow(R_full, cmap="viridis")
axes[0].set_title("True ratings")
axes[1].imshow(R_approx, cmap="viridis")
axes[1].set_title("Low-rank approximation")
plt.show()
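A common refinement is iterative imputation: repeatedly fill the missing entries with the current low-rank estimate and recompute the truncated SVD. Here is a minimal sketch, reusing R_obs, mask, R_full, and k from above; the iteration count is an arbitrary choice for this small example.
# Iterative low-rank imputation: fill missing entries, re-factor, repeat
R_iter = np.nan_to_num(R_obs, nan=0.0)
for _ in range(20):
    U_i, S_i, Vt_i = np.linalg.svd(R_iter, full_matrices=False)
    R_low = U_i[:, :k] @ np.diag(S_i[:k]) @ Vt_i[:k, :]
    R_iter = np.where(mask, R_obs, R_low)   # keep observed entries, update the missing ones

print("MSE after iterative imputation:", np.mean((R_full - R_iter)**2))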
Try It Yourself
- Vary \(k\) (2, 3, 5). Does error go down?
- Mask more entries (50%, 80%) - how does SVD reconstruction perform?
- Use iterative imputation: alternate filling missing entries with low-rank approximations.
The Takeaway
- Recommender systems rely on low-rank structure of user–item matrices.
- SVD provides a natural way to approximate and fill missing ratings.
- This low-rank modeling idea underpins modern collaborative filtering systems like Netflix and Spotify recommenders.
98. PageRank and Random Walks (Ranking with Eigenvectors)
The PageRank algorithm, made famous by Google, uses linear algebra and random walks on graphs to rank nodes (webpages, people, items). The idea: importance flows through links - being linked by important nodes makes you important.
The PageRank Idea
- Start a random walk on a graph: at each step, move to a random neighbor.
- Add a “teleportation” step with probability \(1 - \alpha\) to avoid dead ends.
- The steady-state distribution of this walk is the PageRank vector, found as the principal eigenvector of the transition matrix.
Set Up Your Lab
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
Step-by-Step Code Walkthrough
- Build a small directed graph
G = nx.DiGraph()

G.add_edges_from([
    (0, 1), (1, 2), (2, 0),   # cycle among 0–1–2
    (2, 3), (3, 2),           # back-and-forth 2–3
    (1, 3), (3, 4), (4, 1)    # small loop with 1–3–4
])
nx.draw_circular(G, with_labels=True, node_color="lightblue", node_size=800, arrowsize=15)
plt.show()
- Build adjacency and transition matrix
n = G.number_of_nodes()
A = nx.to_numpy_array(G, nodelist=range(n))
P = A / A.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
- Add teleportation (Google matrix)
alpha = 0.85   # damping factor
G_matrix = alpha * P + (1 - alpha) * np.ones((n, n)) / n
- Power iteration to compute PageRank
r = np.ones(n) / n   # start uniform
for _ in range(100):
    r = r @ G_matrix
    r /= r.sum()
print("PageRank vector:", r)
PageRank vector: [0.13219034 0.25472358 0.24044787 0.24044787 0.13219034]
- Compare with NetworkX built-in
pr = nx.pagerank(G, alpha=alpha)
print("NetworkX PageRank:", pr)
NetworkX PageRank: {0: 0.13219008157546333, 1: 0.2547244023837789, 2: 0.24044771723264727, 3: 0.24044771723264727, 4: 0.13219008157546333}
- Visualize node importance
sizes = [5000 * r_i for r_i in r]
nx.draw_circular(G, with_labels=True, node_size=sizes, node_color="lightblue", arrowsize=15)
plt.title("PageRank visualization (node size ~ importance)")
plt.show()
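To connect this back to eigenvectors: the PageRank vector is the left eigenvector of the Google matrix with eigenvalue 1, i.e. the dominant right eigenvector of its transpose. A minimal check, assuming G_matrix and r from the steps above:
# PageRank as the dominant eigenvector of the Google matrix
eigvals, eigvecs = np.linalg.eig(G_matrix.T)   # left eigenvectors of G_matrix
idx = np.argmax(eigvals.real)                  # eigenvalue closest to 1
v = eigvecs[:, idx].real
v = v / v.sum()                                # normalize to a probability vector

print("Eigenvector PageRank:    ", v)
print("Power-iteration PageRank:", r)          # the two should agree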
Try It Yourself
- Change \(\alpha\) (e.g., 0.6 vs 0.95). Does ranking change?
- Add a “dangling node” with no outlinks. How does teleportation handle it?
- Try PageRank on a larger graph (like a random graph with 50 nodes).
The Takeaway
- PageRank is a random-walk steady state problem.
- It reduces to finding the dominant eigenvector of the Google matrix.
- This method generalizes beyond webpages - to influence ranking, recommendation, and network analysis.
99. Numerical Linear Algebra Essentials (Floating Point, BLAS/LAPACK)
When working with linear algebra on computers, numbers are not exact. They live in floating-point arithmetic, and computations rely on highly optimized libraries like BLAS and LAPACK. Understanding these essentials is crucial to doing linear algebra at scale.
Floating Point Basics
Numbers are stored in base-2 scientific notation:
\[ x = \pm (1.b_1b_2b_3\ldots) \times 2^e \]
Limited precision means rounding errors.
Two key constants:
- Machine epsilon (\(\epsilon\)): smallest relative difference detectable (\(\approx 10^{-16}\) for double precision).
- Overflow/underflow: too large or too small to represent.
Set Up Your Lab
import numpy as np
Step-by-Step Code Walkthrough
- Machine epsilon
eps = np.finfo(float).eps
print("Machine epsilon:", eps)
Machine epsilon: 2.220446049250313e-16
- Round-off error demo
a = 1e16
b = 1.0
print("a + b - a:", (a + b) - a)   # may lose b due to precision limits
a + b - a: 0.0
- Stability of matrix inversion
A = np.array([[1, 1.0001], [1.0001, 1]])
b = np.array([2, 2.0001])

x_direct = np.linalg.solve(A, b)
x_via_inv = np.linalg.inv(A) @ b

print("Solve:", x_direct)
print("Inverse method:", x_via_inv)
Solve: [1.499975 0.499975]
Inverse method: [1.499975 0.499975]
Notice: using np.linalg.inv can be less stable - better to solve directly with np.linalg.solve.
- Conditioning of a matrix
cond = np.linalg.cond(A)
print("Condition number:", cond)
Condition number: 20001.00000000417
- Large condition number → small input changes cause big output changes.
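A quick way to feel this, sketched below with the A, b, and cond from above: perturb b slightly and compare the relative change in the solution with the relative change in the input; the amplification can be as large as the condition number.
# Perturb b slightly and watch the error get amplified
x = np.linalg.solve(A, b)

db = 1e-6 * np.array([1.0, -1.0])   # tiny perturbation of the data
x_pert = np.linalg.solve(A, b + db)

rel_input_change = np.linalg.norm(db) / np.linalg.norm(b)
rel_output_change = np.linalg.norm(x_pert - x) / np.linalg.norm(x)

print("Relative change in b:", rel_input_change)
print("Relative change in x:", rel_output_change)
print("Amplification factor:", rel_output_change / rel_input_change)   # bounded by cond(A)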
- BLAS/LAPACK under the hood
A = np.random.randn(500, 500)
B = np.random.randn(500, 500)

# Matrix multiplication (calls optimized BLAS under the hood)
C = A @ B
This @ operator is not a naive loop - it calls a highly optimized C/Fortran routine.
Try It Yourself
- Compare solving \(Ax = b\) with np.linalg.solve vs np.linalg.inv(A) @ b for larger, ill-conditioned systems.
- Use np.linalg.svd on a nearly singular matrix. How stable are the singular values?
- Check performance: time A @ B for sizes 100, 500, 1000.
The Takeaway
- Numerical linear algebra = math + floating-point reality.
- Always prefer stable algorithms (solve, qr, svd) over naive inversion.
- Libraries like BLAS/LAPACK make large computations fast, but understanding precision and conditioning prevents nasty surprises.
100. Capstone Problem Sets and Next Steps (A Roadmap to Mastery)
This final section ties everything together. Instead of introducing a new topic, it provides capstone labs that combine multiple ideas from the book. Working through them will give you confidence that you can apply linear algebra to real problems.
Problem Set 1 - Image Compression with SVD
Take an image, treat it as a matrix, and approximate it with low-rank SVD.
import numpy as np
import matplotlib.pyplot as plt
from skimage import data, color
# Load grayscale image
img = color.rgb2gray(data.astronaut())
U, S, Vt = np.linalg.svd(img, full_matrices=False)

# Approximate with rank-k
k = 50
img_approx = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

plt.subplot(1, 2, 1)
plt.imshow(img, cmap="gray")
plt.title("Original")
plt.axis("off")

plt.subplot(1, 2, 2)
plt.imshow(img_approx, cmap="gray")
plt.title(f"Rank-{k} Approximation")
plt.axis("off")

plt.show()
Try different \(k\) values (5, 20, 100). How does quality vs. compression trade off?
Problem Set 2 - Predictive Modeling with PCA + Regression
Combine PCA for dimensionality reduction with linear regression for prediction.
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.decomposition import PCA
# Load dataset
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# PCA reduce features
pca = PCA(n_components=5)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)

# Regression on reduced space
model = LinearRegression().fit(X_train_pca, y_train)
print("R^2 on test set:", model.score(X_test_pca, y_test))
R^2 on test set: 0.3691398497153573
Does reducing dimensions improve or hurt accuracy?
Problem Set 3 - Graph Analysis with PageRank
Apply PageRank to a custom-built network.
import networkx as nx
G = nx.barabasi_albert_graph(20, 2)   # 20 nodes, scale-free graph
pr = nx.pagerank(G, alpha=0.85)

nx.draw(G, with_labels=True, node_size=[5000*pr[n] for n in G], node_color="lightblue")
plt.title("PageRank on a scale-free graph")
plt.show()
Which nodes dominate? How does structure affect ranking?
Problem Set 4 - Solving Differential Equations with Eigen Decomposition
Use eigenvalues/eigenvectors to solve a linear dynamical system.
A = np.array([[0, 1], [-2, -3]])
eigvals, eigvecs = np.linalg.eig(A)

print("Eigenvalues:", eigvals)
print("Eigenvectors:\n", eigvecs)
Eigenvalues: [-1. -2.]
Eigenvectors:
[[ 0.70710678 -0.4472136 ]
[-0.70710678 0.89442719]]
Predict long-term behavior: will the system decay, oscillate, or grow?
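To answer that question numerically, here is a minimal sketch that uses the eigen-decomposition to write the solution of \(\dot{x} = Ax\) as \(x(t) = V e^{\Lambda t} V^{-1} x(0)\) and evaluates it at a few times; the initial condition is an arbitrary illustrative choice.
# Solve x'(t) = A x(t) via the eigen-decomposition: x(t) = V exp(Λt) V⁻¹ x(0)
x0 = np.array([1.0, 0.0])        # illustrative initial condition
V = eigvecs
V_inv = np.linalg.inv(V)

for t in [0.0, 1.0, 5.0, 10.0]:
    x_t = V @ np.diag(np.exp(eigvals * t)) @ V_inv @ x0
    print(f"t = {t:>4}: x(t) = {x_t.real}")
# Both eigenvalues are negative, so every trajectory decays to zero.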
Problem Set 5 - Least Squares for Overdetermined Systems
np.random.seed(0)
X = np.random.randn(100, 3)
beta_true = np.array([2, -1, 0.5])
y = X @ beta_true + np.random.randn(100)*0.1

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("Estimated coefficients:", beta_hat)
Estimated coefficients: [ 1.99371939 -1.00708947 0.50661857]
Compare estimated vs. true coefficients. How close are they?
Try It Yourself
- Combine SVD and recommender systems - build a movie recommender with synthetic data.
- Implement Gram–Schmidt by hand and test it against np.linalg.qr.
- Write a mini “linear algebra toolkit” with your favorite helper functions.
The Takeaway
- You’ve practiced vectors, matrices, systems, eigenvalues, SVD, PCA, PageRank, and more.
- Real problems often combine multiple concepts - the labs show how everything fits together.
- Next steps: dive deeper into numerical linear algebra, explore machine learning applications, or study advanced matrix factorizations (Jordan form, tensor decompositions).
This concludes the hands-on journey. By now, you don’t just know the theory - you can use linear algebra as a working tool in Python for data, science, and engineering.